- Pandas: Sum column values (entire or in a row) in Python
- Creating the dataset
- How to sum a column? (or more)
- Sum row values into a new column
- Adding columns by index
- Sum with conditions
- Adding columns with null values
- How to Calculate the Sum of Columns in Pandas
- Example 1: Find the Sum of a Single Column
- Example 2: Find the Sum of Multiple Columns
- Example 3: Find the Sum of All Columns
- pandas.DataFrame.sum
- Pandas: Get sum of column values in a Dataframe
- Get the sum of column values in a dataframe
- Select the column by position and get the sum of all values in that column
- Get the sum of columns values for selected rows only in Dataframe
- Get the sum of column values in a dataframe based on condition
Pandas: Sum column values (entire or in a row) in Python
In today’s recipe we’ll touch on the basics of adding numeric values in a pandas DataFrame.
We’ll cover the following cases:
- Sum all rows of one or multiple columns
- Sum by column name/label into a new column
- Adding values by index
- Dealing with NaN values
- Sum values that meet a certain condition
Creating the dataset
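We’ll start by creating a simple dataset:
```python
# Python3
# import pandas into your Python environment.
import pandas as pd

# Now, let's create the dataframe
budget = pd.DataFrame({
    'person': ['John', 'Kim', 'Bob'],
    'quarter': [1, 1, 1],
    'consumer_budg': [15000, 35000, 45000],
    'enterprise_budg': [20000, 30000, 40000],
})
budget.head()
```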
How to sum a column? (or more)
For a single column we’ll simply use the Series sum() method.
```python
# one column
budget['consumer_budg'].sum()
```
DataFrame also has a sum() method, which we’ll use to add multiple columns:
```python
# adding multiple columns
cols = ['consumer_budg', 'enterprise_budg']
budget[cols].sum()
```
We’ll receive a Series object with the results:
```
consumer_budg      95000
enterprise_budg    90000
dtype: int64
```
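If a single grand total across the selected columns is needed, a second sum() can be chained onto the resulting Series. A minimal sketch, rebuilding the three-row budget data from the table in this recipe:

```python
import pandas as pd

# Same three-row budget data as in this recipe's table
budget = pd.DataFrame({
    'person': ['John', 'Kim', 'Bob'],
    'quarter': [1, 1, 1],
    'consumer_budg': [15000, 35000, 45000],
    'enterprise_budg': [20000, 30000, 40000],
})

cols = ['consumer_budg', 'enterprise_budg']
per_column = budget[cols].sum()   # Series: one total per column
grand_total = per_column.sum()    # scalar: total across both columns
print(per_column)
print(grand_total)
```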
Sum row values into a new column
A more interesting case is computing a per-row value by adding several column values in the same row. See this simple example below:
```python
# using the column label names
budget['total_budget'] = budget['consumer_budg'] + budget['enterprise_budg']
```
We have created a new column as shown below:
| | person | quarter | consumer_budg | enterprise_budg | total_budget |
|---|---|---|---|---|---|
| 0 | John | 1 | 15000 | 20000 | 35000 |
| 1 | Kim | 1 | 35000 | 30000 | 65000 |
| 2 | Bob | 1 | 45000 | 40000 | 85000 |
Note: We could also have used the loc indexer to select the columns by label.
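A sketch of the loc variant mentioned in the note, using the same budget data. It also shows sum(axis=1), which adds the selected columns row by row and should give the same totals as the + version:

```python
import pandas as pd

budget = pd.DataFrame({
    'person': ['John', 'Kim', 'Bob'],
    'quarter': [1, 1, 1],
    'consumer_budg': [15000, 35000, 45000],
    'enterprise_budg': [20000, 30000, 40000],
})

# Select each budget column by label with loc, then add them element-wise
budget['total_budget'] = (budget.loc[:, 'consumer_budg']
                          + budget.loc[:, 'enterprise_budg'])

# Alternative: sum across each row (axis=1) of the selected columns
row_totals = budget.loc[:, ['consumer_budg', 'enterprise_budg']].sum(axis=1)
print(budget)
print(row_totals)
```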
Adding columns by index
We can also refer to the columns to sum by their integer position, using the iloc indexer.
```python
# by position: column 2 is consumer_budg, column 3 is enterprise_budg
budget['total_budget'] = budget.iloc[:, 2] + budget.iloc[:, 3]
```
The result is the same as above.
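When the columns to add are adjacent, one option is to slice a positional range with iloc and sum across each row. A small sketch, assuming the same column layout as the table above (consumer_budg at position 2, enterprise_budg at position 3):

```python
import pandas as pd

budget = pd.DataFrame({
    'person': ['John', 'Kim', 'Bob'],
    'quarter': [1, 1, 1],
    'consumer_budg': [15000, 35000, 45000],
    'enterprise_budg': [20000, 30000, 40000],
})

# The slice 2:4 stops before position 4, so it selects positions 2 and 3 only
budget['total_budget'] = budget.iloc[:, 2:4].sum(axis=1)
print(budget)
```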
Sum with conditions
In this example, we would like to define a column named high_budget and populate it only for rows where total_budget is over the 80K threshold; the remaining rows get NaN.
```python
budget['high_budget'] = budget.query('consumer_budg + enterprise_budg > 80000')['total_budget']
```
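The query() call above fills high_budget but doesn't actually sum anything conditionally. A sketch of both steps with a boolean mask, using where() for the new column and loc[] for the conditional sum (the mask variable name is just for illustration):

```python
import pandas as pd

budget = pd.DataFrame({
    'person': ['John', 'Kim', 'Bob'],
    'quarter': [1, 1, 1],
    'consumer_budg': [15000, 35000, 45000],
    'enterprise_budg': [20000, 30000, 40000],
})
budget['total_budget'] = budget['consumer_budg'] + budget['enterprise_budg']

# Boolean mask: True where the row total exceeds the 80K threshold
mask = budget['total_budget'] > 80000

# Keep totals above the threshold, NaN elsewhere (same result as the query() version)
budget['high_budget'] = budget['total_budget'].where(mask)

# Sum only the totals that meet the condition
high_total = budget.loc[mask, 'total_budget'].sum()
print(budget)
print(high_total)
```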
Adding columns with null values
Here we need a bit of pre-processing: when a value is missing, adding columns with + yields NaN for that row, so we first replace the null values using fillna().
Let’s quickly create a sample dataset containing null values (see last row).
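```python
# with nan
import numpy as np
import pandas as pd

budget_nan = pd.DataFrame({
    'person': ['John', 'Kim', 'Bob', 'Court'],
    'quarter': [1, 1, 1, 1],
    'consumer_budg': [15000, 35000, 45000, 50000],
    'enterprise_budg': [20000, 30000, 40000, np.nan],
    'high_budget': [35000, 65000, 85000, np.nan],
})
budget_nan
```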
| | person | quarter | consumer_budg | enterprise_budg | high_budget |
|---|---|---|---|---|---|
| 0 | John | 1 | 15000 | 20000.0 | 35000.0 |
| 1 | Kim | 1 | 35000 | 30000.0 | 65000.0 |
| 2 | Bob | 1 | 45000 | 40000.0 | 85000.0 |
| 3 | Court | 1 | 50000 | NaN | NaN |
Now let's use the DataFrame fillna() method to replace all the null values with zeros so that we can add the column values.
```python
# replace every null value with 0, then add the two budget columns
budget_nan = budget_nan.fillna(0)
budget_nan['high_budget'] = budget_nan['consumer_budg'] + budget_nan['enterprise_budg']
budget_nan
```
| | person | quarter | consumer_budg | enterprise_budg | high_budget |
|---|---|---|---|---|---|
| 0 | John | 1 | 15000 | 20000.0 | 35000.0 |
| 1 | Kim | 1 | 35000 | 30000.0 | 65000.0 |
| 2 | Bob | 1 | 45000 | 40000.0 | 85000.0 |
| 3 | Court | 1 | 50000 | 0.0 | 50000.0 |
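An alternative to fillna() is to let sum(axis=1) skip the missing values itself, since skipna=True is the default. Unlike fillna(), this leaves the stored NaN in place. A sketch rebuilding the four-row dataset above (without the high_budget column); min_count shows how to get NaN back when a row is incomplete:

```python
import numpy as np
import pandas as pd

# Same four rows as the table above; Court's enterprise budget is missing
budget_nan = pd.DataFrame({
    'person': ['John', 'Kim', 'Bob', 'Court'],
    'quarter': [1, 1, 1, 1],
    'consumer_budg': [15000, 35000, 45000, 50000],
    'enterprise_budg': [20000, 30000, 40000, np.nan],
})
cols = ['consumer_budg', 'enterprise_budg']

# NaN is skipped by default, so Court's total is just the consumer budget
skip_nan = budget_nan[cols].sum(axis=1)

# min_count=2 requires both values to be present; otherwise the row total is NaN
strict = budget_nan[cols].sum(axis=1, min_count=2)
print(skip_nan)
print(strict)
```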
How to Calculate the Sum of Columns in Pandas
Often you may be interested in calculating the sum of one or more columns in a pandas DataFrame. Fortunately, you can easily do this in pandas using the sum() function.
This tutorial shows several examples of how to use this function.
Example 1: Find the Sum of a Single Column
Suppose we have the following pandas DataFrame:
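```python
import pandas as pd
import numpy as np

# create DataFrame
df = pd.DataFrame({
    'rating': [90, 85, 82, 88, 94, 90, 76, 75, 87, 86],
    'points': [25, 20, 14, 16, 27, 20, 12, 15, 14, 19],
    'assists': [5, 7, 7, 8, 5, 7, 6, 9, 9, 5],
    'rebounds': [np.nan, 8, 10, 6, 6, 9, 6, 10, 10, 7],
})

# view DataFrame
df
```
```
   rating  points  assists  rebounds
0      90      25        5       NaN
1      85      20        7       8.0
2      82      14        7      10.0
3      88      16        8       6.0
4      94      27        5       6.0
5      90      20        7       9.0
6      76      12        6       6.0
7      75      15        9      10.0
8      87      14        9      10.0
9      86      19        5       7.0
```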
We can find the sum of the column called "points" with df['points'].sum(), which returns 182.
The sum() function also excludes NA values by default. For example, if we find the sum of the "rebounds" column, the first value, NaN, is simply excluded from the calculation, and the result is 72.0.
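The single-column sums described in Example 1 can be checked with a short sketch. It rebuilds the DataFrame from the output above and also shows that skipna=False lets the NaN propagate:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'rating': [90, 85, 82, 88, 94, 90, 76, 75, 87, 86],
    'points': [25, 20, 14, 16, 27, 20, 12, 15, 14, 19],
    'assists': [5, 7, 7, 8, 5, 7, 6, 9, 9, 5],
    'rebounds': [np.nan, 8, 10, 6, 6, 9, 6, 10, 10, 7],
})

# Sum of a single column
points_total = df['points'].sum()

# The NaN in the first row is skipped by default (skipna=True)
rebounds_total = df['rebounds'].sum()

# With skipna=False, the NaN propagates to the result
rebounds_strict = df['rebounds'].sum(skipna=False)
print(points_total, rebounds_total, rebounds_strict)
```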
Example 2: Find the Sum of Multiple Columns
We can find the sum of multiple columns using the following syntax:
```python
# find sum of points and rebounds columns
df[['rebounds', 'points']].sum()
```
```
rebounds     72.0
points      182.0
dtype: float64
```
Example 3: Find the Sum of All Columns
We can also find the sum of all columns using the following syntax:
```python
# find sum of all columns in DataFrame
df.sum()
```
```
rating      853.0
points      182.0
assists      68.0
rebounds     72.0
dtype: float64
```
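Columns that are not numeric are not always skipped automatically. In current pandas versions numeric_only defaults to False, so sum() tries to include every column: a column of strings may be concatenated, and incompatible types can raise a TypeError. Pass numeric_only=True to restrict the sum to numeric columns.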
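A minimal sketch of numeric_only=True, using a hypothetical team column (not part of the dataset above) to stand in for non-numeric data:

```python
import pandas as pd

# Hypothetical DataFrame with a text column alongside numeric ones
df = pd.DataFrame({
    'team': ['A', 'B', 'C'],
    'points': [25, 20, 14],
    'assists': [5, 7, 7],
})

# numeric_only=True leaves the 'team' column out of the calculation
totals = df.sum(numeric_only=True)
print(totals)
```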
You can find the complete documentation for the sum() function in the pandas API reference (pandas.DataFrame.sum).
pandas.DataFrame.sum
Return the sum of the values over the requested axis.
This is equivalent to the method numpy.sum.
Parameters
- axis {index (0), columns (1)}: Axis for the function to be applied on. For Series this parameter is unused and defaults to 0. For DataFrames, specifying axis=None will apply the aggregation across both axes.
- skipna bool, default True: Exclude NA/null values when computing the result.
- numeric_only bool, default False: Include only float, int, boolean columns. Not implemented for Series.
- min_count int, default 0: The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.
- **kwargs: Additional keyword arguments to be passed to the function.
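To make the axis parameter concrete, a small sketch on a made-up two-column frame: axis=0 (the default) returns one total per column, axis=1 one total per row, and the names 'index'/'columns' can be used instead of the numbers.

```python
import pandas as pd

# Made-up example data
df = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

col_sums = df.sum(axis=0)           # one value per column (the default)
row_sums = df.sum(axis=1)           # one value per row
same_rows = df.sum(axis='columns')  # same as axis=1
print(col_sums.tolist(), row_sums.tolist())
```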
See also
- Series.idxmin: Return the index of the minimum.
- Series.idxmax: Return the index of the maximum.
- DataFrame.sum: Return the sum over the requested axis.
- DataFrame.min: Return the minimum over the requested axis.
- DataFrame.max: Return the maximum over the requested axis.
- DataFrame.idxmin: Return the index of the minimum over the requested axis.
- DataFrame.idxmax: Return the index of the maximum over the requested axis.
```python
>>> idx = pd.MultiIndex.from_arrays([
...     ['warm', 'warm', 'cold', 'cold'],
...     ['dog', 'falcon', 'fish', 'spider']],
...     names=['blooded', 'animal'])
>>> s = pd.Series([4, 2, 0, 8], name='legs', index=idx)
>>> s
blooded  animal
warm     dog       4
         falcon    2
cold     fish      0
         spider    8
Name: legs, dtype: int64
```
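Summing this Series adds all four leg counts. The runnable sketch below repeats the construction; the groupby(level=...) line is an extra illustration, not part of the reference example, showing how to sum within one level of the MultiIndex:

```python
import pandas as pd

idx = pd.MultiIndex.from_arrays(
    [['warm', 'warm', 'cold', 'cold'],
     ['dog', 'falcon', 'fish', 'spider']],
    names=['blooded', 'animal'])
s = pd.Series([4, 2, 0, 8], name='legs', index=idx)

total = s.sum()                               # 4 + 2 + 0 + 8
per_group = s.groupby(level='blooded').sum()  # subtotal per 'blooded' value
print(total)
print(per_group)
```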
By default, the sum of an empty or all-NA Series is 0.
```python
>>> pd.Series([], dtype="float64").sum()  # min_count=0 is the default
0.0
```
This can be controlled with the min_count parameter. For example, if you’d like the sum of an empty series to be NaN, pass min_count=1.
```python
>>> pd.Series([], dtype="float64").sum(min_count=1)
nan
```
Thanks to the skipna parameter, min_count handles all-NA and empty series identically.
```python
>>> pd.Series([np.nan]).sum(min_count=1)
nan
```
Pandas: Get sum of column values in a Dataframe
Suppose we have a dataframe containing information about students, such as their name, age, city and score.
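The dataframe's construction isn't shown in the text. To make the following snippets runnable, here is a hypothetical stand-in with the four columns in the order the text implies (Name, Age, City, Score, with 'Score' as the 4th column). The column names Name and Age, and all of the values, are made up for illustration:

```python
import pandas as pd

# Hypothetical stand-in data: column order matches the text,
# but the names and values are invented
df = pd.DataFrame({
    'Name': ['Asha', 'Ben', 'Chen', 'Dana', 'Eli'],
    'Age': [27, 32, 26, 30, 29],
    'City': ['Delhi', 'Mumbai', 'Delhi', 'London', 'Delhi'],
    'Score': [80, 75, 91, 68, 88],
})
print(df)
```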
Now let’s see how to get the sum of values in the column ‘Score’ of this dataframe.
Get the sum of column values in a dataframe
Select the column by name and get the sum of all values in that column
Select a column from a dataframe by the column name and then get the sum of values in that column using the sum() function:
```python
# Get total of all values in column 'Score' of the DataFrame
total = df['Score'].sum()
print(total)
```
Here we selected the column ‘Score’ from the dataframe using the [] operator and got all of its values as a pandas Series object. Then we called the sum() function on that Series object to get the sum of its values, i.e. the sum of values in the column ‘Score’ of the dataframe.
We can also select the column using loc[] and then get the sum of values in that column. For example,
```python
# Select column 'Score' using loc[] and calculate sum of all
# values in that column
total = df.loc[:, 'Score'].sum()
print(total)
```
Here we selected the column ‘Score’ as a Series object using loc[] and then called the sum() function on that Series object to get the sum of all values in the column ‘Score’ of the dataframe.
Select the column by position and get the sum of all values in that column
Suppose we don’t know the column name, but we know the position of the column in the dataframe and want the sum of values in that column. We can select the column by position using iloc[], which returns the column contents as a Series object, and then call the sum() function on that Series:
```python
# Get sum of all values in the 4th column
# (iloc positions are 0-based, so the 4th column is position 3)
column_number = 4
total = df.iloc[:, column_number - 1].sum()
print(total)
```
It returns a single number. (Selecting with a slice such as iloc[:, column_number-1:column_number] would instead return a one-column DataFrame, and calling sum() on it gives a Series with a single value.)
Here we selected the 4th column of the dataframe as a Series object using iloc[] and then called the sum() function on that Series. So it returned the sum of values in the 4th column, i.e. column ‘Score’.
Get the sum of column values for selected rows only in a Dataframe
Select a column from the dataframe and get the sum of specific entries in that column. For example,
```python
# Select 4th column of dataframe and get sum of first 3 values in that column
total = df.iloc[0:3, 3:4].sum()
print(total)
```
It returned a Series with a single value, because the slice 3:4 selects the column as a one-column DataFrame.
Here we selected the first 3 rows of the 4th column of the dataframe and then calculated their sum.
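Because iloc slices exclude the stop position while loc slices include the stop label, the two indexers need different bounds to select the same rows. A sketch using hypothetical student data (made-up names and values, same column order as in the text):

```python
import pandas as pd

# Hypothetical data with the column layout described in the text
df = pd.DataFrame({
    'Name': ['Asha', 'Ben', 'Chen', 'Dana', 'Eli'],
    'Age': [27, 32, 26, 30, 29],
    'City': ['Delhi', 'Mumbai', 'Delhi', 'London', 'Delhi'],
    'Score': [80, 75, 91, 68, 88],
})

# Position-based: rows 0, 1, 2 of the 4th column (stop position 3 is excluded)
by_position = df.iloc[0:3, 3].sum()

# Label-based: with the default RangeIndex, loc[0:2] also covers rows 0, 1, 2
# because loc slices include the end label
by_label = df.loc[0:2, 'Score'].sum()
print(by_position, by_label)
```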
Get the sum of column values in a dataframe based on condition
Suppose in the above dataframe we want to get the sum of the score of students from Delhi only. For that we need to select only those values from the column ‘Score’ where ‘City’ is Delhi. Let’s see how to do that,
```python
# Get sum of values in a column 'Score'
# for those rows only where 'City' is 'Delhi'
total = df.loc[df['City'] == 'Delhi', 'Score'].sum()
print(total)
```
Using loc[] we selected the column ‘Score’ but for only those rows where column ‘City’ has value ‘Delhi’. Then we called the sum() function on the series object to get the sum of scores of students from ‘Delhi’. So, basically we selected rows from a dataframe that satisfy our condition and then selected the values of column ‘Score’ for those rows only. We did that in a single expression using loc[].
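Several conditions can be combined with & (and) or | (or), each wrapped in parentheses; to get the sum for every city at once rather than only Delhi, groupby() is a common choice. A sketch on hypothetical student data (made-up values; the Age threshold of 26 is arbitrary):

```python
import pandas as pd

# Hypothetical data with the column layout described in the text
df = pd.DataFrame({
    'Name': ['Asha', 'Ben', 'Chen', 'Dana', 'Eli'],
    'Age': [27, 32, 26, 30, 29],
    'City': ['Delhi', 'Mumbai', 'Delhi', 'London', 'Delhi'],
    'Score': [80, 75, 91, 68, 88],
})

# Both conditions must hold: city is Delhi AND age is above 26
mask = (df['City'] == 'Delhi') & (df['Age'] > 26)
total = df.loc[mask, 'Score'].sum()

# Sum of scores for each city in one step
per_city = df.groupby('City')['Score'].sum()
print(total)
print(per_city)
```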
These were the different ways to get the sum of all or specific values in a dataframe column in Pandas.
Copyright © 2023 thisPointer