Python pandas sum columns

Pandas: Sum column values (entire or in a row) in Python

In today’s recipe we’ll touch on the basics of adding numeric values in a pandas DataFrame.

We’ll cover the following cases:

  • Sum all rows of one or multiple columns
  • Sum by column name/label into a new column
  • Adding values by index
  • Dealing with NaN values
  • Sum values that meet a certain condition

Creating the dataset

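We’ll start by creating a simple dataset:

# Python3
# import pandas into your Python environment
import pandas as pd

# Now, let's create the DataFrame
budget = pd.DataFrame({
    'person': ['John', 'Kim', 'Bob'],
    'quarter': [1, 1, 1],
    'consumer_budg': [15000, 35000, 45000],
    'enterprise_budg': [20000, 30000, 40000],
})
budget.head()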

How to sum a column (or more)?

For a single column we’ll simply use the Series sum() method.

# one column
budget['consumer_budg'].sum()

The DataFrame also has a sum() method, which we’ll use to add up multiple columns:

# adding multiple columns
cols = ['consumer_budg', 'enterprise_budg']
budget[cols].sum()

We’ll receive a Series object with the results:

consumer_budg      95000
enterprise_budg    90000
dtype: int64

Sum row values into a new column

A more interesting case is computing a new value for each row by adding several column values in that row. See this simple example:

# using the column label names
budget['total_budget'] = budget['consumer_budg'] + budget['enterprise_budg']

We have created a new column as shown below:

person quarter consumer_budg enterprise_budg total_budget
0 John 1 15000 20000 35000
1 Kim 1 35000 30000 65000
2 Bob 1 45000 40000 85000

Note: We could also have used the loc indexer to select the columns by label.
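The + approach above works well for two columns. As a sketch of an alternative (not taken from the original recipe), DataFrame.sum(axis=1) adds across any list of columns row by row. The data below simply re-creates the three-row budget table shown above.

```python
import pandas as pd

# Re-creation of the three-row budget table shown above
budget = pd.DataFrame({
    'person': ['John', 'Kim', 'Bob'],
    'quarter': [1, 1, 1],
    'consumer_budg': [15000, 35000, 45000],
    'enterprise_budg': [20000, 30000, 40000],
})

cols = ['consumer_budg', 'enterprise_budg']
# axis=1 adds the selected columns within each row
budget['total_budget'] = budget[cols].sum(axis=1)
print(budget)
```

One design difference worth knowing: sum(axis=1) skips NaN by default, while + propagates it, which matters for the null-value section further down.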

Adding columns by index

We can also refer to the columns to sum by integer position, using the iloc indexer.

# by index: positions 2 and 3 are consumer_budg and enterprise_budg
budget['total_budget'] = budget.iloc[:, 2] + budget.iloc[:, 3]

The result is the same as above.
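If the columns to add sit next to each other, a positional slice can replace the individual iloc calls. This minimal sketch assumes the same column order as the budget table (person, quarter, consumer_budg, enterprise_budg).

```python
import pandas as pd

budget = pd.DataFrame({
    'person': ['John', 'Kim', 'Bob'],
    'quarter': [1, 1, 1],
    'consumer_budg': [15000, 35000, 45000],
    'enterprise_budg': [20000, 30000, 40000],
})

# Positions 2 and 3; the slice end (4) is exclusive, as with Python lists
budget['total_budget'] = budget.iloc[:, 2:4].sum(axis=1)
print(budget)
```

Compute the total before adding any new columns, otherwise the positions of the slice may no longer point at the budget columns.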

Sum with conditions

In this example, we would like to define a column named high_budget and populate it only for rows where the total budget exceeds the 80K threshold.

budget['high_budget'] = budget.query('consumer_budg + enterprise_budg > 80000')['total_budget'] 
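The query() assignment above relies on index alignment: rows that the query does not return get NaN in high_budget. The sketch below is an alternative to query(), not the recipe's own code. It makes that behaviour explicit with Series.where() and also sums only the values that meet the condition, assuming the same budget data and 80K threshold.

```python
import pandas as pd

budget = pd.DataFrame({
    'person': ['John', 'Kim', 'Bob'],
    'quarter': [1, 1, 1],
    'consumer_budg': [15000, 35000, 45000],
    'enterprise_budg': [20000, 30000, 40000],
})
budget['total_budget'] = budget['consumer_budg'] + budget['enterprise_budg']

# Boolean mask: rows whose total exceeds the 80K threshold
mask = budget['total_budget'] > 80000

# Keep total_budget where the mask is True, NaN elsewhere
budget['high_budget'] = budget['total_budget'].where(mask)

# Sum only the values that meet the condition
high_total = budget.loc[mask, 'total_budget'].sum()
print(high_total)
```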

Adding columns with null values

Here we might need a bit of pre-processing to get rid of the null values using fillna().

Let’s quickly create a sample dataset containing null values (see last row).

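# with NaN
import numpy as np

budget_nan = pd.DataFrame({
    'person': ['John', 'Kim', 'Bob', 'Court'],
    'quarter': [1, 1, 1, 1],
    'consumer_budg': [15000, 35000, 45000, 50000],
    'enterprise_budg': [20000, 30000, 40000, np.nan],
    'high_budget': [35000, 65000, 85000, np.nan],
})
budget_nan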
person quarter consumer_budg enterprise_budg high_budget
0 John 1 15000 20000.0 35000.0
1 Kim 1 35000 30000.0 65000.0
2 Bob 1 45000 40000.0 85000.0
3 Court 1 50000 NaN NaN

Now let's use the DataFrame fillna() method to replace all the null values with zeros so that we can sum the column values.

# Replace every NaN with 0, then recompute the row totals
budget_nan.fillna(0, inplace=True)
budget_nan['high_budget'] = budget_nan['consumer_budg'] + budget_nan['enterprise_budg']
budget_nan
person quarter consumer_budg enterprise_budg high_budget
0 John 1 15000 20000.0 35000.0
1 Kim 1 35000 30000.0 65000.0
2 Bob 1 45000 40000.0 85000.0
3 Court 1 50000 0.0 50000.0
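fillna(0, inplace=True) overwrites the NaN in the stored data. If you would rather keep the original values, either of the two approaches below produces the same totals without modifying budget_nan. This is a sketch on the same four-row data, without the high_budget column. Note that Series.add(..., fill_value=0) still returns NaN where both operands are missing.

```python
import numpy as np
import pandas as pd

budget_nan = pd.DataFrame({
    'person': ['John', 'Kim', 'Bob', 'Court'],
    'quarter': [1, 1, 1, 1],
    'consumer_budg': [15000, 35000, 45000, 50000],
    'enterprise_budg': [20000, 30000, 40000, np.nan],
})

cols = ['consumer_budg', 'enterprise_budg']

# Option 1: sum(axis=1) skips NaN by default (skipna=True)
total_a = budget_nan[cols].sum(axis=1)

# Option 2: Series.add treats a value missing on one side as 0
total_b = budget_nan['consumer_budg'].add(budget_nan['enterprise_budg'], fill_value=0)

# Plain + propagates NaN, which is why the fillna() step is needed above
total_c = budget_nan['consumer_budg'] + budget_nan['enterprise_budg']
print(total_a, total_b, total_c, sep='\n')
```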


pandas.DataFrame.sum

Return the sum of the values over the requested axis.

This is equivalent to the method numpy.sum.

Parameters

axis {index (0), columns (1)}

Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.

For DataFrames, specifying axis=None will apply the aggregation across both axes.

skipna bool, default True

Exclude NA/null values when computing the result.

numeric_only bool, default False

Include only float, int, boolean columns. Not implemented for Series.

min_count int, default 0

The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.

**kwargs

Additional keyword arguments to be passed to the function.

Returns

Series or scalar

See also

Series.idxmin: Return the index of the minimum.

Series.idxmax: Return the index of the maximum.

DataFrame.sum: Return the sum over the requested axis.

DataFrame.min: Return the minimum over the requested axis.

DataFrame.max: Return the maximum over the requested axis.

DataFrame.idxmin: Return the index of the minimum over the requested axis.

DataFrame.idxmax: Return the index of the maximum over the requested axis.

>>> idx = pd.MultiIndex.from_arrays([
...     ['warm', 'warm', 'cold', 'cold'],
...     ['dog', 'falcon', 'fish', 'spider']],
...     names=['blooded', 'animal'])
>>> s = pd.Series([4, 2, 0, 8], name='legs', index=idx)
>>> s
blooded  animal
warm     dog       4
         falcon    2
cold     fish      0
         spider    8
Name: legs, dtype: int64
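The reference example stops after building s. A minimal continuation, assuming current pandas: sum() adds all four values, and grouping on an index level gives one total per level value. Older pandas releases accepted a level= argument in sum() itself. It was removed in pandas 2.0, and groupby(level=...) is the replacement.

```python
import pandas as pd

idx = pd.MultiIndex.from_arrays([
    ['warm', 'warm', 'cold', 'cold'],
    ['dog', 'falcon', 'fish', 'spider']],
    names=['blooded', 'animal'])
s = pd.Series([4, 2, 0, 8], name='legs', index=idx)

# Sum over every value: 4 + 2 + 0 + 8
total = s.sum()

# One sum per value of the 'blooded' index level
per_group = s.groupby(level='blooded').sum()
print(total)
print(per_group)
```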

By default, the sum of an empty or all-NA Series is 0.

>>> pd.Series([], dtype="float64").sum()  # min_count=0 is the default
0.0

This can be controlled with the min_count parameter. For example, if you’d like the sum of an empty series to be NaN, pass min_count=1.

>>> pd.Series([], dtype="float64").sum(min_count=1)
nan

Thanks to the skipna parameter, min_count handles all-NA and empty series identically.

>>> pd.Series([np.nan]).sum(min_count=1)
nan
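The same skipna and min_count rules apply row by row on a DataFrame. The small frame below is hypothetical (columns a and b are made up) and chosen so that each row shows one case: no NaN, one NaN, and all NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 0 has no NaN, row 1 one NaN, row 2 only NaN
df = pd.DataFrame({
    'a': [1.0, np.nan, np.nan],
    'b': [2.0, 5.0, np.nan],
})

# Default: NaN is skipped and an all-NaN row sums to 0.0
default = df.sum(axis=1)

# min_count=1: a row needs at least one valid value, otherwise NaN
at_least_one = df.sum(axis=1, min_count=1)

# skipna=False: any NaN in the row makes that row's result NaN
strict = df.sum(axis=1, skipna=False)
print(default, at_least_one, strict, sep='\n')
```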


How to Calculate the Sum of Columns in Pandas

Often you may be interested in calculating the sum of one or more columns in a pandas DataFrame. Fortunately, you can easily do this in pandas using the sum() function.

This tutorial shows several examples of how to use this function.

Example 1: Find the sum of a single column

Suppose we have the following pandas DataFrame:

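import pandas as pd
import numpy as np

#create DataFrame
df = pd.DataFrame({'rating': [90, 85, 82, 88, 94, 90, 76, 75, 87, 86],
                   'points': [25, 20, 14, 16, 27, 20, 12, 15, 14, 19],
                   'assists': [5, 7, 7, 8, 5, 7, 6, 9, 9, 5],
                   'rebounds': [np.nan, 8, 10, 6, 6, 9, 6, 10, 10, 7]})

#view DataFrame
df

   rating  points  assists  rebounds
0      90      25        5       NaN
1      85      20        7       8.0
2      82      14        7      10.0
3      88      16        8       6.0
4      94      27        5       6.0
5      90      20        7       9.0
6      76      12        6       6.0
7      75      15        9      10.0
8      87      14        9      10.0
9      86      19        5       7.0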

We can find the sum of the column named “points” with df['points'].sum(), which returns 182.

The sum() function also excludes NA values by default. For example, if we find the sum of the “rebounds” column with df['rebounds'].sum(), the first value, NaN, is simply excluded from the calculation and the result is 72.0.
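For reference, here is a self-contained version of Example 1. It rebuilds the DataFrame from the table above, which is an assumption about how the original data was created, and then computes the two single-column sums discussed in the text.

```python
import numpy as np
import pandas as pd

# DataFrame rebuilt from the table shown in Example 1
df = pd.DataFrame({
    'rating': [90, 85, 82, 88, 94, 90, 76, 75, 87, 86],
    'points': [25, 20, 14, 16, 27, 20, 12, 15, 14, 19],
    'assists': [5, 7, 7, 8, 5, 7, 6, 9, 9, 5],
    'rebounds': [np.nan, 8, 10, 6, 6, 9, 6, 10, 10, 7],
})

# Sum of one integer column
points_total = df['points'].sum()

# The NaN in row 0 is skipped because skipna defaults to True
rebounds_total = df['rebounds'].sum()
print(points_total, rebounds_total)
```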

Example 2: Find the sum of multiple columns

We can find the sum of multiple columns by using the following syntax:

#find sum of points and rebounds columns
df[['rebounds', 'points']].sum()

rebounds     72.0
points      182.0
dtype: float64

Example 3: Find the sum of all columns

We can also find the sum of all columns by using the following syntax:

#find sum of all columns in DataFrame
df.sum()

rating      853.0
points      182.0
assists      68.0
rebounds     72.0
dtype: float64

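For columns that are not numeric, older pandas versions simply skipped them when calculating the sum. Since pandas 2.0, numeric_only defaults to False, so non-numeric columns are no longer silently dropped; pass numeric_only=True to sum only the numeric columns.

The complete documentation for the sum() function is available in the pandas API reference.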
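To see numeric_only in action, the sketch below adds a hypothetical text column, team, to a small DataFrame and asks sum() for numeric columns only. Without numeric_only=True, pandas 2.x would typically include team in the result (string values get concatenated), which is rarely what you want when summing.

```python
import pandas as pd

df = pd.DataFrame({
    'team': ['A', 'B', 'C'],   # hypothetical non-numeric column
    'points': [25, 20, 14],
    'assists': [5, 7, 7],
})

# numeric_only=True restricts the sum to int, float and bool columns
numeric_totals = df.sum(numeric_only=True)
print(numeric_totals)
```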


Pandas: Get sum of column values in a Dataframe

This dataframe contains information about students like their name, age, city and score.
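The article does not show how this DataFrame is built. To follow along, here is a hypothetical version with the same four columns (Name, Age, City, Score). Every value is made up, and Score is placed in the 4th column so that the position-based examples below line up.

```python
import pandas as pd

# Hypothetical student data; all names and numbers are invented
df = pd.DataFrame({
    'Name': ['Anna', 'Ben', 'Chen', 'Dev', 'Eva'],
    'Age': [27, 29, 33, 34, 30],
    'City': ['Delhi', 'Tokyo', 'Delhi', 'London', 'Mumbai'],
    'Score': [89, 75, 91, 82, 78],
})
print(df)
```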


Now let’s see how to get the sum of values in the column ‘Score’ of this dataframe.

Get the sum of column values in a dataframe

Select the column by name and get the sum of all values in that column

Select a column from a dataframe by the column name and then get the sum of values in that column using the sum() function:

# Get total of all values in column 'Score' of the DataFrame
total = df['Score'].sum()
print(total)

Here we selected the column ‘Score’ from the dataframe using the [] operator and got all the values as a pandas Series object. Then we called the sum() function on that Series object to get the sum of its values, i.e. the sum of values in the column ‘Score’ of the dataframe.

We can also select the column using loc[] and then get the sum of values in that column. For example:

# Select column 'Score' using loc[] and calculate the sum of all
# values in that column
total = df.loc[:, 'Score'].sum()
print(total)

Here we selected the column ‘Score’ as Series object using loc[] and then we called the sum() function on the Series object to get the sum of all values in the column ‘Score’ of the dataframe.

Select the column by position and get the sum of all values in that column

Suppose we don’t have the column name, but we know the position of the column in the dataframe and want the sum of values in that column. For that, we select the column by its position using iloc[]. Because we pass a range of positions (column_number-1:column_number), iloc[] returns a one-column DataFrame, and calling sum() on it returns a Series:

# Get sum of all values in 4th column
column_number = 4
total = df.iloc[:, column_number-1:column_number].sum()
print(total)

It returned a Series with single value.

Here we selected the 4th column of the dataframe as a one-column DataFrame using iloc[] and then called the sum() function on it. So it returned a Series holding the sum of values in the 4th column, i.e. column ‘Score’.
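Whether sum() returns a scalar or a Series depends on how the column is selected with iloc[]. This sketch uses hypothetical student data (all values invented) to contrast an integer position with a one-position slice.

```python
import pandas as pd

# Hypothetical student data; Score is the 4th column (position 3)
df = pd.DataFrame({
    'Name': ['Anna', 'Ben', 'Chen', 'Dev', 'Eva'],
    'Age': [27, 29, 33, 34, 30],
    'City': ['Delhi', 'Tokyo', 'Delhi', 'London', 'Mumbai'],
    'Score': [89, 75, 91, 82, 78],
})

column_number = 4
# Integer position -> Series -> sum() returns a single number
total_scalar = df.iloc[:, column_number - 1].sum()

# Slice of positions -> one-column DataFrame -> sum() returns a Series
total_series = df.iloc[:, column_number - 1:column_number].sum()
print(total_scalar)
print(total_series)
```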

Get the sum of column values for selected rows only in a Dataframe

Select a column from Dataframe and get the sum of specific entries in that column. For example,

# Select 4th column of dataframe and get sum of first 3 values in that column
total = df.iloc[0:3, 3:4].sum()
print(total)

It returned a Series with single value.

Here we selected the first 3 rows of the 4th column (‘Score’) of the dataframe and then calculated their sum.

Get the sum of column values in a dataframe based on condition

Suppose in the above dataframe we want to get the sum of the score of students from Delhi only. For that we need to select only those values from the column ‘Score’ where ‘City’ is Delhi. Let’s see how to do that,

# Get sum of values in a column 'Score'
# for those rows only where 'City' is 'Delhi'
total = df.loc[df['City'] == 'Delhi', 'Score'].sum()
print(total)

Using loc[] we selected the column ‘Score’ but for only those rows where column ‘City’ has value ‘Delhi’. Then we called the sum() function on the series object to get the sum of scores of students from ‘Delhi’. So, basically we selected rows from a dataframe that satisfy our condition and then selected the values of column ‘Score’ for those rows only. We did that in a single expression using loc[].
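Two natural extensions, sketched on hypothetical student data (all values invented): combining conditions with & (each comparison wrapped in parentheses), and using groupby() to get the sum for every city at once instead of filtering one city at a time.

```python
import pandas as pd

# Hypothetical student data with the article's four columns
df = pd.DataFrame({
    'Name': ['Anna', 'Ben', 'Chen', 'Dev', 'Eva'],
    'Age': [27, 29, 33, 34, 30],
    'City': ['Delhi', 'Tokyo', 'Delhi', 'London', 'Mumbai'],
    'Score': [89, 75, 91, 82, 78],
})

# Two conditions combined with &; each comparison needs its own parentheses
delhi_over_30 = df.loc[(df['City'] == 'Delhi') & (df['Age'] > 30), 'Score'].sum()

# One Score total per city
per_city = df.groupby('City')['Score'].sum()
print(delhi_over_30)
print(per_city)
```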

These were the different ways to get the sum of all or specific values in a dataframe column in Pandas.


Copyright © 2023 thisPointer
