What is the Cumulative Sum of a pandas DataFrame?


Many DataFrame operations compute a result from the value of the current row (observation) alone. Some analyses, however, need a running total: the sum of every row up to and including the current one.

Below is a dataset of daily confirmed cases of COVID-19:

import pandas as pd
df = pd.DataFrame([
    {'Date': '2022-01-01', 'Confirmed': 7 },
    {'Date': '2022-01-02', 'Confirmed': 4 },
    {'Date': '2022-01-03', 'Confirmed': 12 },
    {'Date': '2022-01-04', 'Confirmed': 9 },
    {'Date': '2022-01-05', 'Confirmed': 10 },
])
Our example DataFrame

Finding the Cumulative Sum

pandas calculates the cumulative sum with the cumsum method, called on a column of a DataFrame. For example, to find the total confirmed cases so far this year as of each date:

df["Total Confirmed"] = df["Confirmed"].cumsum()
         Date  Confirmed  Total Confirmed
0  2022-01-01          7                7
1  2022-01-02          4               11
2  2022-01-03         12               23
3  2022-01-04          9               32
4  2022-01-05         10               42
Using the cumsum function to create a cumulative sum.

Notice that the new column, Total Confirmed, contains the sum of all the confirmed cases up to, and including, the current row!
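To make "up to, and including, the current row" concrete, here is a minimal sketch that rebuilds the same totals with a plain Python loop and compares them with the result of cumsum. The names running_total and manual_totals are illustrative only and are not part of pandas.

```python
import pandas as pd

# Same example data as above
df = pd.DataFrame([
    {'Date': '2022-01-01', 'Confirmed': 7},
    {'Date': '2022-01-02', 'Confirmed': 4},
    {'Date': '2022-01-03', 'Confirmed': 12},
    {'Date': '2022-01-04', 'Confirmed': 9},
    {'Date': '2022-01-05', 'Confirmed': 10},
])
df["Total Confirmed"] = df["Confirmed"].cumsum()

# Hypothetical manual version: keep a running total while walking the rows
running_total = 0
manual_totals = []
for confirmed in df["Confirmed"].tolist():
    running_total += confirmed           # add the current row's value
    manual_totals.append(running_total)  # record the total so far

print(manual_totals)                                    # [7, 11, 23, 32, 42]
print(df["Total Confirmed"].tolist() == manual_totals)  # True
```

The loop is only there to show what cumsum computes; in practice the single cumsum call is shorter and avoids iterating over the rows in Python.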

Pandas Documentation

See the full pandas documentation for the cumsum function.