What is the Cumulative Sum of a pandas DataFrame?


Many DataFrame operations compute their result from the current row (observation) alone. In many analyses, however, what matters is the running total of all rows up to that point.

Below is a dataset of daily confirmed cases of COVID-19:

import pandas as pd
df = pd.DataFrame([
    {'Date': '2022-01-01', 'Confirmed': 7 },
    {'Date': '2022-01-02', 'Confirmed': 4 },
    {'Date': '2022-01-03', 'Confirmed': 12 },
    {'Date': '2022-01-04', 'Confirmed': 9 },
    {'Date': '2022-01-05', 'Confirmed': 10 },
])
Our example DataFrame

Finding the Cumulative Sum

pandas calculates the cumulative sum with the cumsum method, called on a column of a DataFrame. For example, suppose we want the Total Confirmed cases so far this year, as of each date.
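A minimal sketch of this step, assuming the example DataFrame defined earlier and a new column named Total Confirmed to match the output:

```python
import pandas as pd

# Re-create the example DataFrame so the snippet runs on its own.
df = pd.DataFrame([
    {'Date': '2022-01-01', 'Confirmed': 7},
    {'Date': '2022-01-02', 'Confirmed': 4},
    {'Date': '2022-01-03', 'Confirmed': 12},
    {'Date': '2022-01-04', 'Confirmed': 9},
    {'Date': '2022-01-05', 'Confirmed': 10},
])

# cumsum() returns a running total of the column, one value per row.
df['Total Confirmed'] = df['Confirmed'].cumsum()

print(df)
```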

Python Output:
         Date  Confirmed  Total Confirmed
0  2022-01-01          7                7
1  2022-01-02          4               11
2  2022-01-03         12               23
3  2022-01-04          9               32
4  2022-01-05         10               42

Notice that the new column, Total Confirmed, contains the sum of all the confirmed cases up to, and including, the current row!
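To check the "up to, and including" behavior, the following sketch compares cumulative values against explicit sums. It re-creates the Confirmed column as a standalone Series; the variable names are illustrative and not part of the original example.

```python
import pandas as pd

# The Confirmed column from the example, as a standalone Series.
confirmed = pd.Series([7, 4, 12, 9, 10], name='Confirmed')
running_total = confirmed.cumsum()

# The value at position 2 equals the sum of positions 0, 1 and 2.
print(running_total.iloc[2] == confirmed.iloc[:3].sum())

# The last cumulative value equals the total of the whole column.
print(running_total.iloc[-1] == confirmed.sum())
```

Both comparisons print True.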

Pandas Documentation

See the pandas documentation for the full reference on the cumsum function.