What is the Cumulative Sum of a pandas DataFrame?
In many DataFrame operations, the result depends only on the value in the current row (observation). Many analyses, however, need the running total of all rows up to and including the current one.
Below is a dataset of daily confirmed cases of COVID-19:
import pandas as pd

df = pd.DataFrame([
    {'Date': '2022-01-01', 'Confirmed': 7},
    {'Date': '2022-01-02', 'Confirmed': 4},
    {'Date': '2022-01-03', 'Confirmed': 12},
    {'Date': '2022-01-04', 'Confirmed': 9},
    {'Date': '2022-01-05', 'Confirmed': 10},
])
Finding the Cumulative Sum
pandas calculates the cumulative sum with the cumsum function, called on a column of a DataFrame. For example, to find the Total Confirmed cases so far this year:
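A minimal sketch of that calculation, assuming the same five-row dataset as above and that the result is stored in a new column named Total Confirmed (the name that appears in the output table):

```python
import pandas as pd

# Same daily data as in the dataset above.
df = pd.DataFrame([
    {'Date': '2022-01-01', 'Confirmed': 7},
    {'Date': '2022-01-02', 'Confirmed': 4},
    {'Date': '2022-01-03', 'Confirmed': 12},
    {'Date': '2022-01-04', 'Confirmed': 9},
    {'Date': '2022-01-05', 'Confirmed': 10},
])

# cumsum() returns a Series of running totals, one per row,
# which is assigned to a new column.
df['Total Confirmed'] = df['Confirmed'].cumsum()
print(df)
```

The original Confirmed column is left unchanged; cumsum returns a new Series rather than modifying the data in place.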
Output:
| | Date | Confirmed | Total Confirmed |
|---|---|---|---|
| 0 | 2022-01-01 | 7 | 7 |
| 1 | 2022-01-02 | 4 | 11 |
| 2 | 2022-01-03 | 12 | 23 |
| 3 | 2022-01-04 | 9 | 32 |
| 4 | 2022-01-05 | 10 | 42 |
Notice that the new column, Total Confirmed, contains the sum of all the confirmed cases up to, and including, the current row!
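To see that "up to, and including, the current row" is meant literally, the running totals can be recomputed by hand and compared with cumsum. This is only an illustrative check using the same five values; the names confirmed, running_total, and manual are chosen for this sketch:

```python
import pandas as pd

# The daily Confirmed values from the dataset above.
confirmed = pd.Series([7, 4, 12, 9, 10], name='Confirmed')
running_total = confirmed.cumsum()

# For each row position i, sum rows 0 through i (inclusive) by hand.
manual = [int(confirmed.iloc[: i + 1].sum()) for i in range(len(confirmed))]

print(manual)                            # [7, 11, 23, 32, 42]
print(manual == running_total.tolist())  # True
```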
Pandas Documentation
See the full pandas documentation for the cumsum function.