What is the Cumulative Sum of a pandas DataFrame?
Many DataFrame operations compute a result from the value of the current row (observation) alone. Yet many analyses depend on a running total: the sum of all rows up to and including the current one.
Below is a dataset of daily confirmed cases of COVID-19:
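As a minimal sketch, the dataset could be built as a DataFrame like this, using the Date and Confirmed values that appear in the results table in the next section. Parsing the dates with pd.to_datetime is an assumption; the article does not say how the dates are stored.

```python
import pandas as pd

# Daily confirmed cases, copied from the Date and Confirmed columns
# of the article's results table.
df = pd.DataFrame(
    {
        # Assumption: dates are parsed into datetime64 values.
        "Date": pd.to_datetime(
            ["2022-01-01", "2022-01-02", "2022-01-03", "2022-01-04", "2022-01-05"]
        ),
        "Confirmed": [7, 4, 12, 9, 10],
    }
)

print(df)
```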
Finding the Cumulative Sum
pandas calculates the cumulative sum with the cumsum function, called on a column of a DataFrame. For example, to find the Total Confirmed cases so far this year:
df["Total Confirmed"] = df["Confirmed"].cumsum()
 | Date | Confirmed | Total Confirmed |
---|---|---|---|
0 | 2022-01-01 | 7 | 7 |
1 | 2022-01-02 | 4 | 11 |
2 | 2022-01-03 | 12 | 23 |
3 | 2022-01-04 | 9 | 32 |
4 | 2022-01-05 | 10 | 42 |
Using the cumsum function to create a cumulative sum.
Notice that the new column, Total Confirmed, contains the sum of all the confirmed cases up to, and including, the current row!
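To make the "up to, and including, the current row" behavior concrete, here is a small self-contained sketch that rebuilds the table above and compares cumsum against a hand-written running total. The loop and the names running and manual_totals are illustrative only and are not part of pandas.

```python
import pandas as pd

# Same values as the table above.
df = pd.DataFrame(
    {
        "Date": ["2022-01-01", "2022-01-02", "2022-01-03", "2022-01-04", "2022-01-05"],
        "Confirmed": [7, 4, 12, 9, 10],
    }
)

# Cumulative sum computed by pandas.
df["Total Confirmed"] = df["Confirmed"].cumsum()

# The same running total computed by hand, for comparison.
running = 0
manual_totals = []
for value in df["Confirmed"]:
    running += value  # add the current row to everything before it
    manual_totals.append(running)

print(df["Total Confirmed"].tolist())  # [7, 11, 23, 32, 42]
print(manual_totals)                   # [7, 11, 23, 32, 42]
```

The last value of the cumulative column equals the plain sum of the Confirmed column, which is a quick sanity check when working with real data.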
Pandas Documentation
See the full pandas documentation for the cumsum function.