Removing Columns in a DataFrame
Let's say you have a DataFrame with a bunch of columns, but some of them are unnecessary for your analysis. To make the data less cluttered, you can remove a column from your DataFrame using pandas.
The Movie Dataset
To demonstrate this, we'll use a DataFrame of five different movies, including information about their release date, how much money they made in US dollars, and a personal rating out of 10.
import pandas as pd
# Create a DataFrame with "movie", "release date", "domestic box office", "worldwide box office", "personal rating", and "international box office" columns
df = pd.DataFrame([
{"movie": "The Truman Show", "release date": "1996-06-05", "domestic box office": 125618201, "worldwide box office": 264118201, "personal rating": 10, "international box office": 138500000},
{"movie": "Rogue One: A Star Wars Story", "release date": "2016-12-16", "domestic box office": 532177324, "worldwide box office": 1055135598, "personal rating": 9, "international box office": 522958274},
{"movie": "Iron Man", "release date": "2008-05-02", "domestic box office": 318604126, "worldwide box office": 585171547, "personal rating": 7, "international box office": 266567421},
{"movie": "Blade Runner", "release date": "1982-06-25", "domestic box office": 32656328, "worldwide box office": 39535837, "personal rating": 8, "international box office": 6879509},
{"movie": "Breakfast at Tiffany's", "release date": "1961-10-05", "domestic box office": 9551904, "worldwide box office": 9794721, "personal rating": 7, "international box office": 242817}
])
df
| | movie | release date | domestic box office | worldwide box office | personal rating | international box office |
|---|---|---|---|---|---|---|
| 0 | The Truman Show | 1996-06-05 | 125618201 | 264118201 | 10 | 138500000 |
| 1 | Rogue One: A Star Wars Story | 2016-12-16 | 532177324 | 1055135598 | 9 | 522958274 |
| 2 | Iron Man | 2008-05-02 | 318604126 | 585171547 | 7 | 266567421 |
| 3 | Blade Runner | 1982-06-25 | 32656328 | 39535837 | 8 | 6879509 |
| 4 | Breakfast at Tiffany's | 1961-10-05 | 9551904 | 9794721 | 7 | 242817 |
Removing One Column from a DataFrame
We can remove columns with a single method: df.drop(). Inside the parentheses, pass the name of the column you want to remove to the columns parameter, and remember to put quotes around the column name. Because df.drop() returns a new DataFrame and leaves the original unchanged, store the result in a new variable to keep your changes.
df_dropped = df.drop(columns="personal rating")
df_dropped
| | movie | release date | domestic box office | worldwide box office | international box office |
|---|---|---|---|---|---|
| 0 | The Truman Show | 1996-06-05 | 125618201 | 264118201 | 138500000 |
| 1 | Rogue One: A Star Wars Story | 2016-12-16 | 532177324 | 1055135598 | 522958274 |
| 2 | Iron Man | 2008-05-02 | 318604126 | 585171547 | 266567421 |
| 3 | Blade Runner | 1982-06-25 | 32656328 | 39535837 | 6879509 |
| 4 | Breakfast at Tiffany's | 1961-10-05 | 9551904 | 9794721 | 242817 |
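To see why the result has to be stored in a variable, here is a minimal sketch. It uses a two-movie stand-in for the tutorial's DataFrame; the variable names (`movies`, `movies_dropped`, and so on) are chosen for illustration and are not part of the original example.

```python
import pandas as pd

# A two-movie stand-in for the tutorial's DataFrame (values copied from the table above)
movies = pd.DataFrame({
    "movie": ["Iron Man", "Blade Runner"],
    "personal rating": [7, 8],
})

# drop() returns a new DataFrame; the result is deliberately discarded here
movies.drop(columns="personal rating")
columns_after_discarded_drop = list(movies.columns)

# Assigning the result to a variable keeps the change
movies_dropped = movies.drop(columns="personal rating")
columns_after_assigned_drop = list(movies_dropped.columns)

print(columns_after_discarded_drop)  # ['movie', 'personal rating']
print(columns_after_assigned_drop)   # ['movie']
```

The first call has no lasting effect because its return value is thrown away, while the original DataFrame still has both columns.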
Removing Multiple Columns from a DataFrame
To drop multiple columns at once, pass the column names as a list: enclose them in square brackets and separate them with commas.
df_dropped_multi = df.drop(columns=["personal rating", "release date"])
df_dropped_multi
| | movie | domestic box office | worldwide box office | international box office |
|---|---|---|---|---|
| 0 | The Truman Show | 125618201 | 264118201 | 138500000 |
| 1 | Rogue One: A Star Wars Story | 532177324 | 1055135598 | 522958274 |
| 2 | Iron Man | 318604126 | 585171547 | 266567421 |
| 3 | Blade Runner | 32656328 | 39535837 | 6879509 |
| 4 | Breakfast at Tiffany's | 9551904 | 9794721 | 242817 |
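One thing to watch for when dropping a list of columns is a name that is misspelled or not present. The sketch below shows how df.drop() behaves in that case, again using a small stand-in DataFrame. The "budget" column and the variable names are hypothetical and are not part of the movie dataset above.

```python
import pandas as pd

# A two-movie stand-in for the tutorial's DataFrame
movies = pd.DataFrame({
    "movie": ["Iron Man", "Blade Runner"],
    "release date": ["2008-05-02", "1982-06-25"],
    "personal rating": [7, 8],
})

# "budget" is a hypothetical column name that does not exist in this DataFrame
to_remove = ["personal rating", "release date", "budget"]

# By default, asking to drop a missing column raises a KeyError
try:
    movies.drop(columns=to_remove)
    raised = False
except KeyError:
    raised = True

# errors="ignore" drops the columns that exist and skips the ones that do not
movies_trimmed = movies.drop(columns=to_remove, errors="ignore")
remaining = list(movies_trimmed.columns)

print(raised)     # True
print(remaining)  # ['movie']
```

The default behavior is usually the safer choice, since a KeyError points out a typo in a column name right away; errors="ignore" is mainly useful when the same list of columns is applied to DataFrames that may not all contain every column.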