Removing Columns in a DataFrame


Let's say you have a DataFrame with a bunch of columns, but some of them are unnecessary for your analysis. To make the data less cluttered, you can remove one or more columns from your DataFrame using pandas.

The Movie Dataset

To demonstrate this, we'll use a DataFrame of five different movies, including information about their release date, how much money they made in US dollars, and a personal rating out of 10.

import pandas as pd

# Creates a DataFrame with "movie", "release date", "domestic box office", "worldwide box office", "personal rating", and "international box office" columns
df = pd.DataFrame([
  {"movie": "The Truman Show", "release date": "1996-06-05", "domestic box office": 125618201, "worldwide box office": 264118201, "personal rating": 10, "international box office": 138500000}, 
  {"movie": "Rogue One: A Star Wars Story", "release date": "2016-12-16", "domestic box office": 532177324, "worldwide box office": 1055135598, "personal rating": 9, "international box office": 522958274}, 
  {"movie": "Iron Man", "release date": "2008-05-02", "domestic box office": 318604126, "worldwide box office": 585171547, "personal rating": 7, "international box office": 266567421}, 
  {"movie": "Blade Runner", "release date": "1982-06-25", "domestic box office": 32656328, "worldwide box office": 39535837, "personal rating": 8, "international box office": 6879509}, 
  {"movie": "Breakfast at Tiffany's", "release date": "1961-10-05", "domestic box office": 9551904, "worldwide box office": 9794721, "personal rating": 7, "international box office": 242817}
  ])
df
                          movie  release date  domestic box office  worldwide box office  personal rating  international box office
0               The Truman Show    1996-06-05            125618201             264118201               10                 138500000
1  Rogue One: A Star Wars Story    2016-12-16            532177324            1055135598                9                 522958274
2                      Iron Man    2008-05-02            318604126             585171547                7                 266567421
3                  Blade Runner    1982-06-25             32656328              39535837                8                   6879509
4        Breakfast at Tiffany's    1961-10-05              9551904               9794721                7                    242817
Creating the movie DataFrame

Removing One Column from a DataFrame

We can remove columns with one simple method: df.drop(). Inside the parentheses, pass the name of the column you want to remove, in quotes, to the columns parameter. df.drop() returns a new DataFrame and leaves the original unchanged, so store the result in a variable to keep your changes.

df_dropped = df.drop(columns="personal rating")
df_dropped
                          movie  release date  domestic box office  worldwide box office  international box office
0               The Truman Show    1996-06-05            125618201             264118201                 138500000
1  Rogue One: A Star Wars Story    2016-12-16            532177324            1055135598                 522958274
2                      Iron Man    2008-05-02            318604126             585171547                 266567421
3                  Blade Runner    1982-06-25             32656328              39535837                   6879509
4        Breakfast at Tiffany's    1961-10-05              9551904               9794721                    242817
Removing one column from a DataFrame
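
As a minimal sketch of why the result of drop() is stored in a new variable, the snippet below checks that the original DataFrame keeps all of its columns after the call. The `movies` DataFrame is a small hypothetical stand-in for the movie data, not the full dataset.

```python
import pandas as pd

# Hypothetical two-row stand-in for the movie DataFrame
movies = pd.DataFrame({
    "movie": ["Iron Man", "Blade Runner"],
    "personal rating": [7, 8],
})

columns_before = list(movies.columns)

# drop() returns a new DataFrame; `movies` itself is not modified
without_rating = movies.drop(columns="personal rating")

columns_after = list(movies.columns)

print(columns_before == columns_after)   # the original is unchanged
print(list(without_rating.columns))      # the result lacks "personal rating"

# To keep the change under the same name, reassign the result
movies = movies.drop(columns="personal rating")
```

Reassigning to the same name is a common choice when the original version of the data is no longer needed.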

Removing Multiple Columns from a DataFrame

When dropping multiple columns, pass the column names to the columns parameter as a list: enclose them in square brackets and separate them with commas.

df_dropped_multi = df.drop(columns=["personal rating", "release date"])
df_dropped_multi
                          movie  domestic box office  worldwide box office  international box office
0               The Truman Show            125618201             264118201                 138500000
1  Rogue One: A Star Wars Story            532177324            1055135598                 522958274
2                      Iron Man            318604126             585171547                 266567421
3                  Blade Runner             32656328              39535837                   6879509
4        Breakfast at Tiffany's              9551904               9794721                    242817
Removing multiple columns from a DataFrame
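
When a list of columns includes a name that is not in the DataFrame, drop() raises a KeyError by default. The sketch below shows one way to handle that with the `errors="ignore"` option of drop(). The `movies` DataFrame and the "budget" column name are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical two-row stand-in for the movie DataFrame
movies = pd.DataFrame({
    "movie": ["Iron Man", "Blade Runner"],
    "release date": ["2008-05-02", "1982-06-25"],
    "personal rating": [7, 8],
})

# "budget" is a made-up column name that does not exist in `movies`
to_remove = ["personal rating", "budget"]

# By default, a missing column name makes drop() raise a KeyError
raised = False
try:
    movies.drop(columns=to_remove)
except KeyError as err:
    raised = True
    print("drop failed:", err)

# errors="ignore" drops the names that exist and skips the rest
trimmed = movies.drop(columns=to_remove, errors="ignore")
print(list(trimmed.columns))
```

Note that `errors="ignore"` also hides typos in column names, so the default behavior is usually the safer choice when the names are typed by hand.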