Removing Rows from a DataFrame


The drop function is used to remove rows or columns from a pandas DataFrame.

To explore how to remove rows using this function, we'll be looking at a DataFrame of foods:

import pandas as pd

# Creates a DataFrame with 'food', 'weight', 'length', and 'price' columns
# Note: all columns use US measurements: ounces, inches, USD
df = pd.DataFrame([
  {'food': 'cheesecake', 'weight': 3.43, 'length': 4, 'price': 6.38},
  {'food': 'mochi', 'weight': 2, 'length': 1.2, 'price': 0.75},
  {'food': 'donut', 'weight': 1.3, 'length': 4.8, 'price': 1.53},
  {'food': 'churro', 'weight': 2.6, 'length': 10, 'price': 3.00}, 
  {'food': 'cupcake', 'weight': 1.2, 'length': 2.5, 'price': 2.63}, 
  {'food': 'flan', 'weight': 7.1, 'length': 7, 'price': 8.10},
  {'food': 'egg tart', 'weight': 3, 'length': 2.5, 'price': 3.00},
])
df
         food  weight  length  price
0  cheesecake    3.43     4.0   6.38
1       mochi    2.00     1.2   0.75
2       donut    1.30     4.8   1.53
3      churro    2.60    10.0   3.00
4     cupcake    1.20     2.5   2.63
5        flan    7.10     7.0   8.10
6    egg tart    3.00     2.5   3.00
Creating a DataFrame of Foods

Removing a Single Row from a DataFrame

We can drop a row by passing its row label to drop. It's important to note that drop treats integers as labels, not as positions.

# drops the 'donut' row
df.drop(2)
         food  weight  length  price
0  cheesecake    3.43     4.0   6.38
1       mochi    2.00     1.2   0.75
3      churro    2.60    10.0   3.00
4     cupcake    1.20     2.5   2.63
5        flan    7.10     7.0   8.10
6    egg tart    3.00     2.5   3.00
Drop a Row By Label
# drops the 'cupcake' row
df.drop(4)
         food  weight  length  price
0  cheesecake    3.43     4.0   6.38
1       mochi    2.00     1.2   0.75
2       donut    1.30     4.8   1.53
3      churro    2.60    10.0   3.00
5        flan    7.10     7.0   8.10
6    egg tart    3.00     2.5   3.00
Drop a Row By Label

Notice that in the output above, the row with label 2 that we dropped previously appears again. This is because drop, by default, returns a new DataFrame without the dropped row. It doesn't remove the row from the original DataFrame.

To drop rows from the original DataFrame itself, we can set inplace=True (note the capital T, since Python is case-sensitive):

# drops the 'cupcake' row
df.drop(4, inplace = True)

# drops the 'donut' row
df.drop(2, inplace = True)

# prints df
df
         food  weight  length  price
0  cheesecake    3.43     4.0   6.38
1       mochi    2.00     1.2   0.75
3      churro    2.60    10.0   3.00
5        flan    7.10     7.0   8.10
6    egg tart    3.00     2.5   3.00
Permanently Dropping Rows From a DataFrame
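
As an alternative to inplace=True, the result of drop can be assigned back to the same variable. The sketch below uses a small hypothetical stand-in for the foods DataFrame (the name foods and its three rows are assumptions for illustration). It also shows that the remaining labels are not renumbered after a drop, and that reset_index can renumber them if that is wanted:

```python
import pandas as pd

# Hypothetical three-row frame, a smaller stand-in for the foods DataFrame
foods = pd.DataFrame({
    'food': ['mochi', 'donut', 'churro'],
    'price': [0.75, 1.53, 3.00],
})

# drop returns a new DataFrame; reassigning keeps the result
foods = foods.drop(1)

# The remaining labels keep their original values
remaining_labels = foods.index.tolist()
print(remaining_labels)  # [0, 2]

# Optional: renumber the labels starting from 0
renumbered = foods.reset_index(drop=True)
renumbered_labels = renumbered.index.tolist()
print(renumbered_labels)  # [0, 1]
```

Because the labels are kept, a later foods.drop(2) would remove the 'churro' row (label 2), even though it now sits at position 1.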

Removing Multiple Rows from a DataFrame

We can drop multiple rows at a time by passing a list of labels:

df.drop([0,1])
       food  weight  length  price
3    churro     2.6    10.0    3.0
5      flan     7.1     7.0    8.1
6  egg tart     3.0     2.5    3.0
Dropping Multiple Rows
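
The list of labels does not have to be typed by hand. As a rough sketch, the example below first passes the list through the index keyword, which makes it explicit that rows are being dropped, and then builds the list from a condition. The frame foods and the 5 USD threshold are assumptions made up for illustration:

```python
import pandas as pd

# Hypothetical stand-in for the foods DataFrame
foods = pd.DataFrame({
    'food': ['cheesecake', 'mochi', 'churro', 'flan'],
    'price': [6.38, 0.75, 3.00, 8.10],
})

# Same effect as foods.drop([0, 1]), with the row axis spelled out
without_first_two = foods.drop(index=[0, 1])

# Collect the labels of rows priced above 5 USD, then drop those rows
expensive_labels = foods[foods['price'] > 5].index
affordable = foods.drop(expensive_labels)

print(without_first_two['food'].tolist())  # ['churro', 'flan']
print(affordable['food'].tolist())         # ['mochi', 'churro']
```

Neither call uses inplace=True, so foods itself still holds all four rows afterwards.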

Removing Rows from a DataFrame using Custom Labels

Let's say we're looking for a new social media platform to use. To do so, we create a DataFrame of the most popular social media platforms:

# Creates a DataFrame with 'name', 'users(mil.)', and 'created' columns
df1 = pd.DataFrame([
  {'name': 'Snapchat', 'users(mil.)': 493.7, 'created': 2011}, 
  {'name': 'Tiktok', 'users(mil.)': 750, 'created': 2016}, 
  {'name': 'Instagram', 'users(mil.)': 1280, 'created': 2010}, 
  {'name': 'Twitter', 'users(mil.)': 229.0, 'created': 2006}, 
  {'name': 'Facebook', 'users(mil.)': 2100.0, 'created': 2004}])

# Set the row labels to be the `name` column:
df1.set_index('name', inplace = True)
df1
           users(mil.)  created
name
Snapchat         493.7     2011
Tiktok           750.0     2016
Instagram       1280.0     2010
Twitter          229.0     2006
Facebook        2100.0     2004
Creating a DataFrame of Social Media Platforms

Now, let's say we hear on the news that Facebook is facing many data privacy lawsuits. We don't want our personal information leaked, so we make the decision to drop Facebook from our DataFrame:

df1.drop('Facebook', inplace = True)
df1
           users(mil.)  created
name
Snapchat         493.7     2011
Tiktok           750.0     2016
Instagram       1280.0     2010
Twitter          229.0     2006
Dropping A Single Row With A Label

As mentioned in the first section, integers are treated as labels. Since none of the labels in df1 are integers, passing one to drop raises a KeyError:

df1.drop(3)
KeyError: [3] not found in axis
Inputting an Integer
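
There are two common ways around this error, sketched below on a hypothetical stand-in for df1 (the name platforms and its contents are assumptions for illustration). Passing errors='ignore' tells drop to skip labels that do not exist, and looking up the label through the index makes it possible to drop a row by its position:

```python
import pandas as pd

# Hypothetical stand-in for df1, with platform names as row labels
platforms = pd.DataFrame(
    {'created': [2011, 2016, 2010, 2006]},
    index=['Snapchat', 'Tiktok', 'Instagram', 'Twitter'],
)

# errors='ignore' skips missing labels instead of raising KeyError
unchanged = platforms.drop(3, errors='ignore')

# To drop by position, look up the label at that position first
label_at_position_3 = platforms.index[3]
without_last = platforms.drop(label_at_position_3)

print(unchanged.index.tolist())
print(label_at_position_3)
print(without_last.index.tolist())
```

Note that errors='ignore' also hides genuine typos in label names, so it is best kept for cases where a missing label is expected.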

Now, let's say that after some consideration we decide to drop Tiktok and Instagram because they're too addictive and time-consuming. We can drop both rows by passing a list of the labels:

df1.drop(['Tiktok', 'Instagram'])
          users(mil.)  created
name
Snapchat        493.7     2011
Twitter         229.0     2006
Dropping Multiple Rows

Pandas Documentation

See the pandas documentation for DataFrame.drop for the full list of parameters.