Removing Rows from a DataFrame
The `drop` function is used to remove rows or columns from a pandas DataFrame.
To explore how to remove rows using this function, we'll be looking at a DataFrame of foods:
import pandas as pd
# Creates a DataFrame with 'food', 'weight', 'length', and 'price' columns
# Note: all columns use US measurements: ounces, inches, USD
df = pd.DataFrame([
{'food': 'cheesecake', 'weight': 3.43, 'length': 4, 'price': 6.38},
{'food': 'mochi', 'weight': 2, 'length': 1.2, 'price': 0.75},
{'food': 'donut', 'weight': 1.3, 'length': 4.8, 'price': 1.53},
{'food': 'churro', 'weight': 2.6, 'length': 10, 'price': 3.00},
{'food': 'cupcake', 'weight': 1.2, 'length': 2.5, 'price': 2.63},
{'food': 'flan', 'weight': 7.1, 'length': 7, 'price': 8.10},
{'food': 'egg tart', 'weight': 3, 'length': 2.5, 'price': 3.00},
])
df
| | food | weight | length | price |
|---|---|---|---|---|
| 0 | cheesecake | 3.43 | 4.0 | 6.38 |
| 1 | mochi | 2.00 | 1.2 | 0.75 |
| 2 | donut | 1.30 | 4.8 | 1.53 |
| 3 | churro | 2.60 | 10.0 | 3.00 |
| 4 | cupcake | 1.20 | 2.5 | 2.63 |
| 5 | flan | 7.10 | 7.0 | 8.10 |
| 6 | egg tart | 3.00 | 2.5 | 3.00 |
Removing a Single Row from a DataFrame
We can drop a row by passing its row label to `drop`. Note that this function treats integers as labels, not as positions.
# drops the 'donut' row
df.drop(2)
| | food | weight | length | price |
|---|---|---|---|---|
| 0 | cheesecake | 3.43 | 4.0 | 6.38 |
| 1 | mochi | 2.00 | 1.2 | 0.75 |
| 3 | churro | 2.60 | 10.0 | 3.00 |
| 4 | cupcake | 1.20 | 2.5 | 2.63 |
| 5 | flan | 7.10 | 7.0 | 8.10 |
| 6 | egg tart | 3.00 | 2.5 | 3.00 |
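To make the label-versus-position distinction concrete, here is a minimal sketch using a trimmed copy of the foods DataFrame (only the food and price columns, first four rows). The variable names `without_donut` and `third_label` are illustrative, not part of the original example. After label 2 is gone, the integer 2 no longer refers to anything, and dropping "the third row" requires looking up its label first via `df.index`.

```python
import pandas as pd

# Trimmed version of the foods DataFrame (default integer labels 0..3)
df = pd.DataFrame(
    {'food': ['cheesecake', 'mochi', 'donut', 'churro'],
     'price': [6.38, 0.75, 1.53, 3.00]}
)

# Drop the row labeled 2 ('donut'); the remaining labels are 0, 1, 3
without_donut = df.drop(2)
print(without_donut.index.tolist())  # [0, 1, 3]

# 2 is a label, not a position, so it no longer exists in without_donut
try:
    without_donut.drop(2)
except KeyError as err:
    print('KeyError:', err)

# To drop by position, look up the label stored at that position first
third_label = without_donut.index[2]  # label at position 2 is 3 ('churro')
print(without_donut.drop(third_label)['food'].tolist())  # ['cheesecake', 'mochi']
```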
# drops the 'cupcake' row
df.drop(4)
| | food | weight | length | price |
|---|---|---|---|---|
| 0 | cheesecake | 3.43 | 4.0 | 6.38 |
| 1 | mochi | 2.00 | 1.2 | 0.75 |
| 2 | donut | 1.30 | 4.8 | 1.53 |
| 3 | churro | 2.60 | 10.0 | 3.00 |
| 5 | flan | 7.10 | 7.0 | 8.10 |
| 6 | egg tart | 3.00 | 2.5 | 3.00 |
Notice that in the output above, the row with label 2 that we dropped previously appears again. This is because, by default, `drop` returns a new DataFrame with the specified row removed. It doesn't change the original DataFrame.
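A quick way to confirm this copy-returning behavior is to compare the original and the result side by side. The sketch below assumes a five-row subset of the foods DataFrame; the name `dropped` is illustrative.

```python
import pandas as pd

# Five-row subset of the foods DataFrame (labels 0..4)
df = pd.DataFrame(
    {'food': ['cheesecake', 'mochi', 'donut', 'churro', 'cupcake'],
     'price': [6.38, 0.75, 1.53, 3.00, 2.63]}
)

# drop() returns a new DataFrame; df itself is left as it was
dropped = df.drop(4)

print(len(df), len(dropped))  # 5 4
print(4 in df.index)          # True  (the original still has 'cupcake')
print(4 in dropped.index)     # False (the returned copy does not)
```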
To drop rows from the original DataFrame itself, set `inplace=True`:
# drops the 'cupcake' row
df.drop(4, inplace = True)
# drops the 'donut' row
df.drop(2, inplace = True)
# prints df
df
| | food | weight | length | price |
|---|---|---|---|---|
| 0 | cheesecake | 3.43 | 4.0 | 6.38 |
| 1 | mochi | 2.00 | 1.2 | 0.75 |
| 3 | churro | 2.60 | 10.0 | 3.00 |
| 5 | flan | 7.10 | 7.0 | 8.10 |
| 6 | egg tart | 3.00 | 2.5 | 3.00 |
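Setting `inplace=True` is not the only way to keep the result. A common alternative is to reassign the returned copy to the same variable. The sketch below, using a five-row subset of the foods DataFrame, shows both forms; note that with `inplace=True`, `drop` modifies the DataFrame and returns `None`, so its return value should not be assigned back to `df`.

```python
import pandas as pd

# Five-row subset of the foods DataFrame (labels 0..4)
df = pd.DataFrame(
    {'food': ['cheesecake', 'mochi', 'donut', 'churro', 'cupcake'],
     'price': [6.38, 0.75, 1.53, 3.00, 2.63]}
)

# Option 1: reassign the returned copy ('cupcake', then 'donut')
df = df.drop(4)
df = df.drop(2)
print(df['food'].tolist())  # ['cheesecake', 'mochi', 'churro']

# Option 2: modify in place ('churro'); drop() then returns None
result = df.drop(3, inplace=True)
print(result)               # None
print(df['food'].tolist())  # ['cheesecake', 'mochi']
```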
Removing Multiple Rows from a DataFrame
We can drop multiple rows at once by passing a list of labels:
df.drop([0,1])
| | food | weight | length | price |
|---|---|---|---|---|
| 3 | churro | 2.6 | 10.0 | 3.0 |
| 5 | flan | 7.1 | 7.0 | 8.1 |
| 6 | egg tart | 3.0 | 2.5 | 3.0 |
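The list of labels does not have to be typed by hand. As a sketch, the labels can come from a filter: the `index` of a boolean selection is a collection of row labels that `drop` accepts just like a list. The $5 price threshold and the names `expensive_labels` and `affordable` are illustrative assumptions, not part of the original example.

```python
import pandas as pd

# The foods DataFrame, reduced to the 'food' and 'price' columns
df = pd.DataFrame(
    {'food': ['cheesecake', 'mochi', 'donut', 'churro',
              'cupcake', 'flan', 'egg tart'],
     'price': [6.38, 0.75, 1.53, 3.00, 2.63, 8.10, 3.00]}
)

# Labels of the rows to remove: anything priced above $5
expensive_labels = df[df['price'] > 5].index
print(expensive_labels.tolist())  # [0, 5]

# Pass those labels to drop(), just like a hand-written list
affordable = df.drop(expensive_labels)
print(affordable['food'].tolist())  # ['mochi', 'donut', 'churro', 'cupcake', 'egg tart']
```

For a one-off condition like this, selecting the rows to keep (`df[df['price'] <= 5]`) gives the same rows; going through `drop` is handy when you already have the labels to remove.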
Removing Rows from a DataFrame using Custom Labels
Let's say we're looking for a new social media platform to use. To help us decide, we create a DataFrame of the most popular social media platforms:
# Creates a DataFrame with 'name', 'users(mil.)', and 'created' (year created) columns
df1 = pd.DataFrame([
{'name': 'Snapchat', 'users(mil.)': 493.7, 'created': 2011},
{'name': 'Tiktok', 'users(mil.)': 750, 'created': 2016},
{'name': 'Instagram', 'users(mil.)': 1280, 'created': 2010},
{'name': 'Twitter', 'users(mil.)': 229.0, 'created': 2006},
{'name': 'Facebook', 'users(mil.)': 2100.0, 'created': 2004}])
# Set the row labels to be the `name` column:
df1.set_index('name', inplace = True)
df1
| name | users(mil.) | created |
|---|---|---|
| Snapchat | 493.7 | 2011 |
| Tiktok | 750.0 | 2016 |
| Instagram | 1280.0 | 2010 |
| Twitter | 229.0 | 2006 |
| Facebook | 2100.0 | 2004 |
Now, let's say we hear on the news that Facebook is facing many data privacy lawsuits. We don't want our personal information leaked, so we make the decision to drop Facebook from our DataFrame:
df1.drop('Facebook', inplace = True)
df1
| name | users(mil.) | created |
|---|---|---|
| Snapchat | 493.7 | 2011 |
| Tiktok | 750.0 | 2016 |
| Instagram | 1280.0 | 2010 |
| Twitter | 229.0 | 2006 |
As mentioned in the first section, `drop` treats integers as labels. Since none of the labels in `df1` are integers, passing an integer to `drop` here raises a `KeyError`.
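A minimal sketch of that error, using a two-row version of `df1`: the integer `0` is not one of the string labels, so `drop` raises `KeyError`. The sketch also shows the `errors='ignore'` parameter, which tells `drop` to skip labels it cannot find instead of raising. The name `unchanged` is illustrative.

```python
import pandas as pd

# Two-row version of the social media DataFrame, indexed by name
df1 = pd.DataFrame([
    {'name': 'Snapchat', 'users(mil.)': 493.7, 'created': 2011},
    {'name': 'Twitter', 'users(mil.)': 229.0, 'created': 2006},
]).set_index('name')

# The labels are strings, so the integer 0 is not a label in df1
try:
    df1.drop(0)
except KeyError as err:
    print('KeyError:', err)

# errors='ignore' skips labels that are not found instead of raising
unchanged = df1.drop(0, errors='ignore')
print(unchanged.index.tolist())  # ['Snapchat', 'Twitter']
```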
Now, let's say that after some consideration we decide to drop Tiktok and Instagram because they're too addictive and time-consuming. We can drop both rows by passing a list of their labels:
df1.drop(['Tiktok', 'Instagram'])
| name | users(mil.) | created |
|---|---|---|
| Snapchat | 493.7 | 2011 |
| Twitter | 229.0 | 2006 |
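Every call in this tutorial passes labels positionally, which targets rows because `drop` uses `axis=0` by default. The sketch below, built on a four-row version of `df1`, shows that the `axis=0` and `index=` keyword forms remove the same rows, and that `columns=` applies the same function to columns instead. The names `to_drop`, `a`, `b`, and `c` are illustrative.

```python
import pandas as pd

# Four-row version of the social media DataFrame, indexed by name
df1 = pd.DataFrame([
    {'name': 'Snapchat', 'users(mil.)': 493.7, 'created': 2011},
    {'name': 'Tiktok', 'users(mil.)': 750.0, 'created': 2016},
    {'name': 'Instagram', 'users(mil.)': 1280.0, 'created': 2010},
    {'name': 'Twitter', 'users(mil.)': 229.0, 'created': 2006},
]).set_index('name')

to_drop = ['Tiktok', 'Instagram']

# Three equivalent ways to remove the rows labeled in to_drop
a = df1.drop(to_drop)          # rows are the default axis (axis=0)
b = df1.drop(to_drop, axis=0)  # axis given explicitly
c = df1.drop(index=to_drop)    # index= keyword names the row axis

print(a.equals(b) and b.equals(c))  # True
print(a.index.tolist())             # ['Snapchat', 'Twitter']

# The same function removes columns when given columns= instead
print(df1.drop(columns='created').columns.tolist())  # ['users(mil.)']
```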
Pandas Documentation
See the pandas documentation for the `DataFrame.drop` function: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.drop.html