Removing Rows from a DataFrame

The drop function is used to remove rows or columns from a pandas DataFrame.

To explore how to remove rows using this function, we'll be looking at a DataFrame of foods:

import pandas as pd

# Creates a DataFrame with 'weight', 'length', and 'price' columns
# Note: all columns use US measurements: ounces, inches, USD
df = pd.DataFrame([
  {'food': 'cheesecake', 'weight': 3.43, 'length': 4, 'price': 6.38},
  {'food': 'mochi', 'weight': 2, 'length': 1.2, 'price': 0.75},
  {'food': 'donut', 'weight': 1.3, 'length': 4.8, 'price': 1.53},
  {'food': 'churro', 'weight': 2.6, 'length': 10, 'price': 3.00},
  {'food': 'cupcake', 'weight': 1.2, 'length': 2.5, 'price': 2.63},
  {'food': 'flan', 'weight': 7.1, 'length': 7, 'price': 8.10},
  {'food': 'egg tart', 'weight': 3, 'length': 2.5, 'price': 3.00},
])
df

Output:

	food	weight	length	price
0	cheesecake	3.43	4.0	6.38
1	mochi	2.00	1.2	0.75
2	donut	1.30	4.8	1.53
3	churro	2.60	10.0	3.00
4	cupcake	1.20	2.5	2.63
5	flan	7.10	7.0	8.10
6	egg tart	3.00	2.5	3.00

Removing a Single Row from a DataFrame

We can drop a row by passing its row label to drop. Note that drop treats integers as labels, not positions: drop(2) removes the row whose label is 2, which is not necessarily the third row.

import pandas as pd

# Creates a DataFrame with 'weight', 'length', and 'price' columns
# Note: all columns use US measurements: ounces, inches, USD
df = pd.DataFrame([
  {'food': 'cheesecake', 'weight': 3.43, 'length': 4, 'price': 6.38},
  {'food': 'mochi', 'weight': 2, 'length': 1.2, 'price': 0.75},
  {'food': 'donut', 'weight': 1.3, 'length': 4.8, 'price': 1.53},
  {'food': 'churro', 'weight': 2.6, 'length': 10, 'price': 3.00},
  {'food': 'cupcake', 'weight': 1.2, 'length': 2.5, 'price': 2.63},
  {'food': 'flan', 'weight': 7.1, 'length': 7, 'price': 8.10},
  {'food': 'egg tart', 'weight': 3, 'length': 2.5, 'price': 3.00},
])
# drops the 'donut' row
df.drop(2)

Output:

	food	weight	length	price
0	cheesecake	3.43	4.0	6.38
1	mochi	2.00	1.2	0.75
3	churro	2.60	10.0	3.00
4	cupcake	1.20	2.5	2.63
5	flan	7.10	7.0	8.10
6	egg tart	3.00	2.5	3.00

import pandas as pd

# Creates a DataFrame with 'weight', 'length', and 'price' columns
# Note: all columns use US measurements: ounces, inches, USD
df = pd.DataFrame([
  {'food': 'cheesecake', 'weight': 3.43, 'length': 4, 'price': 6.38},
  {'food': 'mochi', 'weight': 2, 'length': 1.2, 'price': 0.75},
  {'food': 'donut', 'weight': 1.3, 'length': 4.8, 'price': 1.53},
  {'food': 'churro', 'weight': 2.6, 'length': 10, 'price': 3.00},
  {'food': 'cupcake', 'weight': 1.2, 'length': 2.5, 'price': 2.63},
  {'food': 'flan', 'weight': 7.1, 'length': 7, 'price': 8.10},
  {'food': 'egg tart', 'weight': 3, 'length': 2.5, 'price': 3.00},
])
# drops the 'cupcake' row
df.drop(4)

Output:

	food	weight	length	price
0	cheesecake	3.43	4.0	6.38
1	mochi	2.00	1.2	0.75
2	donut	1.30	4.8	1.53
3	churro	2.60	10.0	3.00
5	flan	7.10	7.0	8.10
6	egg tart	3.00	2.5	3.00

Notice that in the output above, the row with label 2 that we dropped previously appears again. This is because drop, by default, returns a new DataFrame without the dropped row and leaves the original DataFrame unchanged.
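The copy-returning behavior can be checked directly. The following is a minimal sketch that assumes a shortened, hypothetical version of the foods table; the variable name `without_donut` is illustrative only.

```python
import pandas as pd

# A shortened, hypothetical version of the foods table
df = pd.DataFrame([
    {'food': 'cheesecake', 'price': 6.38},
    {'food': 'mochi', 'price': 0.75},
    {'food': 'donut', 'price': 1.53},
])

# drop returns a new DataFrame; df itself is left untouched
without_donut = df.drop(2)

print(len(df))             # the original still has all of its rows
print(len(without_donut))  # the returned copy has one row fewer
print(2 in df.index)       # label 2 is still present in the original
```

Assigning the result to a name, as done here, is one way to keep the trimmed table without modifying the original.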

To drop rows from the original DataFrame itself, we can set inplace=True:

import pandas as pd

# Creates a DataFrame with 'weight', 'length', and 'price' columns
# Note: all columns use US measurements: ounces, inches, USD
df = pd.DataFrame([
  {'food': 'cheesecake', 'weight': 3.43, 'length': 4, 'price': 6.38},
  {'food': 'mochi', 'weight': 2, 'length': 1.2, 'price': 0.75},
  {'food': 'donut', 'weight': 1.3, 'length': 4.8, 'price': 1.53},
  {'food': 'churro', 'weight': 2.6, 'length': 10, 'price': 3.00},
  {'food': 'cupcake', 'weight': 1.2, 'length': 2.5, 'price': 2.63},
  {'food': 'flan', 'weight': 7.1, 'length': 7, 'price': 8.10},
  {'food': 'egg tart', 'weight': 3, 'length': 2.5, 'price': 3.00},
])
# drops the 'cupcake' row
df.drop(4, inplace = True)

# drops the 'donut' row
df.drop(2, inplace = True)

# prints df
df

Output:

	food	weight	length	price
0	cheesecake	3.43	4.0	6.38
1	mochi	2.00	1.2	0.75
3	churro	2.60	10.0	3.00
5	flan	7.10	7.0	8.10
6	egg tart	3.00	2.5	3.00
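Once rows have been dropped, the remaining labels have gaps, so a label and a row's position no longer line up. The sketch below assumes a shortened, hypothetical version of the foods table and contrasts dropping by label with dropping by position; the variable names are illustrative only.

```python
import pandas as pd

# A shortened, hypothetical version of the foods table
df = pd.DataFrame([
    {'food': 'cheesecake', 'price': 6.38},
    {'food': 'mochi', 'price': 0.75},
    {'food': 'donut', 'price': 1.53},
    {'food': 'churro', 'price': 3.00},
])

# Dropping label 1 leaves the labels 0, 2, 3
df = df.drop(1)

# 2 is a label here, so this removes 'donut' (the second remaining row)
by_label = df.drop(2)

# To remove the row at position 2 ('churro'), look up its label first
by_position = df.drop(df.index[2])

# reset_index(drop=True) renumbers the rows 0, 1, 2 so labels match positions again
renumbered = df.reset_index(drop=True)
```

Whether to renumber after dropping depends on whether the original labels carry meaning that later code relies on.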

Removing Multiple Rows from a DataFrame

We can drop multiple rows at once by passing a list of labels:

import pandas as pd

# Creates a DataFrame with 'weight', 'length', and 'price' columns
# Note: all columns use US measurements: ounces, inches, USD
df = pd.DataFrame([
  {'food': 'cheesecake', 'weight': 3.43, 'length': 4, 'price': 6.38},
  {'food': 'mochi', 'weight': 2, 'length': 1.2, 'price': 0.75},
  {'food': 'donut', 'weight': 1.3, 'length': 4.8, 'price': 1.53},
  {'food': 'churro', 'weight': 2.6, 'length': 10, 'price': 3.00},
  {'food': 'cupcake', 'weight': 1.2, 'length': 2.5, 'price': 2.63},
  {'food': 'flan', 'weight': 7.1, 'length': 7, 'price': 8.10},
  {'food': 'egg tart', 'weight': 3, 'length': 2.5, 'price': 3.00},
])
df.drop([0,1])

Output:

	food	weight	length	price
2	donut	1.3	4.8	1.53
3	churro	2.6	10.0	3.00
4	cupcake	1.2	2.5	2.63
5	flan	7.1	7.0	8.10
6	egg tart	3.0	2.5	3.00
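The list of labels does not have to be typed by hand. As a hedged sketch, assuming a shortened, hypothetical version of the foods table and an arbitrary cutoff of 5 USD, the labels can be collected from a filtered DataFrame and passed to drop:

```python
import pandas as pd

# A shortened, hypothetical version of the foods table
df = pd.DataFrame([
    {'food': 'cheesecake', 'price': 6.38},
    {'food': 'mochi', 'price': 0.75},
    {'food': 'donut', 'price': 1.53},
    {'food': 'flan', 'price': 8.10},
])

# Labels of the rows whose price is above 5 USD (a hypothetical cutoff)
expensive = df[df['price'] > 5].index

# drop accepts any list-like of labels, including an Index
cheap = df.drop(expensive)
```

Filtering with `df[df['price'] <= 5]` would give the same rows here; collecting labels first is mainly useful when the labels are needed elsewhere too.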

Removing Rows from a DataFrame using Custom Labels

Let's say we're looking for a new social media platform to use. To help decide, we create a DataFrame of the most popular platforms:

import pandas as pd
# Creates a DataFrame with 'users(mil.)' and 'created' columns
df1 = pd.DataFrame([
  {'name': 'Snapchat', 'users(mil.)': 493.7, 'created': 2011},
  {'name': 'Tiktok', 'users(mil.)': 750, 'created': 2016},
  {'name': 'Instagram', 'users(mil.)': 1280, 'created': 2010},
  {'name': 'Twitter', 'users(mil.)': 229.0, 'created': 2006},
  {'name': 'Facebook', 'users(mil.)': 2100.0, 'created': 2004}])

# Set the row labels to be the `name` column:
df1.set_index('name', inplace = True)
df1

Output:

	users(mil.)	created
Snapchat	493.7	2011
Tiktok	750.0	2016
Instagram	1280.0	2010
Twitter	229.0	2006
Facebook	2100.0	2004

Now, let's say we hear on the news that Facebook is facing many data privacy lawsuits. We don't want our personal information leaked, so we decide to drop Facebook from our DataFrame:

import pandas as pd
df1 = pd.DataFrame([
  {'name': 'Snapchat', 'users(mil.)': 493.7, 'created': 2011},
  {'name': 'Tiktok', 'users(mil.)': 750, 'created': 2016},
  {'name': 'Instagram', 'users(mil.)': 1280, 'created': 2010},
  {'name': 'Twitter', 'users(mil.)': 229.0, 'created': 2006},
  {'name': 'Facebook', 'users(mil.)': 2100.0, 'created': 2004}])
df1 = df1.set_index('name')
df1.drop('Facebook', inplace=True)
df1

Output:

	users(mil.)	created
Snapchat	493.7	2011
Tiktok	750.0	2016
Instagram	1280.0	2010
Twitter	229.0	2006

As mentioned in the first section, drop matches its input against row labels. If the name column has not been set as the index, the row labels are still the default integers (0 through 4), so passing platform names to drop results in an error:

import pandas as pd
df1 = pd.DataFrame([
  {'name': 'Snapchat', 'users(mil.)': 493.7, 'created': 2011},
  {'name': 'Tiktok', 'users(mil.)': 750, 'created': 2016},
  {'name': 'Instagram', 'users(mil.)': 1280, 'created': 2010},
  {'name': 'Twitter', 'users(mil.)': 229.0, 'created': 2006},
  {'name': 'Facebook', 'users(mil.)': 2100.0, 'created': 2004}])
df1.drop(['Tiktok', 'Instagram'])

Output:

```
KeyError: ['Tiktok', 'Instagram'] not found in axis
```
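When it is not certain that a label exists, the error can be avoided rather than triggered. The sketch below assumes a shortened, hypothetical version of the platforms table and uses the `errors` parameter of drop along with a membership check on the index; the variable names are illustrative only.

```python
import pandas as pd

# A shortened, hypothetical version of the platforms table
df1 = pd.DataFrame([
    {'name': 'Snapchat', 'created': 2011},
    {'name': 'Tiktok', 'created': 2016},
])

# 'name' is still an ordinary column, so the row labels are 0 and 1
try:
    df1.drop(['Tiktok'])
    raised = False
except KeyError:
    raised = True  # 'Tiktok' is not a row label

# errors='ignore' skips labels that are missing instead of raising
unchanged = df1.drop(['Tiktok'], errors='ignore')

# Checking membership in the index avoids the exception as well
has_label = 'Tiktok' in df1.index
```

Note that `errors='ignore'` also hides genuine typos in labels, so the default behavior of raising is often the safer choice while exploring data.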

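Now, let's say that after some consideration we decide to drop Tiktok and Instagram as well because they're too addictive and time-consuming. Since this example rebuilds df1 from scratch, Facebook is present again, so we include it in the list too. We can drop all three rows by passing a list of labels: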

import pandas as pd
df1 = pd.DataFrame([
  {'name': 'Snapchat', 'users(mil.)': 493.7, 'created': 2011},
  {'name': 'Tiktok', 'users(mil.)': 750, 'created': 2016},
  {'name': 'Instagram', 'users(mil.)': 1280, 'created': 2010},
  {'name': 'Twitter', 'users(mil.)': 229.0, 'created': 2006},
  {'name': 'Facebook', 'users(mil.)': 2100.0, 'created': 2004}])
df1.set_index('name').drop(['Tiktok', 'Instagram', 'Facebook'])

Output:

	users(mil.)	created
Snapchat	493.7	2011
Twitter	229.0	2006
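Since drop can remove rows or columns, it can help to state which one is meant. As a hedged sketch, assuming a shortened, hypothetical version of the platforms table, the `index` and `columns` keyword arguments of drop make the target explicit:

```python
import pandas as pd

# A shortened, hypothetical version of the platforms table
df1 = pd.DataFrame([
    {'name': 'Snapchat', 'users(mil.)': 493.7, 'created': 2011},
    {'name': 'Twitter', 'users(mil.)': 229.0, 'created': 2006},
    {'name': 'Facebook', 'users(mil.)': 2100.0, 'created': 2004},
]).set_index('name')

# index= names row labels explicitly (same result as df1.drop('Facebook'))
rows_dropped = df1.drop(index='Facebook')

# columns= names column labels instead, leaving every row in place
cols_dropped = df1.drop(columns='created')
```

Both calls return new DataFrames, so df1 still holds all three rows and both columns afterwards.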

Pandas Documentation

See the pandas documentation for the drop function for the full list of parameters.
