Select Rows From A DataFrame


There are numerous ways to select rows from a DataFrame. One method is to select rows based on the content of its columns. To do this, we can use conditions.

For our example, let's explore a DataFrame of different pets:

import pandas as pd

# Creating a DataFrame with 'name', 'weight(lb.)', 'lifespan(yr.)', and 'group' columns
df = pd.DataFrame([
  {'name': 'golden retriever', 'weight(lb.)': 70, 'lifespan(yr.)': 11, 'group': 'mammal'}, 
  {'name': 'ferret', 'weight(lb.)': 4.4, 'lifespan(yr.)': 7, 'group': 'mammal'},
  {'name': 'axolotl', 'weight(lb.)': 0.63, 'lifespan(yr.)': 12, 'group': 'amphibian'}, 
  {'name': 'bearded dragon', 'weight(lb.)': 1, 'lifespan(yr.)': 13, 'group': 'reptile'}, 
  {'name': 'frog', 'weight(lb.)': 0.8, 'lifespan(yr.)': 11, 'group': 'amphibian'},
  {'name': 'basilisk', 'weight(lb.)': 0.43, 'lifespan(yr.)': 10, 'group': 'reptile'},
  {'name': 'salamander', 'weight(lb.)': 0.44, 'lifespan(yr.)': 16, 'group': 'amphibian'},
  {'name': 'chinchilla', 'weight(lb.)': 1.8, 'lifespan(yr.)': 18, 'group': 'mammal'},
  {'name': 'goldfish', 'weight(lb.)': 8, 'lifespan(yr.)': 12, 'group': 'fish'},
  {'name': 'koi', 'weight(lb.)': 12, 'lifespan(yr.)': 30, 'group': 'fish'},
  {'name': 'gecko', 'weight(lb.)': 0.15, 'lifespan(yr.)': 15, 'group': 'reptile'},
])
df
nameweight(lb.)lifespan(yr.)group
0golden retriever70.0011mammal
1ferret4.407mammal
2axolotl0.6312amphibian
3bearded dragon1.0013reptile
4frog0.8011amphibian
5basilisk0.4310reptile
6salamander0.4416amphibian
7chinchilla1.8018mammal
8goldfish8.0012fish
9koi12.0030fish
10gecko0.1515reptile
Creating a DataFrame of Pets

Condition Operators

When using conditions, there are six primary comparison operators:

  • < (strictly less than)
  • > (strictly greater than)
  • <= (less than or equal to)
  • >= (greater than or equal to)
  • == (exactly equal to)
  • != (doesn't equal)

When you use a conditional by itself, a Series of True or False values based on the truth of the conditional is given:

# checks if each value in the weight(lb.) column is strictly greater than 6
df['weight(lb.)'] > 6
0      True
1     False
2     False
3     False
4     False
5     False
6     False
7     False
8      True
9      True
10    False
Name: weight(lb.), dtype: bool
Conditions Example

Row Selection With a Single Condition

To select only rows that match one specific criteria, we can use a single condition.

For example, say were only interested in looking at amphibian pets:

# selects only rows whose 'group' column contains 'amphibian'
df[df['group'] == 'amphibian']
nameweight(lb.)lifespan(yr.)group
2axolotl0.6312amphibian
4frog0.8011amphibian
6salamander0.4416amphibian
Selecting Rows With One Condition

Now, say we are only interested in smaller pets that weighed less than a pound:

# selects only rows whose 'weight(lb.)' column contains a value less than 1
df[df['weight(lb.)'] < 1]
nameweight(lb.)lifespan(yr.)group
2axolotl0.6312amphibian
4frog0.8011amphibian
5basilisk0.4310reptile
6salamander0.4416amphibian
10gecko0.1515reptile
Selecting Rows With One Condition

Additional explanations, videos, and example problems covering conditionals is part of the DISCOVERY course content found here:

Row Selection with Multiple Conditions

It is possible to select rows that meet different criteria using multiple conditions by joining conditionals together with & (AND) or | (OR) logical operators. (Note: Python requires the use of parentheses around the conditionals when using multiple conditionals!)

For example, say we want a pet that lives longer than 10 years but less than 15 years.

# selecting rows whose data in the 'lifespan(yr.)' column is greater than 10 and less than 15
df[(df['lifespan(yr.)'] > 10) & (df['lifespan(yr.)'] < 15)]
nameweight(lb.)lifespan(yr.)group
0golden retriever70.0011mammal
2axolotl0.6312amphibian
3bearded dragon1.0013reptile
4frog0.8011amphibian
8goldfish8.0012fish
Selecting Rows With Multiple Conditions

Row Section with Mixed Logical Operators

Now, say we wanted to look at pets that is either a mammal or an amphibian, and lives more than 12 years.

We have 3 conditions:

  • df['group'] == 'amphibian'
  • df['group'] == 'mammal'
  • df['lifespan(yr.)'] > 12

But, notice the difference in output when these conditions are arranged differently:

This code first checks if the row's group column contains the word 'amphibian'.
If not, then it checks if the row contains the word 'mammal' in the group column and contains a value greater than 12 in the lifespan(yr.) column.
# putting the `lifespan` condition last
df[(df['group'] == 'amphibian') | (df['group'] == 'mammal') & (df['lifespan(yr.)'] > 12)]
nameweight(lb.)lifespan(yr.)group
2axolotl0.6312amphibian
4frog0.8011amphibian
6salamander0.4416amphibian
7chinchilla1.8018mammal

This code first checks if the row's lifespan(yr.) column contains a value greater than 12 and its group column contains the word 'amphibian'. If not, then it checks if the row's group column contains the word 'mammal'.
# putting the `lifespan` condition first
df[(df['lifespan(yr.)'] > 12) & (df['group'] == 'amphibian') | (df['group'] == 'mammal')]
nameweight(lb.)lifespan(yr.)group
0golden retriever70.0011mammal
1ferret4.407mammal
6salamander0.4416amphibian
7chinchilla1.8018mammal

This line of code checks each row for a value greater than 12 in the lifespan(yr.) column and that its group column contains either amphibian or mammal.
# The same as the code above, except there are parenthesis surrounding the 2nd and 3rd condition
df[(df['lifespan(yr.)'] > 12) & ((df['group'] == 'amphibian') | (df['group'] == 'mammal'))]
nameweight(lb.)lifespan(yr.)group
6salamander0.4416amphibian
7chinchilla1.8018mammal

Notice that the order of the conditions, the placement of parenthesis, and the use of the logical operators change the output.

Row Selection with Five Conditionals

Finally, when selecting rows from a DataFrame, we can add as many conditions as we want:

# selecting pets that live more than 10 but at most 16 years, is a mammal or a reptile, and weighs more than 0.5 lbs.
df[(df['lifespan(yr.)'] > 10) &
   (df['lifespan(yr.)'] <= 16) &
   ( (df['group'] == 'mammal') | (df['group'] == 'reptile') ) &
   (df['weight(lb.)'] > 0.5)]
nameweight(lb.)lifespan(yr.)group
0golden retriever70.011mammal
3bearded dragon1.013reptile
Selecting Rows With 5 Conditions

An explanation of how AND and OR operators work, including videos, example problems, and more details is part of the DISCOVERY course content: