Select Rows From A DataFrame
There are numerous ways to select rows from a DataFrame. One method is to select rows based on the content of its columns. To do this, we can use conditions.
For our example, let's explore a DataFrame of different pets:
import pandas as pd
# Creating a DataFrame with 'name', 'weight(lb.)', 'lifespan(yr.)', and 'group' columns
df = pd.DataFrame([
{'name': 'golden retriever', 'weight(lb.)': 70, 'lifespan(yr.)': 11, 'group': 'mammal'},
{'name': 'ferret', 'weight(lb.)': 4.4, 'lifespan(yr.)': 7, 'group': 'mammal'},
{'name': 'axolotl', 'weight(lb.)': 0.63, 'lifespan(yr.)': 12, 'group': 'amphibian'},
{'name': 'bearded dragon', 'weight(lb.)': 1, 'lifespan(yr.)': 13, 'group': 'reptile'},
{'name': 'frog', 'weight(lb.)': 0.8, 'lifespan(yr.)': 11, 'group': 'amphibian'},
{'name': 'basilisk', 'weight(lb.)': 0.43, 'lifespan(yr.)': 10, 'group': 'reptile'},
{'name': 'salamander', 'weight(lb.)': 0.44, 'lifespan(yr.)': 16, 'group': 'amphibian'},
{'name': 'chinchilla', 'weight(lb.)': 1.8, 'lifespan(yr.)': 18, 'group': 'mammal'},
{'name': 'goldfish', 'weight(lb.)': 8, 'lifespan(yr.)': 12, 'group': 'fish'},
{'name': 'koi', 'weight(lb.)': 12, 'lifespan(yr.)': 30, 'group': 'fish'},
{'name': 'gecko', 'weight(lb.)': 0.15, 'lifespan(yr.)': 15, 'group': 'reptile'},
])
df
name | weight(lb.) | lifespan(yr.) | group | |
---|---|---|---|---|
0 | golden retriever | 70.00 | 11 | mammal |
1 | ferret | 4.40 | 7 | mammal |
2 | axolotl | 0.63 | 12 | amphibian |
3 | bearded dragon | 1.00 | 13 | reptile |
4 | frog | 0.80 | 11 | amphibian |
5 | basilisk | 0.43 | 10 | reptile |
6 | salamander | 0.44 | 16 | amphibian |
7 | chinchilla | 1.80 | 18 | mammal |
8 | goldfish | 8.00 | 12 | fish |
9 | koi | 12.00 | 30 | fish |
10 | gecko | 0.15 | 15 | reptile |
Condition Operators
When using conditions, there are six primary comparison operators:
<
(strictly less than)>
(strictly greater than)<=
(less than or equal to)>=
(greater than or equal to)==
(exactly equal to)!=
(doesn't equal)
When you use a conditional by itself, a Series of True
or False
values based on the truth of the conditional is given:
Row Selection With a Single Condition
To select only rows that match one specific criteria, we can use a single condition.
For example, say were only interested in looking at amphibian pets:
# selects only rows whose 'group' column contains 'amphibian'
df[df['group'] == 'amphibian']
name | weight(lb.) | lifespan(yr.) | group | |
---|---|---|---|---|
2 | axolotl | 0.63 | 12 | amphibian |
4 | frog | 0.80 | 11 | amphibian |
6 | salamander | 0.44 | 16 | amphibian |
Now, say we are only interested in smaller pets that weighed less than a pound:
# selects only rows whose 'weight(lb.)' column contains a value less than 1
df[df['weight(lb.)'] < 1]
name | weight(lb.) | lifespan(yr.) | group | |
---|---|---|---|---|
2 | axolotl | 0.63 | 12 | amphibian |
4 | frog | 0.80 | 11 | amphibian |
5 | basilisk | 0.43 | 10 | reptile |
6 | salamander | 0.44 | 16 | amphibian |
10 | gecko | 0.15 | 15 | reptile |
Additional explanations, videos, and example problems covering conditionals is part of the DISCOVERY course content found here:
Row Selection with Multiple Conditions
It is possible to select rows that meet different criteria using multiple conditions by joining conditionals together with &
(AND) or |
(OR) logical operators. (Note: Python requires the use of parentheses around the conditionals when using multiple conditionals!)
For example, say we want a pet that lives longer than 10 years but less than 15 years.
# selecting rows whose data in the 'lifespan(yr.)' column is greater than 10 and less than 15
df[(df['lifespan(yr.)'] > 10) & (df['lifespan(yr.)'] < 15)]
name | weight(lb.) | lifespan(yr.) | group | |
---|---|---|---|---|
0 | golden retriever | 70.00 | 11 | mammal |
2 | axolotl | 0.63 | 12 | amphibian |
3 | bearded dragon | 1.00 | 13 | reptile |
4 | frog | 0.80 | 11 | amphibian |
8 | goldfish | 8.00 | 12 | fish |
Row Section with Mixed Logical Operators
Now, say we wanted to look at pets that is either a mammal or an amphibian, and lives more than 12 years.
We have 3 conditions:
df['group'] == 'amphibian'
df['group'] == 'mammal'
df['lifespan(yr.)'] > 12
But, notice the difference in output when these conditions are arranged differently:
group
column contains the word 'amphibian'.If not, then it checks if the row contains the word 'mammal' in the
group
column and contains a value greater than 12 in the lifespan(yr.)
column.# putting the `lifespan` condition last
df[(df['group'] == 'amphibian') | (df['group'] == 'mammal') & (df['lifespan(yr.)'] > 12)]
name | weight(lb.) | lifespan(yr.) | group | |
---|---|---|---|---|
2 | axolotl | 0.63 | 12 | amphibian |
4 | frog | 0.80 | 11 | amphibian |
6 | salamander | 0.44 | 16 | amphibian |
7 | chinchilla | 1.80 | 18 | mammal |
lifespan(yr.)
column contains a value greater than 12 and its group
column contains the word 'amphibian'. If not, then it checks if the row's group
column contains the word 'mammal'.# putting the `lifespan` condition first
df[(df['lifespan(yr.)'] > 12) & (df['group'] == 'amphibian') | (df['group'] == 'mammal')]
name | weight(lb.) | lifespan(yr.) | group | |
---|---|---|---|---|
0 | golden retriever | 70.00 | 11 | mammal |
1 | ferret | 4.40 | 7 | mammal |
6 | salamander | 0.44 | 16 | amphibian |
7 | chinchilla | 1.80 | 18 | mammal |
lifespan(yr.)
column and that its group
column contains either amphibian or mammal.# The same as the code above, except there are parenthesis surrounding the 2nd and 3rd condition
df[(df['lifespan(yr.)'] > 12) & ((df['group'] == 'amphibian') | (df['group'] == 'mammal'))]
name | weight(lb.) | lifespan(yr.) | group | |
---|---|---|---|---|
6 | salamander | 0.44 | 16 | amphibian |
7 | chinchilla | 1.80 | 18 | mammal |
Notice that the order of the conditions, the placement of parenthesis, and the use of the logical operators change the output.
Row Selection with Five Conditionals
Finally, when selecting rows from a DataFrame, we can add as many conditions as we want:
# selecting pets that live more than 10 but at most 16 years, is a mammal or a reptile, and weighs more than 0.5 lbs.
df[(df['lifespan(yr.)'] > 10) &
(df['lifespan(yr.)'] <= 16) &
( (df['group'] == 'mammal') | (df['group'] == 'reptile') ) &
(df['weight(lb.)'] > 0.5)]
name | weight(lb.) | lifespan(yr.) | group | |
---|---|---|---|---|
0 | golden retriever | 70.0 | 11 | mammal |
3 | bearded dragon | 1.0 | 13 | reptile |
An explanation of how AND and OR operators work, including videos, example problems, and more details is part of the DISCOVERY course content: