Select Rows From A DataFrame

There are numerous ways to select rows from a DataFrame. One method is to select rows based on the values in their columns. To do this, we can use conditions.
For our example, let's explore a DataFrame of different pets:
```python
import pandas as pd

# Creating a DataFrame with 'name', 'weight(lb.)', 'lifespan(yr.)', and 'group' columns
df = pd.DataFrame([
    {'name': 'golden retriever', 'weight(lb.)': 70, 'lifespan(yr.)': 11, 'group': 'mammal'},
    {'name': 'ferret', 'weight(lb.)': 4.4, 'lifespan(yr.)': 7, 'group': 'mammal'},
    {'name': 'axolotl', 'weight(lb.)': 0.63, 'lifespan(yr.)': 12, 'group': 'amphibian'},
    {'name': 'bearded dragon', 'weight(lb.)': 1, 'lifespan(yr.)': 13, 'group': 'reptile'},
    {'name': 'frog', 'weight(lb.)': 0.8, 'lifespan(yr.)': 11, 'group': 'amphibian'},
    {'name': 'basilisk', 'weight(lb.)': 0.43, 'lifespan(yr.)': 10, 'group': 'reptile'},
    {'name': 'salamander', 'weight(lb.)': 0.44, 'lifespan(yr.)': 16, 'group': 'amphibian'},
    {'name': 'chinchilla', 'weight(lb.)': 1.8, 'lifespan(yr.)': 18, 'group': 'mammal'},
    {'name': 'goldfish', 'weight(lb.)': 8, 'lifespan(yr.)': 12, 'group': 'fish'},
    {'name': 'koi', 'weight(lb.)': 12, 'lifespan(yr.)': 30, 'group': 'fish'},
    {'name': 'gecko', 'weight(lb.)': 0.15, 'lifespan(yr.)': 15, 'group': 'reptile'},
])
df
```

Output:
```
                name  weight(lb.)  lifespan(yr.)      group
0   golden retriever        70.00             11     mammal
1             ferret         4.40              7     mammal
2            axolotl         0.63             12  amphibian
3     bearded dragon         1.00             13    reptile
4               frog         0.80             11  amphibian
5           basilisk         0.43             10    reptile
6         salamander         0.44             16  amphibian
7         chinchilla         1.80             18     mammal
8           goldfish         8.00             12       fish
9                koi        12.00             30       fish
10             gecko         0.15             15    reptile
```
Condition Operators

When using conditions, there are six primary comparison operators:

- `<` (strictly less than)
- `>` (strictly greater than)
- `<=` (less than or equal to)
- `>=` (greater than or equal to)
- `==` (exactly equal to)
- `!=` (not equal to)

When you apply a comparison to a column on its own, the result is a Series of True or False values, one per row, indicating whether that row satisfies the condition:
```python
import pandas as pd

# Creating a DataFrame with 'name', 'weight(lb.)', 'lifespan(yr.)', and 'group' columns
df = pd.DataFrame([
    {'name': 'golden retriever', 'weight(lb.)': 70, 'lifespan(yr.)': 11, 'group': 'mammal'},
    {'name': 'ferret', 'weight(lb.)': 4.4, 'lifespan(yr.)': 7, 'group': 'mammal'},
    {'name': 'axolotl', 'weight(lb.)': 0.63, 'lifespan(yr.)': 12, 'group': 'amphibian'},
    {'name': 'bearded dragon', 'weight(lb.)': 1, 'lifespan(yr.)': 13, 'group': 'reptile'},
    {'name': 'frog', 'weight(lb.)': 0.8, 'lifespan(yr.)': 11, 'group': 'amphibian'},
    {'name': 'basilisk', 'weight(lb.)': 0.43, 'lifespan(yr.)': 10, 'group': 'reptile'},
    {'name': 'salamander', 'weight(lb.)': 0.44, 'lifespan(yr.)': 16, 'group': 'amphibian'},
    {'name': 'chinchilla', 'weight(lb.)': 1.8, 'lifespan(yr.)': 18, 'group': 'mammal'},
    {'name': 'goldfish', 'weight(lb.)': 8, 'lifespan(yr.)': 12, 'group': 'fish'},
    {'name': 'koi', 'weight(lb.)': 12, 'lifespan(yr.)': 30, 'group': 'fish'},
    {'name': 'gecko', 'weight(lb.)': 0.15, 'lifespan(yr.)': 15, 'group': 'reptile'},
])
# checks if each value in the weight(lb.) column is strictly greater than 6
df['weight(lb.)'] > 6
```

Output:
```
0 True
1 False
2 False
3 False
4 False
5 False
6 False
7 False
8 True
9 True
10 False
Name: weight(lb.), dtype: bool
```
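To compare all six operators side by side, here is a minimal sketch on a trimmed, four-row copy of the pets data (the trimming, the threshold of 12, and the variable names `lifespan` and `masks` are choices made only for this illustration). Each comparison produces its own boolean Series, and because True counts as 1, calling `.sum()` on a mask counts the rows that match.

```python
import pandas as pd

# A trimmed stand-in for the pets DataFrame (illustrative subset only)
df = pd.DataFrame({
    'name': ['golden retriever', 'ferret', 'axolotl', 'koi'],
    'lifespan(yr.)': [11, 7, 12, 30],
})

lifespan = df['lifespan(yr.)']

# Each comparison returns a boolean Series aligned with df's index
masks = {
    '<': lifespan < 12,
    '>': lifespan > 12,
    '<=': lifespan <= 12,
    '>=': lifespan >= 12,
    '==': lifespan == 12,
    '!=': lifespan != 12,
}

for op, mask in masks.items():
    # True sums as 1, so .sum() counts the rows where the condition holds
    print(f"lifespan {op} 12: {mask.tolist()} -> {int(mask.sum())} match(es)")
```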
Row Selection With a Single Condition

To select only the rows that match one specific criterion, we can use a single condition.

For example, say we're only interested in looking at amphibian pets:
```python
import pandas as pd

# Creating a DataFrame with 'name', 'weight(lb.)', 'lifespan(yr.)', and 'group' columns
df = pd.DataFrame([
    {'name': 'golden retriever', 'weight(lb.)': 70, 'lifespan(yr.)': 11, 'group': 'mammal'},
    {'name': 'ferret', 'weight(lb.)': 4.4, 'lifespan(yr.)': 7, 'group': 'mammal'},
    {'name': 'axolotl', 'weight(lb.)': 0.63, 'lifespan(yr.)': 12, 'group': 'amphibian'},
    {'name': 'bearded dragon', 'weight(lb.)': 1, 'lifespan(yr.)': 13, 'group': 'reptile'},
    {'name': 'frog', 'weight(lb.)': 0.8, 'lifespan(yr.)': 11, 'group': 'amphibian'},
    {'name': 'basilisk', 'weight(lb.)': 0.43, 'lifespan(yr.)': 10, 'group': 'reptile'},
    {'name': 'salamander', 'weight(lb.)': 0.44, 'lifespan(yr.)': 16, 'group': 'amphibian'},
    {'name': 'chinchilla', 'weight(lb.)': 1.8, 'lifespan(yr.)': 18, 'group': 'mammal'},
    {'name': 'goldfish', 'weight(lb.)': 8, 'lifespan(yr.)': 12, 'group': 'fish'},
    {'name': 'koi', 'weight(lb.)': 12, 'lifespan(yr.)': 30, 'group': 'fish'},
    {'name': 'gecko', 'weight(lb.)': 0.15, 'lifespan(yr.)': 15, 'group': 'reptile'},
])
# selects only rows whose 'group' column contains 'amphibian'
df[df['group'] == 'amphibian']
```

Output:
```
         name  weight(lb.)  lifespan(yr.)      group
2     axolotl         0.63             12  amphibian
4        frog         0.80             11  amphibian
6  salamander         0.44             16  amphibian
```
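The same boolean mask can also be passed to `.loc`, which additionally lets you choose which columns to keep. A minimal sketch; the variable name `amphibians` and the choice of the `name` and `lifespan(yr.)` columns are illustrative:

```python
import pandas as pd

df = pd.DataFrame([
    {'name': 'golden retriever', 'weight(lb.)': 70, 'lifespan(yr.)': 11, 'group': 'mammal'},
    {'name': 'ferret', 'weight(lb.)': 4.4, 'lifespan(yr.)': 7, 'group': 'mammal'},
    {'name': 'axolotl', 'weight(lb.)': 0.63, 'lifespan(yr.)': 12, 'group': 'amphibian'},
    {'name': 'bearded dragon', 'weight(lb.)': 1, 'lifespan(yr.)': 13, 'group': 'reptile'},
    {'name': 'frog', 'weight(lb.)': 0.8, 'lifespan(yr.)': 11, 'group': 'amphibian'},
    {'name': 'basilisk', 'weight(lb.)': 0.43, 'lifespan(yr.)': 10, 'group': 'reptile'},
    {'name': 'salamander', 'weight(lb.)': 0.44, 'lifespan(yr.)': 16, 'group': 'amphibian'},
    {'name': 'chinchilla', 'weight(lb.)': 1.8, 'lifespan(yr.)': 18, 'group': 'mammal'},
    {'name': 'goldfish', 'weight(lb.)': 8, 'lifespan(yr.)': 12, 'group': 'fish'},
    {'name': 'koi', 'weight(lb.)': 12, 'lifespan(yr.)': 30, 'group': 'fish'},
    {'name': 'gecko', 'weight(lb.)': 0.15, 'lifespan(yr.)': 15, 'group': 'reptile'},
])

# .loc[rows, columns]: a boolean mask picks the rows, a list of labels picks the columns
amphibians = df.loc[df['group'] == 'amphibian', ['name', 'lifespan(yr.)']]
print(amphibians)
```

Selecting rows and columns in a single `.loc` call keeps the filtering in one step instead of chaining two separate indexing operations.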
Now, say we are only interested in smaller pets that weigh less than a pound:
```python
import pandas as pd

# Creating a DataFrame with 'name', 'weight(lb.)', 'lifespan(yr.)', and 'group' columns
df = pd.DataFrame([
    {'name': 'golden retriever', 'weight(lb.)': 70, 'lifespan(yr.)': 11, 'group': 'mammal'},
    {'name': 'ferret', 'weight(lb.)': 4.4, 'lifespan(yr.)': 7, 'group': 'mammal'},
    {'name': 'axolotl', 'weight(lb.)': 0.63, 'lifespan(yr.)': 12, 'group': 'amphibian'},
    {'name': 'bearded dragon', 'weight(lb.)': 1, 'lifespan(yr.)': 13, 'group': 'reptile'},
    {'name': 'frog', 'weight(lb.)': 0.8, 'lifespan(yr.)': 11, 'group': 'amphibian'},
    {'name': 'basilisk', 'weight(lb.)': 0.43, 'lifespan(yr.)': 10, 'group': 'reptile'},
    {'name': 'salamander', 'weight(lb.)': 0.44, 'lifespan(yr.)': 16, 'group': 'amphibian'},
    {'name': 'chinchilla', 'weight(lb.)': 1.8, 'lifespan(yr.)': 18, 'group': 'mammal'},
    {'name': 'goldfish', 'weight(lb.)': 8, 'lifespan(yr.)': 12, 'group': 'fish'},
    {'name': 'koi', 'weight(lb.)': 12, 'lifespan(yr.)': 30, 'group': 'fish'},
    {'name': 'gecko', 'weight(lb.)': 0.15, 'lifespan(yr.)': 15, 'group': 'reptile'},
])
# selects only rows whose 'weight(lb.)' column contains a value less than 1
df[df['weight(lb.)'] < 1]
```

Output:
```
          name  weight(lb.)  lifespan(yr.)      group
2      axolotl         0.63             12  amphibian
4         frog         0.80             11  amphibian
5     basilisk         0.43             10    reptile
6   salamander         0.44             16  amphibian
10       gecko         0.15             15    reptile
```
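Notice that the filtered rows keep their original index labels (2, 4, 5, 6, 10). If a fresh 0-based index is more convenient, one option is `reset_index(drop=True)`. A minimal sketch; the names `light_pets` and `renumbered` are illustrative:

```python
import pandas as pd

df = pd.DataFrame([
    {'name': 'golden retriever', 'weight(lb.)': 70, 'lifespan(yr.)': 11, 'group': 'mammal'},
    {'name': 'ferret', 'weight(lb.)': 4.4, 'lifespan(yr.)': 7, 'group': 'mammal'},
    {'name': 'axolotl', 'weight(lb.)': 0.63, 'lifespan(yr.)': 12, 'group': 'amphibian'},
    {'name': 'bearded dragon', 'weight(lb.)': 1, 'lifespan(yr.)': 13, 'group': 'reptile'},
    {'name': 'frog', 'weight(lb.)': 0.8, 'lifespan(yr.)': 11, 'group': 'amphibian'},
    {'name': 'basilisk', 'weight(lb.)': 0.43, 'lifespan(yr.)': 10, 'group': 'reptile'},
    {'name': 'salamander', 'weight(lb.)': 0.44, 'lifespan(yr.)': 16, 'group': 'amphibian'},
    {'name': 'chinchilla', 'weight(lb.)': 1.8, 'lifespan(yr.)': 18, 'group': 'mammal'},
    {'name': 'goldfish', 'weight(lb.)': 8, 'lifespan(yr.)': 12, 'group': 'fish'},
    {'name': 'koi', 'weight(lb.)': 12, 'lifespan(yr.)': 30, 'group': 'fish'},
    {'name': 'gecko', 'weight(lb.)': 0.15, 'lifespan(yr.)': 15, 'group': 'reptile'},
])

# Boolean selection keeps the original row labels of the matching rows
light_pets = df[df['weight(lb.)'] < 1]
print(light_pets.index.tolist())

# drop=True discards the old labels instead of inserting them as a new column
renumbered = light_pets.reset_index(drop=True)
print(renumbered.index.tolist())
```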
Additional explanations, videos, and example problems covering conditionals are part of the DISCOVERY course content found here:
Row Selection with Multiple Conditions

To select rows that meet several criteria at once, join the conditions with the `&` (AND) or `|` (OR) logical operators. (Note: each condition must be wrapped in parentheses, because in Python `&` and `|` bind more tightly than comparison operators such as `>` and `==`.)
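To see why the parentheses are required, here is a minimal sketch on a trimmed copy of the data (an illustrative subset; the names `life`, `ok`, and `error` are also illustrative). Without parentheses, Python evaluates `10 & life` first and then reads the rest as the chained comparison `life > (10 & life) < 15`, which asks for the truth value of a whole Series and raises a `ValueError`.

```python
import pandas as pd

# A trimmed stand-in for the pets DataFrame (illustrative subset only)
df = pd.DataFrame({
    'name': ['golden retriever', 'ferret', 'axolotl', 'koi'],
    'lifespan(yr.)': [11, 7, 12, 30],
})
life = df['lifespan(yr.)']

# With parentheses: each comparison is evaluated first, then & combines them row by row
ok = df[(life > 10) & (life < 15)]
print(ok['name'].tolist())

# Without parentheses: & binds first, producing a chained comparison that
# needs bool() of a Series, which pandas refuses with a ValueError
try:
    df[life > 10 & life < 15]
    error = None
except ValueError as exc:
    error = exc

print(type(error).__name__)
```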
For example, say we want a pet that lives longer than 10 years but less than 15 years.
```python
import pandas as pd

# Creating a DataFrame with 'name', 'weight(lb.)', 'lifespan(yr.)', and 'group' columns
df = pd.DataFrame([
    {'name': 'golden retriever', 'weight(lb.)': 70, 'lifespan(yr.)': 11, 'group': 'mammal'},
    {'name': 'ferret', 'weight(lb.)': 4.4, 'lifespan(yr.)': 7, 'group': 'mammal'},
    {'name': 'axolotl', 'weight(lb.)': 0.63, 'lifespan(yr.)': 12, 'group': 'amphibian'},
    {'name': 'bearded dragon', 'weight(lb.)': 1, 'lifespan(yr.)': 13, 'group': 'reptile'},
    {'name': 'frog', 'weight(lb.)': 0.8, 'lifespan(yr.)': 11, 'group': 'amphibian'},
    {'name': 'basilisk', 'weight(lb.)': 0.43, 'lifespan(yr.)': 10, 'group': 'reptile'},
    {'name': 'salamander', 'weight(lb.)': 0.44, 'lifespan(yr.)': 16, 'group': 'amphibian'},
    {'name': 'chinchilla', 'weight(lb.)': 1.8, 'lifespan(yr.)': 18, 'group': 'mammal'},
    {'name': 'goldfish', 'weight(lb.)': 8, 'lifespan(yr.)': 12, 'group': 'fish'},
    {'name': 'koi', 'weight(lb.)': 12, 'lifespan(yr.)': 30, 'group': 'fish'},
    {'name': 'gecko', 'weight(lb.)': 0.15, 'lifespan(yr.)': 15, 'group': 'reptile'},
])
# selecting rows whose data in the 'lifespan(yr.)' column is greater than 10 and less than 15
df[(df['lifespan(yr.)'] > 10) & (df['lifespan(yr.)'] < 15)]
```

Output:
```
               name  weight(lb.)  lifespan(yr.)      group
0  golden retriever        70.00             11     mammal
2           axolotl         0.63             12  amphibian
3    bearded dragon         1.00             13    reptile
4              frog         0.80             11  amphibian
8          goldfish         8.00             12       fish
```
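A range check on a single column, like the one above, can also be written with `Series.between`. A minimal sketch, assuming pandas 1.3 or later, where `inclusive` accepts the string `'neither'` to exclude both endpoints (older versions took a boolean instead); the names `in_range`, `mid_lived`, and `explicit` are illustrative:

```python
import pandas as pd

df = pd.DataFrame([
    {'name': 'golden retriever', 'weight(lb.)': 70, 'lifespan(yr.)': 11, 'group': 'mammal'},
    {'name': 'ferret', 'weight(lb.)': 4.4, 'lifespan(yr.)': 7, 'group': 'mammal'},
    {'name': 'axolotl', 'weight(lb.)': 0.63, 'lifespan(yr.)': 12, 'group': 'amphibian'},
    {'name': 'bearded dragon', 'weight(lb.)': 1, 'lifespan(yr.)': 13, 'group': 'reptile'},
    {'name': 'frog', 'weight(lb.)': 0.8, 'lifespan(yr.)': 11, 'group': 'amphibian'},
    {'name': 'basilisk', 'weight(lb.)': 0.43, 'lifespan(yr.)': 10, 'group': 'reptile'},
    {'name': 'salamander', 'weight(lb.)': 0.44, 'lifespan(yr.)': 16, 'group': 'amphibian'},
    {'name': 'chinchilla', 'weight(lb.)': 1.8, 'lifespan(yr.)': 18, 'group': 'mammal'},
    {'name': 'goldfish', 'weight(lb.)': 8, 'lifespan(yr.)': 12, 'group': 'fish'},
    {'name': 'koi', 'weight(lb.)': 12, 'lifespan(yr.)': 30, 'group': 'fish'},
    {'name': 'gecko', 'weight(lb.)': 0.15, 'lifespan(yr.)': 15, 'group': 'reptile'},
])

# inclusive='neither' excludes both 10 and 15, matching "> 10 and < 15"
in_range = df['lifespan(yr.)'].between(10, 15, inclusive='neither')
mid_lived = df[in_range]

# The explicit two-condition version from the example above, for comparison
explicit = df[(df['lifespan(yr.)'] > 10) & (df['lifespan(yr.)'] < 15)]
print(mid_lived.equals(explicit))
```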
Row Selection with Mixed Logical Operators

Now, say we want to look at pets that are either mammals or amphibians and live more than 12 years.

We have three conditions:

- `df['group'] == 'amphibian'`
- `df['group'] == 'mammal'`
- `df['lifespan(yr.)'] > 12`

Notice how the output changes when these conditions are arranged differently:
Because `&` is evaluated before `|`, the code below is read as amphibian | (mammal & lifespan > 12): a row is selected if its `group` column contains 'amphibian', or if its `group` column contains 'mammal' and its `lifespan(yr.)` column contains a value greater than 12.
```python
import pandas as pd

# Creating a DataFrame with 'name', 'weight(lb.)', 'lifespan(yr.)', and 'group' columns
df = pd.DataFrame([
    {'name': 'golden retriever', 'weight(lb.)': 70, 'lifespan(yr.)': 11, 'group': 'mammal'},
    {'name': 'ferret', 'weight(lb.)': 4.4, 'lifespan(yr.)': 7, 'group': 'mammal'},
    {'name': 'axolotl', 'weight(lb.)': 0.63, 'lifespan(yr.)': 12, 'group': 'amphibian'},
    {'name': 'bearded dragon', 'weight(lb.)': 1, 'lifespan(yr.)': 13, 'group': 'reptile'},
    {'name': 'frog', 'weight(lb.)': 0.8, 'lifespan(yr.)': 11, 'group': 'amphibian'},
    {'name': 'basilisk', 'weight(lb.)': 0.43, 'lifespan(yr.)': 10, 'group': 'reptile'},
    {'name': 'salamander', 'weight(lb.)': 0.44, 'lifespan(yr.)': 16, 'group': 'amphibian'},
    {'name': 'chinchilla', 'weight(lb.)': 1.8, 'lifespan(yr.)': 18, 'group': 'mammal'},
    {'name': 'goldfish', 'weight(lb.)': 8, 'lifespan(yr.)': 12, 'group': 'fish'},
    {'name': 'koi', 'weight(lb.)': 12, 'lifespan(yr.)': 30, 'group': 'fish'},
    {'name': 'gecko', 'weight(lb.)': 0.15, 'lifespan(yr.)': 15, 'group': 'reptile'},
])
# putting the `lifespan` condition last
df[(df['group'] == 'amphibian') | (df['group'] == 'mammal') & (df['lifespan(yr.)'] > 12)]
```

Output:
```
         name  weight(lb.)  lifespan(yr.)      group
2     axolotl         0.63             12  amphibian
4        frog         0.80             11  amphibian
6  salamander         0.44             16  amphibian
7  chinchilla         1.80             18     mammal
```
By the same precedence rule, the code below is read as (lifespan > 12 & amphibian) | mammal: a row is selected if its `lifespan(yr.)` column contains a value greater than 12 and its `group` column contains 'amphibian', or if its `group` column contains 'mammal', regardless of lifespan.
```python
import pandas as pd

# Creating a DataFrame with 'name', 'weight(lb.)', 'lifespan(yr.)', and 'group' columns
df = pd.DataFrame([
    {'name': 'golden retriever', 'weight(lb.)': 70, 'lifespan(yr.)': 11, 'group': 'mammal'},
    {'name': 'ferret', 'weight(lb.)': 4.4, 'lifespan(yr.)': 7, 'group': 'mammal'},
    {'name': 'axolotl', 'weight(lb.)': 0.63, 'lifespan(yr.)': 12, 'group': 'amphibian'},
    {'name': 'bearded dragon', 'weight(lb.)': 1, 'lifespan(yr.)': 13, 'group': 'reptile'},
    {'name': 'frog', 'weight(lb.)': 0.8, 'lifespan(yr.)': 11, 'group': 'amphibian'},
    {'name': 'basilisk', 'weight(lb.)': 0.43, 'lifespan(yr.)': 10, 'group': 'reptile'},
    {'name': 'salamander', 'weight(lb.)': 0.44, 'lifespan(yr.)': 16, 'group': 'amphibian'},
    {'name': 'chinchilla', 'weight(lb.)': 1.8, 'lifespan(yr.)': 18, 'group': 'mammal'},
    {'name': 'goldfish', 'weight(lb.)': 8, 'lifespan(yr.)': 12, 'group': 'fish'},
    {'name': 'koi', 'weight(lb.)': 12, 'lifespan(yr.)': 30, 'group': 'fish'},
    {'name': 'gecko', 'weight(lb.)': 0.15, 'lifespan(yr.)': 15, 'group': 'reptile'},
])
# putting the `lifespan` condition first
df[(df['lifespan(yr.)'] > 12) & (df['group'] == 'amphibian') | (df['group'] == 'mammal')]
```

Output:
```
               name  weight(lb.)  lifespan(yr.)      group
0  golden retriever        70.00             11     mammal
1            ferret         4.40              7     mammal
6        salamander         0.44             16  amphibian
7        chinchilla         1.80             18     mammal
```
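Both results above follow from Python's operator precedence: `&` is evaluated before `|`. A minimal sketch that names each condition (the names `amphibian`, `mammal`, and `long_lived` are illustrative) and checks that each unparenthesized expression selects the same rows as its explicitly grouped version:

```python
import pandas as pd

df = pd.DataFrame([
    {'name': 'golden retriever', 'weight(lb.)': 70, 'lifespan(yr.)': 11, 'group': 'mammal'},
    {'name': 'ferret', 'weight(lb.)': 4.4, 'lifespan(yr.)': 7, 'group': 'mammal'},
    {'name': 'axolotl', 'weight(lb.)': 0.63, 'lifespan(yr.)': 12, 'group': 'amphibian'},
    {'name': 'bearded dragon', 'weight(lb.)': 1, 'lifespan(yr.)': 13, 'group': 'reptile'},
    {'name': 'frog', 'weight(lb.)': 0.8, 'lifespan(yr.)': 11, 'group': 'amphibian'},
    {'name': 'basilisk', 'weight(lb.)': 0.43, 'lifespan(yr.)': 10, 'group': 'reptile'},
    {'name': 'salamander', 'weight(lb.)': 0.44, 'lifespan(yr.)': 16, 'group': 'amphibian'},
    {'name': 'chinchilla', 'weight(lb.)': 1.8, 'lifespan(yr.)': 18, 'group': 'mammal'},
    {'name': 'goldfish', 'weight(lb.)': 8, 'lifespan(yr.)': 12, 'group': 'fish'},
    {'name': 'koi', 'weight(lb.)': 12, 'lifespan(yr.)': 30, 'group': 'fish'},
    {'name': 'gecko', 'weight(lb.)': 0.15, 'lifespan(yr.)': 15, 'group': 'reptile'},
])

# Illustrative names for the three boolean masks
amphibian = df['group'] == 'amphibian'
mammal = df['group'] == 'mammal'
long_lived = df['lifespan(yr.)'] > 12

# First arrangement: & binds first, so this means amphibian | (mammal & long_lived)
first = df[amphibian | mammal & long_lived]
first_grouped = df[amphibian | (mammal & long_lived)]

# Second arrangement: this means (long_lived & amphibian) | mammal
second = df[long_lived & amphibian | mammal]
second_grouped = df[(long_lived & amphibian) | mammal]

print(first.equals(first_grouped), second.equals(second_grouped))
```

Naming the masks also makes it easier to see which grouping was intended before adding parentheses.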
Adding parentheses around the two `group` conditions forces the `|` to be evaluated first, so each selected row must have a value greater than 12 in the `lifespan(yr.)` column and a `group` column containing either 'amphibian' or 'mammal'.
```python
import pandas as pd

# Creating a DataFrame with 'name', 'weight(lb.)', 'lifespan(yr.)', and 'group' columns
df = pd.DataFrame([
    {'name': 'golden retriever', 'weight(lb.)': 70, 'lifespan(yr.)': 11, 'group': 'mammal'},
    {'name': 'ferret', 'weight(lb.)': 4.4, 'lifespan(yr.)': 7, 'group': 'mammal'},
    {'name': 'axolotl', 'weight(lb.)': 0.63, 'lifespan(yr.)': 12, 'group': 'amphibian'},
    {'name': 'bearded dragon', 'weight(lb.)': 1, 'lifespan(yr.)': 13, 'group': 'reptile'},
    {'name': 'frog', 'weight(lb.)': 0.8, 'lifespan(yr.)': 11, 'group': 'amphibian'},
    {'name': 'basilisk', 'weight(lb.)': 0.43, 'lifespan(yr.)': 10, 'group': 'reptile'},
    {'name': 'salamander', 'weight(lb.)': 0.44, 'lifespan(yr.)': 16, 'group': 'amphibian'},
    {'name': 'chinchilla', 'weight(lb.)': 1.8, 'lifespan(yr.)': 18, 'group': 'mammal'},
    {'name': 'goldfish', 'weight(lb.)': 8, 'lifespan(yr.)': 12, 'group': 'fish'},
    {'name': 'koi', 'weight(lb.)': 12, 'lifespan(yr.)': 30, 'group': 'fish'},
    {'name': 'gecko', 'weight(lb.)': 0.15, 'lifespan(yr.)': 15, 'group': 'reptile'},
])
# The same as the code above, except there are parentheses surrounding the 2nd and 3rd conditions
df[(df['lifespan(yr.)'] > 12) & ((df['group'] == 'amphibian') | (df['group'] == 'mammal'))]
```

Output:
```
         name  weight(lb.)  lifespan(yr.)      group
6  salamander         0.44             16  amphibian
7  chinchilla         1.80             18     mammal
```
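When several `==` checks on the same column are joined with `|`, `Series.isin` expresses the same test with a list of allowed values. A minimal sketch; the names `wanted` and `result` are illustrative:

```python
import pandas as pd

df = pd.DataFrame([
    {'name': 'golden retriever', 'weight(lb.)': 70, 'lifespan(yr.)': 11, 'group': 'mammal'},
    {'name': 'ferret', 'weight(lb.)': 4.4, 'lifespan(yr.)': 7, 'group': 'mammal'},
    {'name': 'axolotl', 'weight(lb.)': 0.63, 'lifespan(yr.)': 12, 'group': 'amphibian'},
    {'name': 'bearded dragon', 'weight(lb.)': 1, 'lifespan(yr.)': 13, 'group': 'reptile'},
    {'name': 'frog', 'weight(lb.)': 0.8, 'lifespan(yr.)': 11, 'group': 'amphibian'},
    {'name': 'basilisk', 'weight(lb.)': 0.43, 'lifespan(yr.)': 10, 'group': 'reptile'},
    {'name': 'salamander', 'weight(lb.)': 0.44, 'lifespan(yr.)': 16, 'group': 'amphibian'},
    {'name': 'chinchilla', 'weight(lb.)': 1.8, 'lifespan(yr.)': 18, 'group': 'mammal'},
    {'name': 'goldfish', 'weight(lb.)': 8, 'lifespan(yr.)': 12, 'group': 'fish'},
    {'name': 'koi', 'weight(lb.)': 12, 'lifespan(yr.)': 30, 'group': 'fish'},
    {'name': 'gecko', 'weight(lb.)': 0.15, 'lifespan(yr.)': 15, 'group': 'reptile'},
])

# True where 'group' is any value in the list; replaces the two == checks joined by |
wanted = df['group'].isin(['amphibian', 'mammal'])

# A single mask needs no extra grouping parentheses around the OR part
result = df[(df['lifespan(yr.)'] > 12) & wanted]
print(result)
```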
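Notice that the placement of the parentheses and the choice of logical operators change the output. The order of the conditions matters only when parentheses are omitted, because `&` is evaluated before `|`; wrapping each group in explicit parentheses makes the intended logic unambiguous.

Row Selection with Five Conditionals

Finally, when selecting rows from a DataFrame, we can combine as many conditions as we need: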
```python
import pandas as pd

# Creating a DataFrame with 'name', 'weight(lb.)', 'lifespan(yr.)', and 'group' columns
df = pd.DataFrame([
    {'name': 'golden retriever', 'weight(lb.)': 70, 'lifespan(yr.)': 11, 'group': 'mammal'},
    {'name': 'ferret', 'weight(lb.)': 4.4, 'lifespan(yr.)': 7, 'group': 'mammal'},
    {'name': 'axolotl', 'weight(lb.)': 0.63, 'lifespan(yr.)': 12, 'group': 'amphibian'},
    {'name': 'bearded dragon', 'weight(lb.)': 1, 'lifespan(yr.)': 13, 'group': 'reptile'},
    {'name': 'frog', 'weight(lb.)': 0.8, 'lifespan(yr.)': 11, 'group': 'amphibian'},
    {'name': 'basilisk', 'weight(lb.)': 0.43, 'lifespan(yr.)': 10, 'group': 'reptile'},
    {'name': 'salamander', 'weight(lb.)': 0.44, 'lifespan(yr.)': 16, 'group': 'amphibian'},
    {'name': 'chinchilla', 'weight(lb.)': 1.8, 'lifespan(yr.)': 18, 'group': 'mammal'},
    {'name': 'goldfish', 'weight(lb.)': 8, 'lifespan(yr.)': 12, 'group': 'fish'},
    {'name': 'koi', 'weight(lb.)': 12, 'lifespan(yr.)': 30, 'group': 'fish'},
    {'name': 'gecko', 'weight(lb.)': 0.15, 'lifespan(yr.)': 15, 'group': 'reptile'},
])
# selecting pets that live more than 10 but at most 16 years, are mammals or reptiles, and weigh more than 0.5 lb.
df[(df['lifespan(yr.)'] > 10) &
   (df['lifespan(yr.)'] <= 16) &
   ((df['group'] == 'mammal') | (df['group'] == 'reptile')) &
   (df['weight(lb.)'] > 0.5)]
```

Output:
```
               name  weight(lb.)  lifespan(yr.)    group
0  golden retriever         70.0             11   mammal
3    bearded dragon          1.0             13  reptile
```
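As the number of conditions grows, one way to keep the expression readable is to store each condition in a named boolean mask and combine the masks at the end. A minimal sketch of the same five-condition selection; the mask names are illustrative, not part of pandas:

```python
import pandas as pd

df = pd.DataFrame([
    {'name': 'golden retriever', 'weight(lb.)': 70, 'lifespan(yr.)': 11, 'group': 'mammal'},
    {'name': 'ferret', 'weight(lb.)': 4.4, 'lifespan(yr.)': 7, 'group': 'mammal'},
    {'name': 'axolotl', 'weight(lb.)': 0.63, 'lifespan(yr.)': 12, 'group': 'amphibian'},
    {'name': 'bearded dragon', 'weight(lb.)': 1, 'lifespan(yr.)': 13, 'group': 'reptile'},
    {'name': 'frog', 'weight(lb.)': 0.8, 'lifespan(yr.)': 11, 'group': 'amphibian'},
    {'name': 'basilisk', 'weight(lb.)': 0.43, 'lifespan(yr.)': 10, 'group': 'reptile'},
    {'name': 'salamander', 'weight(lb.)': 0.44, 'lifespan(yr.)': 16, 'group': 'amphibian'},
    {'name': 'chinchilla', 'weight(lb.)': 1.8, 'lifespan(yr.)': 18, 'group': 'mammal'},
    {'name': 'goldfish', 'weight(lb.)': 8, 'lifespan(yr.)': 12, 'group': 'fish'},
    {'name': 'koi', 'weight(lb.)': 12, 'lifespan(yr.)': 30, 'group': 'fish'},
    {'name': 'gecko', 'weight(lb.)': 0.15, 'lifespan(yr.)': 15, 'group': 'reptile'},
])

lifespan = df['lifespan(yr.)']

# One named mask per idea; the parentheses inside each mask still matter
lives_10_to_16 = (lifespan > 10) & (lifespan <= 16)
mammal_or_reptile = (df['group'] == 'mammal') | (df['group'] == 'reptile')
over_half_pound = df['weight(lb.)'] > 0.5

# Combining whole masks with & needs no further grouping
selected = df[lives_10_to_16 & mammal_or_reptile & over_half_pound]
print(selected)
```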
An explanation of how AND and OR operators work, including videos, example problems, and more details is part of the DISCOVERY course content: