Finding Minimum and Maximum Values in a DataFrame Column
It's often helpful to know a few specific values for each column (aka variable) in a DataFrame — mainly the highest value, lowest value, and all unique values. But sometimes, our DataFrame has so many rows, it's impossible to view the whole thing at once. Luckily, there are some pandas functions that allow us to easily locate this information.
The Movie Dataset
To demonstrate these functions, we'll use a DataFrame of five different movies, including information about their release date, how much money they made in US dollars, and a personal rating out of 10.
import pandas as pd
#Creates a DataFrame of "movie", "release date", "domestic gross", "worldwide gross", "personal rating", and "international box office" columns
df = pd.DataFrame([
{"movie": "The Truman Show", "release date": "1996-06-05", "domestic box office": 125618201, "worldwide box office": 264118201, "personal rating": 10, "international box office": 138500000},
{"movie": "Rogue One: A Star Wars Story", "release date": "2016-12-16", "domestic box office": 532177324, "worldwide box office": 1055135598, "personal rating": 9, "international box office": 522958274},
{"movie": "Iron Man", "release date": "2008-05-02", "domestic box office": 318604126, "worldwide box office": 585171547, "personal rating": 7, "international box office": 266567421},
{"movie": "Blade Runner", "release date": "1982-06-25", "domestic box office": 32656328, "worldwide box office": 39535837, "personal rating": 8, "international box office": 6879509},
{"movie": "Breakfast at Tiffany's", "release date": "1961-10-05", "domestic box office": 9551904, "worldwide box office": 9794721, "personal rating": 7, "international box office": 242817}
])
df
movie | release date | domestic box office | worldwide box office | personal rating | international box office | |
---|---|---|---|---|---|---|
0 | The Truman Show | 1996-06-05 | 125618201 | 264118201 | 10 | 138500000 |
1 | Rogue One: A Star Wars Story | 2016-12-16 | 532177324 | 1055135598 | 9 | 522958274 |
2 | Iron Man | 2008-05-02 | 318604126 | 585171547 | 7 | 266567421 |
3 | Blade Runner | 1982-06-25 | 32656328 | 39535837 | 8 | 6879509 |
4 | Breakfast at Tiffany's | 1961-10-05 | 9551904 | 9794721 | 7 | 242817 |
Finding the Rows with the Largest or Smallest Value
Often, you will be interested in the rows with the smallest or largest values for a specific column. The nlargest
and nsmallest
functions allow us to find the rows that have the largest or smallest values for a specific columns.
For example, we may want the three movies with the largest domestic box office sales:
df.nlargest(3, 'domestic box office')
movie | release date | domestic box office | worldwide box office | personal rating | international box office | |
---|---|---|---|---|---|---|
1 | Rogue One: A Star Wars Story | 2016-12-16 | 532177324 | 1055135598 | 9 | 522958274 |
2 | Iron Man | 2008-05-02 | 318604126 | 585171547 | 7 | 266567421 |
0 | The Truman Show | 1996-06-05 | 125618201 | 264118201 | 10 | 138500000 |
Another option is to see the two movies that the 'personal rating'
is the lowest.
df.nsmallest(2, 'personal rating')
movie | release date | domestic box office | worldwide box office | personal rating | international box office | |
---|---|---|---|---|---|---|
2 | Iron Man | 2008-05-02 | 318604126 | 585171547 | 7 | 266567421 |
4 | Breakfast at Tiffany's | 1961-10-05 | 9551904 | 9794721 | 7 | 242817 |
Selecting specific rows based on criteria, like the minimum or maximum values, is part of the larger idea of "Row Selection" with DataFrames. View additional explanations, videos, and example problems covering row selection as part of the DISCOVERY course content found here:
Finding the Smallest and Largest Values of Data in a Column
.max()
and .min()
functions allow us to find the smallest and largest numbers in a column. Make sure to specify the column in brackets before applying the function.
Note: this only works for columns of integer or float dtypes — not strings.