Finding Minimum and Maximum Values in a DataFrame Column


It's often helpful to know a few specific values for each column (aka variable) in a DataFrame — mainly the highest value, lowest value, and all unique values. But sometimes, our DataFrame has so many rows, it's impossible to view the whole thing at once. Luckily, there are some pandas functions that allow us to easily locate this information.

The Movie Dataset

To demonstrate these functions, we'll use a DataFrame of five different movies, including information about their release date, how much money they made in US dollars, and a personal rating out of 10.

import pandas as pd

#Creates a DataFrame of "movie", "release date", "domestic gross", "worldwide gross", "personal rating", and "international box office" columns
df = pd.DataFrame([
  {"movie": "The Truman Show", "release date": "1996-06-05", "domestic box office": 125618201, "worldwide box office": 264118201, "personal rating": 10, "international box office": 138500000}, 
  {"movie": "Rogue One: A Star Wars Story", "release date": "2016-12-16", "domestic box office": 532177324, "worldwide box office": 1055135598, "personal rating": 9, "international box office": 522958274}, 
  {"movie": "Iron Man", "release date": "2008-05-02", "domestic box office": 318604126, "worldwide box office": 585171547, "personal rating": 7, "international box office": 266567421}, 
  {"movie": "Blade Runner", "release date": "1982-06-25", "domestic box office": 32656328, "worldwide box office": 39535837, "personal rating": 8, "international box office": 6879509}, 
  {"movie": "Breakfast at Tiffany's", "release date": "1961-10-05", "domestic box office": 9551904, "worldwide box office": 9794721, "personal rating": 7, "international box office": 242817}
  ])
df
movierelease datedomestic box officeworldwide box officepersonal ratinginternational box office
0The Truman Show1996-06-0512561820126411820110138500000
1Rogue One: A Star Wars Story2016-12-1653217732410551355989522958274
2Iron Man2008-05-023186041265851715477266567421
3Blade Runner1982-06-25326563283953583786879509
4Breakfast at Tiffany's1961-10-05955190497947217242817
Creating the movie DataFrame

Finding the Rows with the Largest or Smallest Value

Often, you will be interested in the rows with the smallest or largest values for a specific column. The nlargest and nsmallest functions allow us to find the rows that have the largest or smallest values for a specific columns.

For example, we may want the three movies with the largest domestic box office sales:

df.nlargest(3, 'domestic box office')
movierelease datedomestic box officeworldwide box officepersonal ratinginternational box office
1Rogue One: A Star Wars Story2016-12-1653217732410551355989522958274
2Iron Man2008-05-023186041265851715477266567421
0The Truman Show1996-06-0512561820126411820110138500000
Finding the three movies with the largest domestic box office sales

Another option is to see the two movies that the 'personal rating' is the lowest.

df.nsmallest(2, 'personal rating')
movierelease datedomestic box officeworldwide box officepersonal ratinginternational box office
2Iron Man2008-05-023186041265851715477266567421
4Breakfast at Tiffany's1961-10-05955190497947217242817
Finding the two movies with the smallest personal rating

Selecting specific rows based on criteria, like the minimum or maximum values, is part of the larger idea of "Row Selection" with DataFrames. View additional explanations, videos, and example problems covering row selection as part of the DISCOVERY course content found here:

Finding the Smallest and Largest Values of Data in a Column

.max() and .min() functions allow us to find the smallest and largest numbers in a column. Make sure to specify the column in brackets before applying the function.
Note: this only works for columns of integer or float dtypes — not strings.

df["worldwide box office"].min()
9794721
Finding the smallest worldwide box office performances
df["worldwide box office"].max()
1055135598
Finding the largest worldwide box office performances