Finding Quantiles of a Column in a DataFrame


We can find many different quantiles for a set of numbers using the .quantile() function of a DataFrame. One specific quantile, the 50% quantile, is almost universally known because it is the median!

If the numbers in a column are arranged in ascending order, the median is the value that sits directly in the middle of the data, with 50% of the values to its left (and 50% to its right, but quantiles are described by the share of data on the left). We can also find the 25% quantile, which is the value with 25% of the data to its left, and the 75% quantile, which is the value with 75% of the data to its left.
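To make the idea of "data to the left" concrete, here is a minimal sketch using a made-up Series of nine numbers that is already sorted. The values and variable names are hypothetical and chosen only so that each quantile lands exactly on one of the numbers.

```python
import pandas as pd

# Hypothetical toy data, already in ascending order
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9])

q25 = values.quantile(0.25)    # value a quarter of the way through the sorted data
median = values.quantile(0.5)  # value in the middle of the sorted data
q75 = values.quantile(0.75)    # value three quarters of the way through

print(q25, median, q75)  # 3.0 5.0 7.0
```

Four values sit on each side of 5, which is why it is the 50% quantile here.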

The Movie Dataset

Let's use a small DataFrame with information about movies to see this function in action!

import pandas as pd

# Creates a DataFrame with "movie", "release date", "domestic box office", "worldwide box office", "personal rating", and "international box office" columns
df = pd.DataFrame([
  {"movie": "The Truman Show", "release date": "1996-06-05", "domestic box office": 125618201, "worldwide box office": 264118201, "personal rating": 10, "international box office": 138500000}, 
  {"movie": "Rogue One: A Star Wars Story", "release date": "2016-12-16", "domestic box office": 532177324, "worldwide box office": 1055135598, "personal rating": 9, "international box office": 522958274}, 
  {"movie": "Iron Man", "release date": "2008-05-02", "domestic box office": 318604126, "worldwide box office": 585171547, "personal rating": 7, "international box office": 266567421}, 
  {"movie": "Blade Runner", "release date": "1982-06-25", "domestic box office": 32656328, "worldwide box office": 39535837, "personal rating": 8, "international box office": 6879509}, 
  {"movie": "Breakfast at Tiffany's", "release date": "1961-10-05", "domestic box office": 9551904, "worldwide box office": 9794721, "personal rating": 7, "international box office": 242817}
])
df
                          movie  release date  domestic box office  worldwide box office  personal rating  international box office
0               The Truman Show    1996-06-05            125618201             264118201               10                 138500000
1  Rogue One: A Star Wars Story    2016-12-16            532177324            1055135598                9                 522958274
2                      Iron Man    2008-05-02            318604126             585171547                7                 266567421
3                  Blade Runner    1982-06-25             32656328              39535837                8                   6879509
4        Breakfast at Tiffany's    1961-10-05              9551904               9794721                7                    242817
Creating the movie DataFrame

Choosing the Quantile

The usefulness of the .quantile() function lies in its parameter. By default, the function calculates the 50% quantile (the median). That default is somewhat redundant, though, because the .median() function already returns the same result.

# Just as with any other descriptive statistic, specify the column in brackets.
fifty_quant = df["personal rating"].quantile()
print(fifty_quant)

median = df["personal rating"].median()
print(median)
8.0
8.0
Demonstrating default .quantile() function

We can change which quantile the function calculates by passing our own decimal as the parameter. For example, to calculate the 25% quantile (the 25th percentile), type 0.25 in the parentheses.

However, we are not limited to 0.25, 0.5, and 0.75. We can input any number between 0 and 1 to calculate more complicated quantiles.

df["personal rating"].quantile(0.25)
7.0
Changing the .quantile() parameter
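As a sketch of a less common quantile, the example below asks for the 90% quantile of the same five ratings. The Series is rebuilt here so the snippet runs on its own, and the variable names are illustrative. When the requested quantile falls between two data points, pandas interpolates linearly by default, and the interpolation parameter of .quantile() can be set to "lower" or "higher" to return an actual data value instead.

```python
import pandas as pd

# The five "personal rating" values from the movie DataFrame
ratings = pd.Series([10, 9, 7, 8, 7], name="personal rating")

# Default behavior: linear interpolation between the two nearest ratings (9 and 10)
linear_q = ratings.quantile(0.9)

# Ask for the nearest actual data value below or above instead
lower_q = ratings.quantile(0.9, interpolation="lower")
higher_q = ratings.quantile(0.9, interpolation="higher")

print(linear_q, lower_q, higher_q)
```

The default result is roughly 9.6, a value that does not appear in the column, while the "lower" and "higher" options return the ratings 9 and 10.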

Finding Multiple Quantiles

We can also input a list of decimals to get every quantile we want at once. The result is a pandas Series whose index holds the requested quantiles and whose values hold the results.

list_quant = df["domestic box office"].quantile([0.25, 0.5, 0.75])
list_quant 
0.25     32656328.0
0.50    125618201.0
0.75    318604126.0
Name: domestic box office, dtype: float64
Calculating multiple quantiles at once
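Because the result is indexed by the requested decimals, individual quantiles can be looked up by label. The sketch below rebuilds the "domestic box office" column as a standalone Series so it runs on its own; names such as quartiles and q1 are illustrative.

```python
import pandas as pd

# The "domestic box office" values from the movie DataFrame
domestic = pd.Series(
    [125618201, 532177324, 318604126, 32656328, 9551904],
    name="domestic box office",
)

quartiles = domestic.quantile([0.25, 0.5, 0.75])

# Look up a single quantile by its label in the index
q1 = quartiles.loc[0.25]
q3 = quartiles.loc[0.75]

# Convert the values to a plain Python list if one is needed
as_list = quartiles.tolist()

print(q1, q3)
print(as_list)
```

If an ordinary Python list is what you want, .tolist() drops the index and keeps only the values, in the order the quantiles were requested.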