Sorting a DataFrame Using Pandas


The sort_values method of a DataFrame is used to sort a DataFrame by the data in a column.

To explore sorting, we'll use a DataFrame of five cities, across three states, with various populations:

import pandas as pd

# Creates a DataFrame of "city", "state", and "population" columns:
df = pd.DataFrame( [{"city": "Los Angeles", "state": "CA", "population": 3898747},
                    {"city": "Urbana", "state": "IL", "population": 42461},
                    {"city": "New York", "state": "NY", "population": 8804190},
                    {"city": "Washington D.C.", "population": 689545},  # Note: No "state" for D.C.
                    {"city": "Chicago", "state": "IL", "population": 2746388}] )
citystatepopulation
0Los AngelesCA3898747
1UrbanaIL42461
2New YorkNY8804190
3Washington D.C.NaN689545
4ChicagoIL2746388
Creating a DataFrame to use for sorting

Sorting by a Single Column

Usually, we need to sort data by the values contained within a single column. We can provide the column name and pandas will return a new DataFrame:

df.sort_values("population")
citystatepopulation
1UrbanaIL42461
3Washington D.C.NaN689545
4ChicagoIL2746388
0Los AngelesCA3898747
2New YorkNY8804190
Sorting a DataFrame by a Single Column

Sorting by Multiple Columns

A list of column names can be provided to sort by multiple columns. Your DataFrame will be sorted by the first column first and, when multiple rows have the same value for the first column, those rows are then sorted by their second column.

df.sort_values(["state", "population"])
#    => Sorts by "state" first,
#       then "population" for rows with identical "state" values.
citystatepopulation
0Los AngelesCA3898747
1UrbanaIL42461
4ChicagoIL2746388
2New YorkNY8804190
3Washington D.C.NaN689545
Sorting a DataFrame by a Multiple Column

Sorting in Reverse Order (Descending Order)

By default, the ascending parameter is True indicating that the sorting will be done in ascending order (smallest value first, largest value last). The ascending parameter can be set to False to sort in descending order.

df.sort_values("population", ascending=False)
#    => Sorts in reverse (descending) order
citystatepopulation
2New YorkNY8804190
0Los AngelesCA3898747
4ChicagoIL2746388
3Washington D.C.NaN689545
1UrbanaIL42461
Sorting in Reverse Order (Descending Order)

Using a Stable Sort

When a sort is a stable, the sort guarantees to not change the order among rows with identical values. For example, our initial dataset has "Urbana" listed before "Chicago". A stable sort that sorts the list only on state will keep "Urbana" listed before "Chicago" since that is the way it appears in the original list.

By default, the sorting will be done as fast as possible using a quicksort algorithm. The quicksort algorithm is NOT stable. The kind parameter can specify a "stable" to guarantee a stable sort is used.

df.sort_values("state", kind="stable")
#    => Guarantees a "stable" sort
citystatepopulation
0Los AngelesCA3898747
1UrbanaIL42461
4ChicagoIL2746388
2New YorkNY8804190
3Washington D.C.NaN689545
Sorting using a stable sort, providing a guarantee that the relative order of two elements with the same sorted value will not change.

Location of NaN Values

When sorting a DataFrame, missing values (NaN) can be placed at the beginning or end of your DataFrame. By default, the na_position parameter is set to "last". The na_position can be set to "first" to place missing values at the beginning of your DataFrame.

df.sort_values("state", na_position="first")
#    => Sorts the missing values (NaN) to the first part of the DataFrame
citystatepopulation
3Washington D.C.NaN689545
0Los AngelesCA3898747
1UrbanaIL42461
4ChicagoIL2746388
2New YorkNY8804190
Sorting with missing (NaN) values appearing at the beginning of the DataFrame.

Pandas Documentation

The full documentation for sort_values is available in the pandas documentation.