Sorting a DataFrame Using Pandas


The sort_values method of a DataFrame is used to sort a DataFrame by the data in a column.

To explore sorting, we'll use a DataFrame of five cities, across three states, with various populations:

Reset Code Python Output:
city state population
0 Los Angeles CA 3898747
1 Urbana IL 42461
2 New York NY 8804190
3 Washington D.C. NaN 689545
4 Chicago IL 2746388

Sorting by a Single Column

Usually, we need to sort data by the values contained within a single column. We can provide the column name and pandas will return a new DataFrame:

Reset Code Python Output:
city state population
1 Urbana IL 42461
3 Washington D.C. NaN 689545
4 Chicago IL 2746388
0 Los Angeles CA 3898747
2 New York NY 8804190

Sorting by Multiple Columns

A list of column names can be provided to sort by multiple columns. Your DataFrame will be sorted by the first column first and, when multiple rows have the same value for the first column, those rows are then sorted by their second column.

Reset Code Python Output:
city state population
0 Los Angeles CA 3898747
1 Urbana IL 42461
4 Chicago IL 2746388
2 New York NY 8804190
3 Washington D.C. NaN 689545

Sorting in Reverse Order (Descending Order)

By default, the ascending parameter is True indicating that the sorting will be done in ascending order (smallest value first, largest value last). The ascending parameter can be set to False to sort in descending order.

Reset Code Python Output:
city state population
2 New York NY 8804190
0 Los Angeles CA 3898747
4 Chicago IL 2746388
3 Washington D.C. NaN 689545
1 Urbana IL 42461

Using a Stable Sort

When a sort is a stable, the sort guarantees to not change the order among rows with identical values. For example, our initial dataset has "Urbana" listed before "Chicago". A stable sort that sorts the list only on state will keep "Urbana" listed before "Chicago" since that is the way it appears in the original list.

By default, the sorting will be done as fast as possible using a quicksort algorithm. The quicksort algorithm is NOT stable. The kind parameter can specify a "stable" to guarantee a stable sort is used.

Reset Code Python Output:
city state population
0 Los Angeles CA 3898747
1 Urbana IL 42461
4 Chicago IL 2746388
2 New York NY 8804190
3 Washington D.C. NaN 689545

Location of NaN Values

When sorting a DataFrame, missing values (NaN) can be placed at the beginning or end of your DataFrame. By default, the na_position parameter is set to "last". The na_position can be set to "first" to place missing values at the beginning of your DataFrame.

Reset Code Python Output:
city state population
3 Washington D.C. NaN 689545
0 Los Angeles CA 3898747
1 Urbana IL 42461
4 Chicago IL 2746388
2 New York NY 8804190

Pandas Documentation

The full documentation for sort_values is available in the pandas documentation.