Sorting a DataFrame Using Pandas

The sort_values method of a DataFrame is used to sort a DataFrame by the data in a column.

To explore sorting, we'll use a DataFrame of five cities, across three states, with various populations:

import pandas as pd\n&nbsp;\n# Creates a DataFrame of "city", "state", and "population" columns:\ndf = pd.DataFrame( [{"city": "Los Angeles", "state": "CA", "population": 3898747},\n                    {"city": "Urbana", "state": "IL", "population": 42461},\n                    {"city": "New York", "state": "NY", "population": 8804190},\n                    {"city": "Washington D.C.", "population": 689545},  # Note: No "state" for D.C.\n                    {"city": "Chicago", "state": "IL", "population": 2746388}] )\n&nbsp;\ndf

Reset Code Python Output:


  
    
      
      city
      state
      population
    
  
  
    
      0
      Los Angeles
      CA
      3898747
    
    
      1
      Urbana
      IL
      42461
    
    
      2
      New York
      NY
      8804190
    
    
      3
      Washington D.C.
      NaN
      689545
    
    
      4
      Chicago
      IL
      2746388

	city	state	population
0	Los Angeles	CA	3898747
1	Urbana	IL	42461
2	New York	NY	8804190
3	Washington D.C.	NaN	689545
4	Chicago	IL	2746388

Sorting by a Single Column

Usually, we need to sort data by the values contained within a single column. We can provide the column name and pandas will return a new DataFrame:

import pandas as pd\n&nbsp;\ndf = pd.DataFrame( [{"city": "Los Angeles", "state": "CA", "population": 3898747},\n                    {"city": "Urbana", "state": "IL", "population": 42461},\n                    {"city": "New York", "state": "NY", "population": 8804190},\n                    {"city": "Washington D.C.", "population": 689545},\n                    {"city": "Chicago", "state": "IL", "population": 2746388}] )\ndf.sort_values("population")

Reset Code Python Output:


  
    
      
      city
      state
      population
    
  
  
    
      1
      Urbana
      IL
      42461
    
    
      3
      Washington D.C.
      NaN
      689545
    
    
      4
      Chicago
      IL
      2746388
    
    
      0
      Los Angeles
      CA
      3898747
    
    
      2
      New York
      NY
      8804190

	city	state	population
1	Urbana	IL	42461
3	Washington D.C.	NaN	689545
4	Chicago	IL	2746388
0	Los Angeles	CA	3898747
2	New York	NY	8804190

Sorting by Multiple Columns

A list of column names can be provided to sort by multiple columns. Your DataFrame will be sorted by the first column first and, when multiple rows have the same value for the first column, those rows are then sorted by their second column.

import pandas as pd\n&nbsp;\ndf = pd.DataFrame( [{"city": "Los Angeles", "state": "CA", "population": 3898747},\n                    {"city": "Urbana", "state": "IL", "population": 42461},\n                    {"city": "New York", "state": "NY", "population": 8804190},\n                    {"city": "Washington D.C.", "population": 689545},\n                    {"city": "Chicago", "state": "IL", "population": 2746388}] )\ndf.sort_values(["state", "population"])\n#    => Sorts by "state" first,\n#       then "population" for rows with identical "state" values.

Reset Code Python Output:


  
    
      
      city
      state
      population
    
  
  
    
      0
      Los Angeles
      CA
      3898747
    
    
      1
      Urbana
      IL
      42461
    
    
      4
      Chicago
      IL
      2746388
    
    
      2
      New York
      NY
      8804190
    
    
      3
      Washington D.C.
      NaN
      689545

	city	state	population
0	Los Angeles	CA	3898747
1	Urbana	IL	42461
4	Chicago	IL	2746388
2	New York	NY	8804190
3	Washington D.C.	NaN	689545

Sorting in Reverse Order (Descending Order)

By default, the ascending parameter is True indicating that the sorting will be done in ascending order (smallest value first, largest value last). The ascending parameter can be set to False to sort in descending order.

import pandas as pd\n&nbsp;\ndf = pd.DataFrame( [{"city": "Los Angeles", "state": "CA", "population": 3898747},\n                    {"city": "Urbana", "state": "IL", "population": 42461},\n                    {"city": "New York", "state": "NY", "population": 8804190},\n                    {"city": "Washington D.C.", "population": 689545},\n                    {"city": "Chicago", "state": "IL", "population": 2746388}] )\ndf.sort_values("population", ascending=False)\n#    => Sorts in reverse (descending) order

Reset Code Python Output:


  
    
      
      city
      state
      population
    
  
  
    
      2
      New York
      NY
      8804190
    
    
      0
      Los Angeles
      CA
      3898747
    
    
      4
      Chicago
      IL
      2746388
    
    
      3
      Washington D.C.
      NaN
      689545
    
    
      1
      Urbana
      IL
      42461

	city	state	population
2	New York	NY	8804190
0	Los Angeles	CA	3898747
4	Chicago	IL	2746388
3	Washington D.C.	NaN	689545
1	Urbana	IL	42461

Using a Stable Sort

When a sort is a stable, the sort guarantees to not change the order among rows with identical values. For example, our initial dataset has "Urbana" listed before "Chicago". A stable sort that sorts the list only on state will keep "Urbana" listed before "Chicago" since that is the way it appears in the original list.

By default, the sorting will be done as fast as possible using a quicksort algorithm. The quicksort algorithm is NOT stable. The kind parameter can specify a "stable" to guarantee a stable sort is used.

import pandas as pd\n&nbsp;\ndf = pd.DataFrame( [{"city": "Los Angeles", "state": "CA", "population": 3898747},\n                    {"city": "Urbana", "state": "IL", "population": 42461},\n                    {"city": "New York", "state": "NY", "population": 8804190},\n                    {"city": "Washington D.C.", "population": 689545},\n                    {"city": "Chicago", "state": "IL", "population": 2746388}] )\ndf.sort_values("state", kind="stable")\n#    => Guarantees a "stable" sort

Reset Code Python Output:


  
    
      
      city
      state
      population
    
  
  
    
      0
      Los Angeles
      CA
      3898747
    
    
      1
      Urbana
      IL
      42461
    
    
      4
      Chicago
      IL
      2746388
    
    
      2
      New York
      NY
      8804190
    
    
      3
      Washington D.C.
      NaN
      689545

	city	state	population
0	Los Angeles	CA	3898747
1	Urbana	IL	42461
4	Chicago	IL	2746388
2	New York	NY	8804190
3	Washington D.C.	NaN	689545

Location of `NaN` Values

When sorting a DataFrame, missing values (NaN) can be placed at the beginning or end of your DataFrame. By default, the na_position parameter is set to "last". The na_position can be set to "first" to place missing values at the beginning of your DataFrame.

import pandas as pd\n&nbsp;\ndf = pd.DataFrame( [{"city": "Los Angeles", "state": "CA", "population": 3898747},\n                    {"city": "Urbana", "state": "IL", "population": 42461},\n                    {"city": "New York", "state": "NY", "population": 8804190},\n                    {"city": "Washington D.C.", "population": 689545},\n                    {"city": "Chicago", "state": "IL", "population": 2746388}] )\ndf.sort_values("state", na_position="first")\n#    => Sorts the missing values (NaN) to the first part of the DataFrame

Reset Code Python Output:


  
    
      
      city
      state
      population
    
  
  
    
      3
      Washington D.C.
      NaN
      689545
    
    
      0
      Los Angeles
      CA
      3898747
    
    
      1
      Urbana
      IL
      42461
    
    
      4
      Chicago
      IL
      2746388
    
    
      2
      New York
      NY
      8804190

	city	state	population
3	Washington D.C.	NaN	689545
0	Los Angeles	CA	3898747
1	Urbana	IL	42461
4	Chicago	IL	2746388
2	New York	NY	8804190

Pandas Documentation

The full documentation for sort_values is available in the pandas documentation.

Sorting a DataFrame Using Pandas

Sorting by a Single Column

Sorting by Multiple Columns

Sorting in Reverse Order (Descending Order)

Using a Stable Sort

Location of NaN Values

Pandas Documentation

Location of `NaN` Values