Selecting DataFrame Rows Based on String Contents

When working with text, it is often useful to select rows that contain a specific string. The .str.contains(...) method tests each row's value to determine whether a specific string appears in the text.

To explore this function, we'll use a DataFrame of the five tallest mountains in the world:

import pandas as pd

df = pd.DataFrame([
  {"mountain": "Mount Everest", "feet": 29029, "location": "Nepal/China"},
  {"mountain": "K2", "feet": 28255, "location": "Pakistan/China"},
  {"mountain": "Kangchenjunga", "feet": 28169, "location": "Nepal/India"},
  {"mountain": "Lhotse", "feet": 27940, "location": "Nepal"},
  {"mountain": "Makalu", "feet": 27838, "location": "Nepal"},
])
df

Select All Rows Containing a String

In the DataFrame, we can see that four of the five tallest mountains are in Nepal. If we use an == comparison, our row selection returns only two of those four mountains, because == selects only rows whose value is exactly equal to the given string.

import pandas as pd

df = pd.DataFrame([
  {"mountain": "Mount Everest", "feet": 29029, "location": "Nepal/China"},
  {"mountain": "K2", "feet": 28255, "location": "Pakistan/China"},
  {"mountain": "Kangchenjunga", "feet": 28169, "location": "Nepal/India"},
  {"mountain": "Lhotse", "feet": 27940, "location": "Nepal"},
  {"mountain": "Makalu", "feet": 27838, "location": "Nepal"},
])
df[df.location == "Nepal"]

Output:

	mountain	feet	location
3	Lhotse	27940	Nepal
4	Makalu	27838	Nepal
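To see why only two rows come back, it can help to look at the boolean mask that the == comparison produces before it is used for selection. The following is a minimal sketch using the same DataFrame; the variable names exact_mask and exact_rows are illustrative and not part of the original example.

```python
import pandas as pd

df = pd.DataFrame([
  {"mountain": "Mount Everest", "feet": 29029, "location": "Nepal/China"},
  {"mountain": "K2", "feet": 28255, "location": "Pakistan/China"},
  {"mountain": "Kangchenjunga", "feet": 28169, "location": "Nepal/India"},
  {"mountain": "Lhotse", "feet": 27940, "location": "Nepal"},
  {"mountain": "Makalu", "feet": 27838, "location": "Nepal"},
])

# Comparing a column to a string yields a boolean Series, one value per row.
# "Nepal/China" and "Nepal/India" are not exactly equal to "Nepal".
exact_mask = df.location == "Nepal"
print(exact_mask.tolist())  # [False, False, False, True, True]

# Indexing the DataFrame with the mask keeps only the rows marked True.
exact_rows = df[exact_mask]
print(exact_rows)
```

Only the last two rows are marked True, which is why Mount Everest and Kangchenjunga are missing from the selection.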

Instead, .str.contains(...) lets us check whether a specific string appears anywhere within each value. Looking for the locations that contain Nepal, we find four mountains:

import pandas as pd

df = pd.DataFrame([
  {"mountain": "Mount Everest", "feet": 29029, "location": "Nepal/China"},
  {"mountain": "K2", "feet": 28255, "location": "Pakistan/China"},
  {"mountain": "Kangchenjunga", "feet": 28169, "location": "Nepal/India"},
  {"mountain": "Lhotse", "feet": 27940, "location": "Nepal"},
  {"mountain": "Makalu", "feet": 27838, "location": "Nepal"},
])
df[df.location.str.contains("Nepal")]

Output:

	mountain	feet	location
0	Mount Everest	29029	Nepal/China
2	Kangchenjunga	28169	Nepal/India
3	Lhotse	27940	Nepal
4	Makalu	27838	Nepal
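The match above is case-sensitive, and by default pandas interprets the search string as a regular expression. The sketch below illustrates the case and regex parameters of str.contains on the same DataFrame; the variable names are illustrative, and the patterns were chosen only to show the difference in behavior.

```python
import pandas as pd

df = pd.DataFrame([
  {"mountain": "Mount Everest", "feet": 29029, "location": "Nepal/China"},
  {"mountain": "K2", "feet": 28255, "location": "Pakistan/China"},
  {"mountain": "Kangchenjunga", "feet": 28169, "location": "Nepal/India"},
  {"mountain": "Lhotse", "feet": 27940, "location": "Nepal"},
  {"mountain": "Makalu", "feet": 27838, "location": "Nepal"},
])

# case=False ignores letter case, so "nepal" still matches "Nepal".
any_case = df[df.location.str.contains("nepal", case=False)]
print(list(any_case.index))  # [0, 2, 3, 4]

# With the default regex=True, "|" means "or" in the pattern.
as_regex = df[df.location.str.contains("China|India")]
print(list(as_regex.index))  # [0, 1, 2]

# regex=False searches for the literal text "China|India", which no row contains.
as_literal = df[df.location.str.contains("China|India", regex=False)]
print(len(as_literal))  # 0
```

Passing regex=False is a reasonable choice when the search text may contain characters such as "." or "|" that should be matched literally.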

Select All Rows Containing Two Strings

The .str.contains(...) operation can be combined with & to test for the presence of two strings within one field. For example, we can test for all mountains that are located in BOTH Nepal and China:

import pandas as pd

df = pd.DataFrame([
  {"mountain": "Mount Everest", "feet": 29029, "location": "Nepal/China"},
  {"mountain": "K2", "feet": 28255, "location": "Pakistan/China"},
  {"mountain": "Kangchenjunga", "feet": 28169, "location": "Nepal/India"},
  {"mountain": "Lhotse", "feet": 27940, "location": "Nepal"},
  {"mountain": "Makalu", "feet": 27838, "location": "Nepal"},
])
df[df.location.str.contains("Nepal") & df.location.str.contains("China")]

Output:

	mountain	feet	location
0	Mount Everest	29029	Nepal/China
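Because each .str.contains(...) call returns a boolean Series, the results can also be combined with | (either condition) and ~ (negation), or mixed with numeric comparisons. The following is a rough sketch on the same DataFrame; the intermediate names in_nepal and in_china and the 28000-foot threshold are illustrative choices, not part of the original example.

```python
import pandas as pd

df = pd.DataFrame([
  {"mountain": "Mount Everest", "feet": 29029, "location": "Nepal/China"},
  {"mountain": "K2", "feet": 28255, "location": "Pakistan/China"},
  {"mountain": "Kangchenjunga", "feet": 28169, "location": "Nepal/India"},
  {"mountain": "Lhotse", "feet": 27940, "location": "Nepal"},
  {"mountain": "Makalu", "feet": 27838, "location": "Nepal"},
])

# Store each test as a named boolean Series so the combinations read clearly.
in_nepal = df.location.str.contains("Nepal")
in_china = df.location.str.contains("China")

# | keeps rows where either condition is True.
either = df[in_nepal | in_china]
print(list(either.index))  # [0, 1, 2, 3, 4]

# ~ inverts a condition: in Nepal but not in China.
nepal_not_china = df[in_nepal & ~in_china]
print(list(nepal_not_china.index))  # [2, 3, 4]

# Wrap comparisons in parentheses, since & binds more tightly than >.
tall_in_nepal = df[in_nepal & (df.feet > 28000)]
print(list(tall_in_nepal.mountain))  # ['Mount Everest', 'Kangchenjunga']
```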

Pandas Documentation

See the pandas.Series.str.contains entry in the pandas documentation for full details on the str.contains method.