Selecting DataFrame Rows Based on String Contents


When working with text, it is often useful to select rows that contain a specific string. The .str.contains(...) method tests the value in each row and returns True wherever the given string appears anywhere in the text, so the result can be used to select rows.

To explore this function, we'll use a DataFrame of the five tallest mountains in the world:
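A minimal sketch of how such a DataFrame could be built, assuming the variable name df and the column names mountain, feet, and location that appear in the outputs in this section. The K2 row (index 1) never appears in those outputs, so its values are an assumption:

```python
import pandas as pd

# Rows 0, 2, 3 and 4 match the outputs shown in this section.
# Row 1 (K2) is never displayed, so its values are an assumption.
df = pd.DataFrame({
    "mountain": ["Mount Everest", "K2", "Kangchenjunga", "Lhotse", "Makalu"],
    "feet": [29029, 28251, 28169, 27940, 27838],
    "location": ["Nepal/China", "Pakistan/China", "Nepal/India", "Nepal", "Nepal"],
})

print(df)
```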


Select All Rows Containing a String

In the DataFrame, we can see that four of the five tallest mountains are in Nepal. If we use a == comparison, our row selection finds only two of the four mountains in Nepal, since == selects rows whose value is exactly equal to the given string. Locations such as Nepal/China are not exactly equal to Nepal, so those rows are left out.
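The exact-match selection might look like the following sketch. The DataFrame is rebuilt so the example runs on its own; the name df and the K2 row are assumptions:

```python
import pandas as pd

# Same sample data as above; the K2 row is an assumption.
df = pd.DataFrame({
    "mountain": ["Mount Everest", "K2", "Kangchenjunga", "Lhotse", "Makalu"],
    "feet": [29029, 28251, 28169, 27940, 27838],
    "location": ["Nepal/China", "Pakistan/China", "Nepal/India", "Nepal", "Nepal"],
})

# == keeps only rows whose location is exactly the string "Nepal".
exact = df[df["location"] == "Nepal"]
print(exact)
```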

Python Output:
   mountain   feet  location
3    Lhotse  27940     Nepal
4    Makalu  27838     Nepal

Instead, .str.contains(...) allows us to check whether each value contains a specific string anywhere within it. Looking for the locations that contain Nepal, we find four mountains:
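A minimal sketch of the substring selection, again rebuilding the sample data so the example is self-contained (the name df and the K2 row are assumptions):

```python
import pandas as pd

# Same sample data as above; the K2 row is an assumption.
df = pd.DataFrame({
    "mountain": ["Mount Everest", "K2", "Kangchenjunga", "Lhotse", "Makalu"],
    "feet": [29029, 28251, 28169, 27940, 27838],
    "location": ["Nepal/China", "Pakistan/China", "Nepal/India", "Nepal", "Nepal"],
})

# .str.contains returns a boolean Series: True where "Nepal" appears
# anywhere in the location string.
mask = df["location"].str.contains("Nepal")

# Indexing with the boolean Series keeps only the True rows.
nepal = df[mask]
print(nepal)
```

By default, .str.contains is case-sensitive and treats its argument as a regular expression. For a plain search string like "Nepal" this makes no difference, but passing regex=False is the safer choice when the search text may contain characters such as . or +.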

Python Output:
        mountain   feet     location
0  Mount Everest  29029  Nepal/China
2  Kangchenjunga  28169  Nepal/India
3         Lhotse  27940        Nepal
4         Makalu  27838        Nepal

Select All Rows Containing Two Strings

Two .str.contains(...) tests can be combined with & to check for the presence of two strings within one field. A row is selected only when both tests are True. For example, we can find all mountains that are located in BOTH Nepal and China:
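One way to write the combined test is sketched below, with each condition stored in its own variable for readability (the names df, in_nepal, and in_china, and the K2 row, are assumptions):

```python
import pandas as pd

# Same sample data as above; the K2 row is an assumption.
df = pd.DataFrame({
    "mountain": ["Mount Everest", "K2", "Kangchenjunga", "Lhotse", "Makalu"],
    "feet": [29029, 28251, 28169, 27940, 27838],
    "location": ["Nepal/China", "Pakistan/China", "Nepal/India", "Nepal", "Nepal"],
})

# One boolean Series per search string.
in_nepal = df["location"].str.contains("Nepal")
in_china = df["location"].str.contains("China")

# & is the element-wise AND: a row is kept only if both tests are True.
both = df[in_nepal & in_china]
print(both)
```

If the two conditions are written inline instead, wrapping each one in parentheses, as in df[(...) & (...)], keeps the expression unambiguous.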

Python Output:
        mountain   feet     location
0  Mount Everest  29029  Nepal/China

Pandas Documentation

The pandas.Series.str.contains page in the pandas documentation is the full reference for the str.contains function.