When working with text, it is often useful to select rows that contain a specific string. The .str.contains(...) method tests each value in a text column and returns a Boolean Series that is True wherever the specific string appears in that value. We can then use this Series to select rows.
To explore this method, we'll use a DataFrame of the five tallest mountains in the world:
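The original dataset is not reproduced here, so the sketch below builds a hypothetical stand-in. The column names `mountain` and `location` are assumptions, and the `location` strings are simplified placeholders chosen only so that the row counts match the ones described in this section. They are not meant as an authoritative record of each mountain's borders.

```python
import pandas as pd

# Hypothetical stand-in for the tutorial's dataset. The "location" strings
# are simplified placeholders, not authoritative geographic data.
df = pd.DataFrame({
    "mountain": ["Mount Everest", "K2", "Kangchenjunga", "Lhotse", "Makalu"],
    "location": ["Nepal, China", "Pakistan, China", "Nepal, India", "Nepal", "Nepal"],
})

print(df)
```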
In the DataFrame, we can see that four of the five tallest mountains are in Nepal. However, a == comparison selects only two of those four mountains, because == asks Python to find rows whose value is EXACTLY equal to the given string.
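As a rough illustration, the sketch below applies == to the same hypothetical stand-in DataFrame (the column names and location strings are assumptions, repeated so the block runs on its own). A value such as "Nepal, China" is not exactly equal to "Nepal", so only the two rows whose entire location string is "Nepal" are kept.

```python
import pandas as pd

# Hypothetical stand-in data (simplified placeholder locations).
df = pd.DataFrame({
    "mountain": ["Mount Everest", "K2", "Kangchenjunga", "Lhotse", "Makalu"],
    "location": ["Nepal, China", "Pakistan, China", "Nepal, India", "Nepal", "Nepal"],
})

# == compares each whole value with "Nepal", so "Nepal, China" and
# "Nepal, India" do not match.
exact = df[df["location"] == "Nepal"]
print(exact)
```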
Instead, .str.contains(...) allows us to check whether each value contains a specific string anywhere within it. Selecting the locations that contain "Nepal" finds all four mountains:
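A minimal sketch of the substring test, again on the hypothetical stand-in DataFrame. Passing `regex=False` makes pandas treat "Nepal" as a literal string; the default (`regex=True`) would give the same result here because "Nepal" contains no regular-expression metacharacters.

```python
import pandas as pd

# Hypothetical stand-in data (simplified placeholder locations).
df = pd.DataFrame({
    "mountain": ["Mount Everest", "K2", "Kangchenjunga", "Lhotse", "Makalu"],
    "location": ["Nepal, China", "Pakistan, China", "Nepal, India", "Nepal", "Nepal"],
})

# Boolean mask: True wherever "Nepal" appears anywhere in the location string.
in_nepal = df["location"].str.contains("Nepal", regex=False)
contains = df[in_nepal]
print(contains)
```

Note that the match is case-sensitive by default; `case=False` would also match a value such as "nepal".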
The .str.contains(...) method can be combined with & to require that two strings BOTH appear within one field. For example, we can select all mountains that are located in BOTH Nepal and China:
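The sketch below combines two masks with `&` on the hypothetical stand-in DataFrame (placeholder data, as before). `&` performs an element-wise AND, so a row is kept only when both substring tests are True for it.

```python
import pandas as pd

# Hypothetical stand-in data (simplified placeholder locations).
df = pd.DataFrame({
    "mountain": ["Mount Everest", "K2", "Kangchenjunga", "Lhotse", "Makalu"],
    "location": ["Nepal, China", "Pakistan, China", "Nepal, India", "Nepal", "Nepal"],
})

# Element-wise AND of two Boolean masks over the same column.
has_nepal = df["location"].str.contains("Nepal")
has_china = df["location"].str.contains("China")
both = df[has_nepal & has_china]
print(both)
```

Python's `and` keyword cannot be used here: it tries to reduce each Series to a single True/False value, and pandas raises a `ValueError` instead. The element-wise `&` operator is the one that works on Boolean Series.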