Selecting Rows that are IN and NOT IN a DataFrame


The .isin function is commonly used to check if values in one DataFrame are in another DataFrame. It can also be used to select rows from a DataFrame depending on if they are in another DataFrame.

To explore this function, we'll use a DataFrame of 10 songs released in 2022:

Reset Code Python Output:
(Run your code to see your code result's here.)

We create two DataFrames by choosing 5 random songs from our main DataFrame df1:

Reset Code Python Output:
title artist album month
8 Vegas Doja Cat ELVIS (Original Motion Picture Soundtrack) 6
4 BREAK MY SOUL Beyonce RENAISSANCE 7
0 As It Was Harry Styles Harry's House 4
7 Pink Venom BLACKPINK Pink Venom 8
6 Glimpse of Us Joji Glimpse of Us 6
Reset Code Python Output:
title artist album month
6 Glimpse of Us Joji Glimpse of Us 6
9 First Class Jack Harlow Come Home The Kids Miss You 5
0 As It Was Harry Styles Harry\'s House 4
8 Vegas Doja Cat ELVIS (Original Motion Picture Soundtrack) 6
1 Bad Habit Steve Lacy Gemini Rights 6

Select all rows from a DataFrame IN in another DataFrame

Since we know both samples came from the same original DataFrame with unique index values, we can use the .index.isin function to find which songs are in a sample. This function will select the row ONLY when the row is in the target DataFrame contained inside of the function.

For example, we can select all of the rows in df1 that are in sample1:

Reset Code Python Output:
title artist album month
8 Vegas Doja Cat ELVIS (Original Motion Picture Soundtrack) 6
4 BREAK MY SOUL Beyonce RENAISSANCE 7
0 As It Was Harry Styles Harry's House 4
7 Pink Venom BLACKPINK Pink Venom 8
6 Glimpse of Us Joji Glimpse of Us 6

Selecting all rows contained in MULTIPLE DataFrames

You may have noticed that some of our songs appear in both of our sample DataFrames. By using & (AND) conditional, we use .index.isin twice to select only the rows contained in both of two different DataFrames.

Using our two sample DataFrames, we find the rows in df that are contained in both sample1 and sample2:

Reset Code Python Output:
title artist album month
6 Glimpse of Us Joji Glimpse of Us 6
0 As It Was Harry Styles Harry\'s House 4
8 Vegas Doja Cat ELVIS (Original Motion Picture Soundtrack) 6

Select all rows from a DataFrame NOT in another DataFrame

Using the .index.isin function, we can also select all rows that are NOT in another DataFrame. Using pandas, the ~ operator denotes NOT. Therefore, to find all rows NOT in another DataFrame, we modify the .index.isin function with the ~ operator.

The following code finds all songs from df1 NOT contained sample1:

Reset Code Python Output:
title artist album month
1 Bad Habit Steve Lacy Gemini Rights 6
2 Constant Repeat Charli XCX CRASH 3
3 Pink Funeral Beach House Once Twice Melody 2
5 Sidelines Phoebe Bridgers Sidelines 4
9 First Class Jack Harlow Come Home The Kids Miss You 5

Pandas Documentation

Click Here for the full pandas documentation for the isin function