Selecting Rows that are IN and NOT IN a DataFrame


The .isin function is commonly used to check if values in one DataFrame are in another DataFrame. It can also be used to select rows from a DataFrame depending on if they are in another DataFrame.

To explore this function, we'll use a DataFrame of 10 songs released in 2022:

import pandas as pd

df1 = pd.DataFrame([
    {'title': 'As It Was', 'artist': 'Harry Styles', 'album': "Harry's House", 'month': 4 },
    {'title': 'Bad Habit', 'artist': 'Steve Lacy', 'album': 'Gemini Rights', 'month': 6 },
    {'title': 'Constant Repeat', 'artist': 'Charli XCX', 'album': "CRASH", 'month': 3 }
    {'title': 'Pink Funeral', 'artist': 'Beach House', 'album': "Once Twice Melody", 'month': 2 },
    {'title': 'BREAK MY SOUL', 'artist': 'Beyonce', 'album': "RENAISSANCE", 'month': 7 },
    {'title': 'Sidelines', 'artist': 'Phoebe Bridgers', 'album': "Sidelines", 'month': 4 },
    {'title': 'Glimpse of Us', 'artist': 'Joji', 'album': "Glimpse of Us", 'month': 6 },
    {'title': 'Pink Venom', 'artist': 'BLACKPINK', 'album': "Pink Venom", 'month': 8 },
    {'title': 'Vegas', 'artist': 'Doja Cat', 'album': "ELVIS (Original Motion Picture Soundtrack)", 'month': 6 },
    {'title': 'First Class', 'artist': 'Jack Harlow', 'album': "Come Home The Kids Miss You", 'month': 5 },
])

We create two DataFrames by choosing 5 random songs from our main DataFrame df1:

# Create two new DataFrames containing 5 songs from our first DataFrame:
sample1 = df1.sample(5)
titleartistalbummonth
8VegasDoja CatELVIS (Original Motion Picture Soundtrack)6
4BREAK MY SOULBeyonceRENAISSANCE7
0As It WasHarry StylesHarry's House4
7Pink VenomBLACKPINKPink Venom8
6Glimpse of UsJojiGlimpse of Us6
sample2 = df1.sample(5)
titleartistalbummonth
6Glimpse of UsJojiGlimpse of Us6
9First ClassJack HarlowCome Home The Kids Miss You5
0As It WasHarry StylesHarry\'s House4
8VegasDoja CatELVIS (Original Motion Picture Soundtrack)6
1Bad HabitSteve LacyGemini Rights6
Creating two sample DataFrames from our original DataFrame df1.

Select all rows from a DataFrame IN in another DataFrame

Since we know both samples came from the same original DataFrame with unique index values, we can use the .index.isin function to find which songs are in a sample. This function will select the row ONLY when the row is in the target DataFrame contained inside of the function.

For example, we can select all of the rows in df1 that are in sample1:

# Selects all rows from DataFrame `df1` that are in DataFrame `sample1`:
df1.index.isin( sample1.index )
titleartistalbummonth
8VegasDoja CatELVIS (Original Motion Picture Soundtrack)6
4BREAK MY SOULBeyonceRENAISSANCE7
0As It WasHarry StylesHarry's House4
7Pink VenomBLACKPINKPink Venom8
6Glimpse of UsJojiGlimpse of Us6
Using .index.isin to select all of rows from df1 that is in sample1

Selecting all rows contained in MULTIPLE DataFrames

You may have noticed that some of our songs appear in both of our sample DataFrames. By using & (AND) conditional, we use .index.isin twice to select only the rows contained in both of two different DataFrames.

Using our two sample DataFrames, we find the rows in df that are contained in both sample1 and sample2:

df1[ df1.index.isin(sample1.index) & df1.index.isin(sample2.index) ]
titleartistalbummonth
6Glimpse of UsJojiGlimpse of Us6
0As It WasHarry StylesHarry\'s House4
8VegasDoja CatELVIS (Original Motion Picture Soundtrack)6
Using .index.isin to find values in another DataFrame

Select all rows from a DataFrame NOT in another DataFrame

Using the .index.isin function, we can also select all rows that are NOT in another DataFrame. Using pandas, the ~ operator denotes NOT. Therefore, to find all rows NOT in another DataFrame, we modify the .index.isin function with the ~ operator.

The following code finds all songs from df1 NOT contained sample1:

df1[ ~df1.index.isin(sample1.index) ]
titleartistalbummonth
1Bad HabitSteve LacyGemini Rights6
2Constant RepeatCharli XCXCRASH3
3Pink FuneralBeach HouseOnce Twice Melody2
5SidelinesPhoebe BridgersSidelines4
9First ClassJack HarlowCome Home The Kids Miss You5
Select all rows from a DataFrame NOT in another DataFrame

Pandas Documentation

Click Here for the full pandas documentation for the isin function