Comparing Two DataFrames


Oftentimes when you have two DataFrames of similar data, you may want to see where the differences lie between them. Pandas provides this functionality through a DataFrame method called compare (added in pandas 1.1.0).
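The minimal sketch below assumes pandas 1.1 or later and checks two edge cases of compare. Comparing a DataFrame with an identical copy returns an empty result. Comparing frames whose row or column labels differ raises a ValueError, so both frames need the same labels and shape. The small frames left and right are illustrative only.

```python
import pandas as pd

# Illustrative frames, not the article's example data.
left = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})
right = left.copy()

# Identical frames: no cell differs, so every row and column is dropped.
no_diff = left.compare(right)
print(no_diff.empty)

# compare() needs identically labeled frames; a one-row slice has different labels.
try:
    left.compare(right.iloc[:1])
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)
```

Checking `.empty` on the result is a convenient way to test whether two aligned frames are equal cell by cell.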

Let's say we have these two (very similar) DataFrames:

import pandas as pd

df1 = pd.DataFrame([
    {'Name': 'Saul', 'Favorite Color': 'Maroon', 'Show': 'BCS'},
    {'Name': 'Walter', 'Favorite Color': 'Blue', 'Show': 'BB'},
    {'Name': 'Kim', 'Favorite Color': 'Red', 'Show': 'BCS'},
    {'Name': 'Howard', 'Favorite Color': 'Green', 'Show': 'BCS'}
])
df2 = pd.DataFrame([
    {'Name': 'Saul', 'Favorite Color': 'Maroon', 'Show': 'BCS'},
    {'Name': 'Walter', 'Favorite Color': 'Blue', 'Show': 'BB'},
    {'Name': 'Kim', 'Favorite Color': 'Red', 'Show': 'BCS'},
    {'Name': 'Jesse', 'Favorite Color': 'Maroon', 'Show': 'BB'},
])
Our example DataFrames

Using Compare

If we want to find where the two DataFrames differ, we can simply call compare on one of them and pass the other as the argument:

df1.compare(df2)
     Name        Favorite Color         Show
     self  other           self   other self other
3  Howard  Jesse          Green  Maroon  BCS    BB
Comparing two DataFrames
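The following sketch shows how to read individual values out of the self/other layout. It assumes pandas 1.1 or later and rebuilds df1 and df2 so that it runs on its own. The names diff and other_side are illustrative, and xs is only one of several ways to select a sub-column level.

```python
import pandas as pd

# The article's example frames, rebuilt so this snippet runs on its own.
cols = ["Name", "Favorite Color", "Show"]
df1 = pd.DataFrame(
    [["Saul", "Maroon", "BCS"], ["Walter", "Blue", "BB"],
     ["Kim", "Red", "BCS"], ["Howard", "Green", "BCS"]],
    columns=cols,
)
df2 = pd.DataFrame(
    [["Saul", "Maroon", "BCS"], ["Walter", "Blue", "BB"],
     ["Kim", "Red", "BCS"], ["Jesse", "Maroon", "BB"]],
    columns=cols,
)

diff = df1.compare(df2)

# Columns form a two-level MultiIndex: (original column, "self" or "other").
print(diff.columns.tolist())

# Look up one changed cell by row label and (column, side).
print(diff.loc[3, ("Name", "self")], "->", diff.loc[3, ("Name", "other")])

# Keep only the df2 side of every difference as an ordinary flat DataFrame.
other_side = diff.xs("other", axis=1, level=1)
print(other_side)
```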

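Rows and columns in which every value matches are dropped, which is why only row 3 appears. Within each column, self holds the value from df1 (the DataFrame compare is called on) and other holds the value from df2. By default (align_axis=1), these pairs are laid out side by side as sub-columns. If you would rather stack them vertically, with the self and other values on alternating rows, set align_axis to 0: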

df1.compare(df2, align_axis=0)
           Name Favorite Color Show
3 self   Howard          Green  BCS
  other   Jesse         Maroon   BB
Comparing two DataFrames with align_axis=0
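Here is a similar sketch for the stacked layout, again rebuilding the example frames and assuming pandas 1.1 or later. The name stacked is illustrative. With align_axis=0, the self/other marker moves into the row index, so you select a side by index level instead of by column.

```python
import pandas as pd

# The article's example frames, rebuilt so this snippet runs on its own.
cols = ["Name", "Favorite Color", "Show"]
df1 = pd.DataFrame(
    [["Saul", "Maroon", "BCS"], ["Walter", "Blue", "BB"],
     ["Kim", "Red", "BCS"], ["Howard", "Green", "BCS"]],
    columns=cols,
)
df2 = pd.DataFrame(
    [["Saul", "Maroon", "BCS"], ["Walter", "Blue", "BB"],
     ["Kim", "Red", "BCS"], ["Jesse", "Maroon", "BB"]],
    columns=cols,
)

stacked = df1.compare(df2, align_axis=0)

# The row index is now (original row label, "self" or "other");
# the columns are the plain column names again.
print(stacked.index.tolist())

# Select one side of the comparison through the inner index level.
print(stacked.xs("other", level=1))
```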

Other Parameters

By default, compare only shows the differences between the two DataFrames, but we can see more by passing some additional parameters.

If we set keep_shape=True, every row and column is kept. Cells where the two DataFrames match contain NaN, and cells where they differ contain the actual values:

df1.compare(df2, keep_shape=True)
     Name        Favorite Color         Show
     self  other           self   other self other
0     NaN    NaN            NaN     NaN  NaN   NaN
1     NaN    NaN            NaN     NaN  NaN   NaN
2     NaN    NaN            NaN     NaN  NaN   NaN
3  Howard  Jesse          Green  Maroon  BCS    BB
Comparing two DataFrames with keep_shape
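One practical use of keep_shape=True is finding the changed rows while keeping the original row labels. The sketch below assumes pandas 1.1 or later. The names full and changed_rows are illustrative. Treating any non-null cell as a change works here because compare fills matching cells with NaN.

```python
import pandas as pd

# The article's example frames, rebuilt so this snippet runs on its own.
cols = ["Name", "Favorite Color", "Show"]
df1 = pd.DataFrame(
    [["Saul", "Maroon", "BCS"], ["Walter", "Blue", "BB"],
     ["Kim", "Red", "BCS"], ["Howard", "Green", "BCS"]],
    columns=cols,
)
df2 = pd.DataFrame(
    [["Saul", "Maroon", "BCS"], ["Walter", "Blue", "BB"],
     ["Kim", "Red", "BCS"], ["Jesse", "Maroon", "BB"]],
    columns=cols,
)

full = df1.compare(df2, keep_shape=True)

# Every row and column is kept: 4 rows, and 3 columns x 2 sides = 6 columns.
print(full.shape)

# A row contains at least one difference if any of its cells is non-null.
changed_rows = full.index[full.notna().any(axis=1)].tolist()
print(changed_rows)
```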

We can also add keep_equal=True to fill in the matching values instead of NaN. This may be useful for visualization, or when you want to combine the two DataFrames into a single table with self and other sub-columns:

df1.compare(df2, keep_shape=True, keep_equal=True)
     Name         Favorite Color         Show
     self   other           self   other self other
0    Saul    Saul         Maroon  Maroon  BCS   BCS
1  Walter  Walter           Blue    Blue   BB    BB
2     Kim     Kim            Red     Red  BCS   BCS
3  Howard   Jesse          Green  Maroon  BCS    BB
Comparing two DataFrames with keep_equal
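The following sketch illustrates the "combine with sub-columns" idea. It makes the same assumptions as before: pandas 1.1 or later, with the example frames rebuilt locally. Because keep_equal=True fills every cell, each side of the result reproduces its source DataFrame. Flattening the column levels into names such as Name_self is a hypothetical naming scheme chosen for illustration, not something compare does itself.

```python
import pandas as pd

# The article's example frames, rebuilt so this snippet runs on its own.
cols = ["Name", "Favorite Color", "Show"]
df1 = pd.DataFrame(
    [["Saul", "Maroon", "BCS"], ["Walter", "Blue", "BB"],
     ["Kim", "Red", "BCS"], ["Howard", "Green", "BCS"]],
    columns=cols,
)
df2 = pd.DataFrame(
    [["Saul", "Maroon", "BCS"], ["Walter", "Blue", "BB"],
     ["Kim", "Red", "BCS"], ["Jesse", "Maroon", "BB"]],
    columns=cols,
)

both = df1.compare(df2, keep_shape=True, keep_equal=True)

# Every cell is populated, so each side of the comparison rebuilds its source frame.
self_side = both.xs("self", axis=1, level=1)
other_side = both.xs("other", axis=1, level=1)
print(self_side.equals(df1), other_side.equals(df2))

# Hypothetical flattening: join the two header levels into names like "Name_self",
# giving one combined table with single-level columns.
combined = both.copy()
combined.columns = [f"{col}_{side}" for col, side in both.columns]
print(combined)
```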

You can read more about compare in the Pandas documentation for DataFrame.compare.