Comparing Two DataFrames
Often, when you have two DataFrames holding similar data, you want to see exactly where they differ. pandas provides this through the `DataFrame.compare` method (added in pandas 1.1.0).
Let's say we have these two (very similar) DataFrames:
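A minimal sketch of two such DataFrames, reconstructed from the comparison outputs shown further down. The exact construction is an assumption; the values are inferred from those outputs, and only row 3 differs.

```python
import pandas as pd

# Values inferred from the comparison outputs in this article.
df1 = pd.DataFrame(
    {
        "Name": ["Saul", "Walter", "Kim", "Howard"],
        "Favorite Color": ["Maroon", "Blue", "Red", "Green"],
        "Show": ["BCS", "BB", "BCS", "BCS"],
    }
)

# df2 is a copy of df1 with only the last row (index 3) changed.
df2 = df1.copy()
df2.loc[3] = ["Jesse", "Maroon", "BB"]

print(df1)
print(df2)
```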
Using Compare
To find the rows that differ, call `compare` on one DataFrame and pass it the other:
df1.compare(df2)
| | Name | | Favorite Color | | Show | |
|---|---|---|---|---|---|---|
| | self | other | self | other | self | other |
| 3 | Howard | Jesse | Green | Maroon | BCS | BB |
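The result carries two column levels: the original column name, then `self` or `other`. A short sketch of how individual values might be pulled out of it, assuming input DataFrames rebuilt from the outputs in this article (the construction of `df1` and `df2` is an assumption):

```python
import pandas as pd

# Assumed inputs, inferred from the outputs shown in this article.
df1 = pd.DataFrame(
    {
        "Name": ["Saul", "Walter", "Kim", "Howard"],
        "Favorite Color": ["Maroon", "Blue", "Red", "Green"],
        "Show": ["BCS", "BB", "BCS", "BCS"],
    }
)
df2 = df1.copy()
df2.loc[3] = ["Jesse", "Maroon", "BB"]

diff = df1.compare(df2)

# Columns are a two-level MultiIndex: (original column, "self" / "other").
print(diff.columns.tolist())

# A tuple key selects one side of one column.
print(diff.loc[3, ("Name", "self")])   # value from df1
print(diff.loc[3, ("Name", "other")])  # value from df2

# xs() on the second column level returns every "other" value at once.
other_values = diff.xs("other", axis=1, level=1)
print(other_values)
```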
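By default (`align_axis=1`), only the rows and columns that differ are returned, and the two versions of each value sit side by side in `self` and `other` sub-columns. To stack them vertically instead, with `self` and `other` as alternating rows under each index label, set `align_axis` to 0: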
df1.compare(df2, align_axis=0)
| | | Name | Favorite Color | Show |
|---|---|---|---|---|
| 3 | self | Howard | Green | BCS |
| | other | Jesse | Maroon | BB |
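With `align_axis=0`, the `self`/`other` level moves from the columns into the row index. A sketch of what that looks like in code, again assuming input DataFrames rebuilt from the outputs in this article:

```python
import pandas as pd

# Assumed inputs, inferred from the outputs shown in this article.
df1 = pd.DataFrame(
    {
        "Name": ["Saul", "Walter", "Kim", "Howard"],
        "Favorite Color": ["Maroon", "Blue", "Red", "Green"],
        "Show": ["BCS", "BB", "BCS", "BCS"],
    }
)
df2 = df1.copy()
df2.loc[3] = ["Jesse", "Maroon", "BB"]

stacked = df1.compare(df2, align_axis=0)

# The row index is now a MultiIndex: (original label, "self" / "other").
print(stacked.index.tolist())

# Columns keep their original, single-level names.
print(stacked.columns.tolist())

# A tuple row key selects one side of one row.
print(stacked.loc[(3, "other"), "Name"])
```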
Other Parameters
By default, `compare` shows only the differences between the two DataFrames, but a few additional parameters in the function call let you see more.
If you set the `keep_shape` parameter to True, the result keeps every row and column of the original DataFrames, with `NaN` where the values match and the actual values where they differ:
df1.compare(df2, keep_shape=True)
| | Name | | Favorite Color | | Show | |
|---|---|---|---|---|---|---|
| | self | other | self | other | self | other |
| 0 | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | Howard | Jesse | Green | Maroon | BCS | BB |
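Because matching cells come back as `NaN`, the full-shape result can be filtered with the usual missing-value tools. A small sketch, assuming input DataFrames rebuilt from the outputs in this article:

```python
import pandas as pd

# Assumed inputs, inferred from the outputs shown in this article.
df1 = pd.DataFrame(
    {
        "Name": ["Saul", "Walter", "Kim", "Howard"],
        "Favorite Color": ["Maroon", "Blue", "Red", "Green"],
        "Show": ["BCS", "BB", "BCS", "BCS"],
    }
)
df2 = df1.copy()
df2.loc[3] = ["Jesse", "Maroon", "BB"]

full = df1.compare(df2, keep_shape=True)

# Every original row is kept, and each column is split into self/other.
print(full.shape)

# Rows that match entirely contain only NaN.
print(full.loc[0].isna().all())

# Dropping the all-NaN rows leaves just the rows that differ.
changed_rows = full.dropna(how="all")
print(changed_rows.index.tolist())
```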
We can also set the `keep_equal` parameter to True so that matching values are shown instead of `NaN`, which may be useful for visualization or when you want to combine DataFrames using sub-columns:
df1.compare(df2, keep_shape=True, keep_equal=True)
| | Name | | Favorite Color | | Show | |
|---|---|---|---|---|---|---|
| | self | other | self | other | self | other |
| 0 | Saul | Saul | Maroon | Maroon | BCS | BCS |
| 1 | Walter | Walter | Blue | Blue | BB | BB |
| 2 | Kim | Kim | Red | Red | BCS | BCS |
| 3 | Howard | Jesse | Green | Maroon | BCS | BB |
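With both flags set, the result holds a complete side-by-side copy of the two inputs, so the `self` and `other` halves can be split apart and compared directly. One possible sketch, assuming input DataFrames rebuilt from the outputs in this article:

```python
import pandas as pd

# Assumed inputs, inferred from the outputs shown in this article.
df1 = pd.DataFrame(
    {
        "Name": ["Saul", "Walter", "Kim", "Howard"],
        "Favorite Color": ["Maroon", "Blue", "Red", "Green"],
        "Show": ["BCS", "BB", "BCS", "BCS"],
    }
)
df2 = df1.copy()
df2.loc[3] = ["Jesse", "Maroon", "BB"]

both = df1.compare(df2, keep_shape=True, keep_equal=True)

# No cell is masked out: matches and differences are all populated.
print(both.isna().sum().sum())

# Split the two sides back out by the second column level.
self_side = both.xs("self", axis=1, level=1)
other_side = both.xs("other", axis=1, level=1)

# An element-wise boolean mask marking which cells changed.
changed = self_side != other_side
print(changed)
```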
You can read more about `compare` in the pandas documentation.