Combining DataFrames by Concatenation


Sometimes when you're dealing with multiple DataFrames, you might want to combine them together. One method of combining DataFrames is concatenation, which will combine the DataFrames end-to-end.

Concatenation is a great way to combine DataFrames with identical columns (for example, you may have data that was observed on two separate dates and stored in separate files that needs to be joined together). Concatenation does not look at the contents of the data at all and only joins the DataFrame end-to-end.

Instead, you want to merge data based on the contents (for example, when you have two files both with information about the same person and their name appears in both files), you need to merge DataFrames together instead of concatenating.

Consider the following DataFrames that are ideal for concatenation:

df1 = pd.DataFrame([
  {'Name': 'Saul', 'Job': 'Lawyer', 'Age': 60},
  {'Name': 'Walter', 'Job': 'Teacher', 'Age': 35},
  {'Name': 'Kim', 'Job': 'Lawyer', 'Age': 42},
])

df2 = pd.DataFrame([
  {'Name': 'Gus', 'Job': 'Manager', 'Age': 45},
  {'Name': 'Hank', 'Job': 'Special Agent', 'Age': 39},
  {'Name': 'Jesse', 'Job': 'Distributor', 'Age': 20}
])

Concatenating DataFrames Vertically

The pd.concat() function combines a list of DataFrames end-to-end. To combine df1 to the end of df2:

pd.concat([df1, df2])
NameJobAge
0SaulLawyer60
1WalterTeacher35
2KimLawyer42
0GusManager45
1HankSpecial Agent39
2JesseDistributor20
Concatenate two DataFrames Together

You can add as many elements in the list, and place them in the order you'd like them to be combined in the resulting DataFrame.

Re-generating Index Row Labels After Concatenating

The concatenation of two DataFrames adds the rows end-to-end, leaving the row labels unchanged *(notice the repeating 0, 1, 2 labels). The .reset_index() function reset the row labels to index values to match the row number.

pd.concat([df1, df2]).reset_index()
NameJobAge
0SaulLawyer60
1WalterTeacher35
2KimLawyer42
3GusManager45
4HankSpecial Agent39
5JesseDistributor20
Concatenate two DataFrames together AND reset the index

Concatenating Non-Identical Columns

Since concatenating two DataFrames does not do any merging of data, when your DataFrames do not have identical column names the missing columns in any DataFrames that are being concatenated are marked with an NaN. To combine data based on merging data, use a merging technique instead of concatenation.

# The "Name" column appears in df1, but not "Favorite Color" and "Show" does not appear in df1
df3 = pd.DataFrame([
    {'Name': 'Kim', 'Favorite Color': 'Red', 'Show': 'BCS'},
    {'Name': 'Walter', 'Favorite Color': 'Blue', 'Show': 'BB'},
    {'Name': 'Jesse', 'Favorite Color': 'Green', 'Show': 'BB'},
    {'Name': 'Saul', 'Favorite Color': 'Maroon', 'Show': 'BCS'},
])

# Combine `df1` and our new `df3`, and using .reset_index():
pd.concat([df1, df3]).reset_index()
indexNameJobAgeFavorite ColorShow
00SaulLawyer60.0NaNNaN
11WalterTeacher35.0NaNNaN
22KimLawyer42.0NaNNaN
30KimNaNNaNRedBCS
41WalterNaNNaNBlueBB
52JesseNaNNaNGreenBB
63SaulNaNNaNMaroonBCS
Concatenating two DataFrames with some non-identical columns

Concatenating DataFrame Horizontally

If you wanted to combine the two DataFrames horizontally, you can use .concat() with the parameter axis=1. This might be useful if data extends across multiple columns in the two DataFrames.

You can think of this as extending the columns of the first DataFrame, as opposed to extending the rows.

df4 = pd.DataFrame([
    {'Name': 'Saul', 'Job': 'Lawyer', 'Age': 60},
    {'Name': 'Walter', 'Job': 'Teacher', 'Age': 35},
    {'Name': 'Kim', 'Job': 'Lawyer', 'Age': 42},
])
df5 = pd.DataFrame([
    {'Favorite Color': 'Maroon', 'Show': 'BCS'}, # Saul's continued record
    {'Favorite Color': 'Blue', 'Show': 'BB'},    # Walter's continued record
    {'Favorite Color': 'Red', 'Show': 'BCS'},    # Kim's continued record
])

pd.concat([df4, df5], axis=1)
NameJobAgeFavorite ColorShow
0SaulLawyer60MaroonBCS
1WalterTeacher35BlueBB
2KimLawyer42RedBCS
Concatenate two DataFrames Horizontally

Identical to concatenating the data vertically, the columns are added to row end-to-end. Be careful, as this can be dangerous if you are not absolutely certain your data is in the identical order. If you have a identifier column (for example, the Name columns) in both DataFrames, you should merge the DataFrames based on the contents of the column instead of concatenating.