Combining DataFrames by Concatenation

Sometimes when you're dealing with multiple DataFrames, you might want to combine them together. One method of combining DataFrames is concatenation, which will combine the DataFrames end-to-end.

Concatenation is a great way to combine DataFrames with identical columns (for example, you may have data that was observed on two separate dates and stored in separate files that needs to be joined together). Concatenation does not look at the contents of the data at all and only joins the DataFrame end-to-end.

Instead, you want to merge data based on the contents (for example, when you have two files both with information about the same person and their name appears in both files), you need to merge DataFrames together instead of concatenating.

Consider the following DataFrames that are ideal for concatenation:

df1 = pd.DataFrame([
  {'Name': 'Saul', 'Job': 'Lawyer', 'Age': 60},
  {'Name': 'Walter', 'Job': 'Teacher', 'Age': 35},
  {'Name': 'Kim', 'Job': 'Lawyer', 'Age': 42},

df2 = pd.DataFrame([
  {'Name': 'Gus', 'Job': 'Manager', 'Age': 45},
  {'Name': 'Hank', 'Job': 'Special Agent', 'Age': 39},
  {'Name': 'Jesse', 'Job': 'Distributor', 'Age': 20}

Concatenating DataFrames Vertically

The pd.concat() function combines a list of DataFrames end-to-end. To combine df1 to the end of df2:

pd.concat([df1, df2])
1HankSpecial Agent39
Concatenate two DataFrames Together

You can add as many elements in the list, and place them in the order you'd like them to be combined in the resulting DataFrame.

Re-generating Index Row Labels After Concatenating

The concatenation of two DataFrames adds the rows end-to-end, leaving the row labels unchanged *(notice the repeating 0, 1, 2 labels). The .reset_index() function reset the row labels to index values to match the row number.

pd.concat([df1, df2]).reset_index()
4HankSpecial Agent39
Concatenate two DataFrames together AND reset the index

Concatenating Non-Identical Columns

Since concatenating two DataFrames does not do any merging of data, when your DataFrames do not have identical column names the missing columns in any DataFrames that are being concatenated are marked with an NaN. To combine data based on merging data, use a merging technique instead of concatenation.

# The "Name" column appears in df1, but not "Favorite Color" and "Show" does not appear in df1
df3 = pd.DataFrame([
    {'Name': 'Kim', 'Favorite Color': 'Red', 'Show': 'BCS'},
    {'Name': 'Walter', 'Favorite Color': 'Blue', 'Show': 'BB'},
    {'Name': 'Jesse', 'Favorite Color': 'Green', 'Show': 'BB'},
    {'Name': 'Saul', 'Favorite Color': 'Maroon', 'Show': 'BCS'},

# Combine `df1` and our new `df3`, and using .reset_index():
pd.concat([df1, df3]).reset_index()
indexNameJobAgeFavorite ColorShow
Concatenating two DataFrames with some non-identical columns

Concatenating DataFrame Horizontally

If you wanted to combine the two DataFrames horizontally, you can use .concat() with the parameter axis=1. This might be useful if data extends across multiple columns in the two DataFrames.

You can think of this as extending the columns of the first DataFrame, as opposed to extending the rows.

df4 = pd.DataFrame([
    {'Name': 'Saul', 'Job': 'Lawyer', 'Age': 60},
    {'Name': 'Walter', 'Job': 'Teacher', 'Age': 35},
    {'Name': 'Kim', 'Job': 'Lawyer', 'Age': 42},
df5 = pd.DataFrame([
    {'Favorite Color': 'Maroon', 'Show': 'BCS'}, # Saul's continued record
    {'Favorite Color': 'Blue', 'Show': 'BB'},    # Walter's continued record
    {'Favorite Color': 'Red', 'Show': 'BCS'},    # Kim's continued record

pd.concat([df4, df5], axis=1)
NameJobAgeFavorite ColorShow
Concatenate two DataFrames Horizontally

Identical to concatenating the data vertically, the columns are added to row end-to-end. Be careful, as this can be dangerous if you are not absolutely certain your data is in the identical order. If you have a identifier column (for example, the Name columns) in both DataFrames, you should merge the DataFrames based on the contents of the column instead of concatenating.