Creating New Columns in a DataFrame

There are two primary methods of creating new columns in a DataFrame:

Creating a new column calculated from data you already have (for example, adding a calculated value to your DataFrame), or
Creating a new column of new data, entered directly in Python, that does not come from another dataset or from columns that already exist.

Note: Creating a new column is different from merging two existing DataFrames. If you're looking for that, see TODO.

The Movie Dataset

Throughout this guide, we will use a small DataFrame with data about movies:

import pandas as pd

# Creates a DataFrame with "movie", "release date", "domestic box office", and "worldwide box office" columns
df = pd.DataFrame([
  {"movie": "The Truman Show", "release date": "1996-06-05", "domestic box office": 125618201, "worldwide box office": 264118201},
  {"movie": "Rogue One: A Star Wars Story", "release date": "2016-12-16", "domestic box office": 532177324, "worldwide box office": 1055135598},
  {"movie": "Iron Man", "release date": "2008-05-02", "domestic box office": 318604126, "worldwide box office": 585171547},
  {"movie": "Blade Runner", "release date": "1982-06-25", "domestic box office": 32656328, "worldwide box office": 39535837},
  {"movie": "Breakfast at Tiffany's", "release date": "1961-10-05", "domestic box office": 9551904, "worldwide box office": 9794721}
])
df

Output:

	movie	release date	domestic box office	worldwide box office
0	The Truman Show	1996-06-05	125618201	264118201
1	Rogue One: A Star Wars Story	2016-12-16	532177324	1055135598
2	Iron Man	2008-05-02	318604126	585171547
3	Blade Runner	1982-06-25	32656328	39535837
4	Breakfast at Tiffany's	1961-10-05	9551904	9794721

Create a New Column Using a Calculation

We can perform simple mathematical operations on columns and store the resulting numbers in a new column. This includes addition, multiplication, subtraction, and division. We do this so our results are easier to see and available for future analysis.

Let's think about the box office columns in the movie DataFrame. We already have domestic (US) box office and worldwide box office. But what if we wanted to figure out the international box office for each movie? To find this value for each movie, we could subtract domestic from worldwide for every movie by hand — or we could allow Python to do it for us:

import pandas as pd

# Creates a DataFrame with "movie", "release date", "domestic box office", and "worldwide box office" columns
df = pd.DataFrame([
  {"movie": "The Truman Show", "release date": "1996-06-05", "domestic box office": 125618201, "worldwide box office": 264118201},
  {"movie": "Rogue One: A Star Wars Story", "release date": "2016-12-16", "domestic box office": 532177324, "worldwide box office": 1055135598},
  {"movie": "Iron Man", "release date": "2008-05-02", "domestic box office": 318604126, "worldwide box office": 585171547},
  {"movie": "Blade Runner", "release date": "1982-06-25", "domestic box office": 32656328, "worldwide box office": 39535837},
  {"movie": "Breakfast at Tiffany's", "release date": "1961-10-05", "domestic box office": 9551904, "worldwide box office": 9794721}
])

# Subtract the domestic column from the worldwide column, row by row
df["international box office"] = df["worldwide box office"] - df["domestic box office"]
df

Output:

	movie	release date	domestic box office	worldwide box office	international box office
0	The Truman Show	1996-06-05	125618201	264118201	138500000
1	Rogue One: A Star Wars Story	2016-12-16	532177324	1055135598	522958274
2	Iron Man	2008-05-02	318604126	585171547	266567421
3	Blade Runner	1982-06-25	32656328	39535837	6879509
4	Breakfast at Tiffany's	1961-10-05	9551904	9794721	242817

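The assignment modifies the DataFrame in place: df keeps the new column for the rest of your session unless you drop or overwrite it. Any file the data originally came from is not changed unless you save the DataFrame back to it.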
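The other arithmetic operators follow the same pattern. As a minimal sketch, assuming a two-movie subset of the data above and a hypothetical column name "domestic share" (not part of this guide's dataset), the code below divides one column by another. It also shows DataFrame.assign, which returns a new DataFrame with the extra column and leaves the original as it was:

```python
import pandas as pd

# Two rows taken from the movie DataFrame, to keep the sketch short
df = pd.DataFrame([
    {"movie": "Iron Man", "domestic box office": 318604126, "worldwide box office": 585171547},
    {"movie": "Blade Runner", "domestic box office": 32656328, "worldwide box office": 39535837},
])

# Division is applied row by row, just like subtraction.
# "domestic share" is an illustrative column name, not one from the guide.
df["domestic share"] = df["domestic box office"] / df["worldwide box office"]

# assign() returns a new DataFrame; df itself does not gain this column.
# The ** dictionary form is needed because the column name contains spaces.
with_intl = df.assign(
    **{"international box office": df["worldwide box office"] - df["domestic box office"]}
)
print(with_intl)
```

Using assign can be convenient when you want to try out a calculated column without touching the DataFrame you started with.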

Create a New Column With New Data

We can also add a column directly from new data organized in a list. One reason to do this is when we obtain a brand-new variable related to the DataFrame. Suppose you wanted to include a column with your personal rating for each movie:

import pandas as pd

# Creates a DataFrame with "movie", "release date", "domestic box office", and "worldwide box office" columns
df = pd.DataFrame([
  {"movie": "The Truman Show", "release date": "1996-06-05", "domestic box office": 125618201, "worldwide box office": 264118201},
  {"movie": "Rogue One: A Star Wars Story", "release date": "2016-12-16", "domestic box office": 532177324, "worldwide box office": 1055135598},
  {"movie": "Iron Man", "release date": "2008-05-02", "domestic box office": 318604126, "worldwide box office": 585171547},
  {"movie": "Blade Runner", "release date": "1982-06-25", "domestic box office": 32656328, "worldwide box office": 39535837},
  {"movie": "Breakfast at Tiffany's", "release date": "1961-10-05", "domestic box office": 9551904, "worldwide box office": 9794721}
])

# When specifying a list, the order of the data must match the order of the DataFrame exactly:
df["personal rating"] = [10, 9, 7, 8, 7]
df

Output:

	movie	release date	domestic box office	worldwide box office	personal rating
0	The Truman Show	1996-06-05	125618201	264118201	10
1	Rogue One: A Star Wars Story	2016-12-16	532177324	1055135598	9
2	Iron Man	2008-05-02	318604126	585171547	7
3	Blade Runner	1982-06-25	32656328	39535837	8
4	Breakfast at Tiffany's	1961-10-05	9551904	9794721	7
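Two details of list assignment are worth checking for yourself. The sketch below uses a shortened three-movie DataFrame and made-up ratings. It assumes current pandas behavior: a list with the wrong number of values raises a ValueError, while a pandas Series is matched to rows by index value rather than by position.

```python
import pandas as pd

df = pd.DataFrame({"movie": ["The Truman Show", "Iron Man", "Blade Runner"]})

# A list with the wrong number of values is rejected
try:
    df["personal rating"] = [10, 7]
    length_error = False
except ValueError:
    length_error = True
print(length_error)

# A Series is aligned on the row index, so its order does not have to match.
# These ratings are invented for illustration.
ratings = pd.Series([8, 10, 7], index=[2, 0, 1])
df["personal rating"] = ratings
print(df)
```

With a plain list, pandas has no labels to align on, which is why the order of the values has to follow the order of the rows.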

Creating a New Column With `df.loc`

If you have data for only a few observations, you can use .loc to modify a DataFrame by row index value and column name. The row index is the leftmost, bold column in a displayed DataFrame; by default it is a sequence of numbers starting at 0. When using .loc, choose the row index value(s) that correspond to the rows you have information for. Rows that you do not assign a value to are filled with NaN (missing).
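To see the row index values for yourself, you can inspect df.index. A small sketch, using a two-row DataFrame for brevity:

```python
import pandas as pd

df = pd.DataFrame({
    "movie": ["Iron Man", "Blade Runner"],
    "domestic box office": [318604126, 32656328],
})

# The default row index is 0, 1, 2, ...
index_values = list(df.index)
print(index_values)

# .loc takes a row index value first, then a column name
second_movie = df.loc[1, "movie"]
print(second_movie)
```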

Continuing our example, say we learned that the critics' rating for "Blade Runner" is an 8.9. Since "Blade Runner" is in the row with index 3, we add its critic rating with the following code:

import pandas as pd

# Creates a DataFrame with "movie", "release date", "domestic box office", and "worldwide box office" columns
df = pd.DataFrame([
  {"movie": "The Truman Show", "release date": "1996-06-05", "domestic box office": 125618201, "worldwide box office": 264118201},
  {"movie": "Rogue One: A Star Wars Story", "release date": "2016-12-16", "domestic box office": 532177324, "worldwide box office": 1055135598},
  {"movie": "Iron Man", "release date": "2008-05-02", "domestic box office": 318604126, "worldwide box office": 585171547},
  {"movie": "Blade Runner", "release date": "1982-06-25", "domestic box office": 32656328, "worldwide box office": 39535837},
  {"movie": "Breakfast at Tiffany's", "release date": "1961-10-05", "domestic box office": 9551904, "worldwide box office": 9794721}
])

# Set the value in row 3 of a new "critic rating" column; all other rows become NaN
df.loc[3, "critic rating"] = 8.9
df

Output:

	movie	release date	domestic box office	worldwide box office	critic rating
0	The Truman Show	1996-06-05	125618201	264118201	NaN
1	Rogue One: A Star Wars Story	2016-12-16	532177324	1055135598	NaN
2	Iron Man	2008-05-02	318604126	585171547	NaN
3	Blade Runner	1982-06-25	32656328	39535837	8.9
4	Breakfast at Tiffany's	1961-10-05	9551904	9794721	NaN
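When you do not know the row number offhand, .loc also accepts a condition in place of the row index value. The sketch below is one possible approach, not this guide's own code. It uses a shortened three-movie DataFrame and looks up the row by movie title:

```python
import pandas as pd

df = pd.DataFrame({"movie": ["The Truman Show", "Iron Man", "Blade Runner"]})

# Select the row by title instead of by index number.
# The comparison produces True/False per row; only True rows are assigned.
df.loc[df["movie"] == "Blade Runner", "critic rating"] = 8.9
print(df)
```

Rows where the condition is False are left as NaN, the same as in the index-based version.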