Working With Columns and Series in a DataFrame

Manipulating columns — sometimes called "variables" — is a foundational data science skill.

The Movie Dataset

To explore columns and Series, we'll use a DataFrame of five movies that records each movie's release date and how much money it made at the box office, both domestically and worldwide, in US dollars.

import pandas as pd

# Creates a DataFrame with "movie", "release date", "domestic box office", and "worldwide box office" columns
df = pd.DataFrame([
    {"movie": "The Truman Show", "release date": "1996-06-05", "domestic box office": 125618201, "worldwide box office": 264118201},
    {"movie": "Rogue One: A Star Wars Story", "release date": "2016-12-16", "domestic box office": 532177324, "worldwide box office": 1055135598},
    {"movie": "Iron Man", "release date": "2008-05-02", "domestic box office": 318604126, "worldwide box office": 585171547},
    {"movie": "Blade Runner", "release date": "1982-06-25", "domestic box office": 32656328, "worldwide box office": 39535837},
    {"movie": "Breakfast at Tiffany's", "release date": "1961-10-05", "domestic box office": 9551904, "worldwide box office": 9794721}
])
df  # displays the table in a notebook; in a plain script, use print(df)

Output:

	movie	release date	domestic box office	worldwide box office
0	The Truman Show	1996-06-05	125618201	264118201
1	Rogue One: A Star Wars Story	2016-12-16	532177324	1055135598
2	Iron Man	2008-05-02	318604126	585171547
3	Blade Runner	1982-06-25	32656328	39535837
4	Breakfast at Tiffany's	1961-10-05	9551904	9794721
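
Before working with individual columns, it can help to list them and check what kind of data each one holds. The sketch below rebuilds the movie DataFrame and reads three standard DataFrame attributes: columns, shape, and dtypes. It is one quick way to take a first look, not the only one. Note that the release date column holds text rather than real dates.

```python
import pandas as pd

# Rebuild the movie DataFrame from the example above
df = pd.DataFrame([
    {"movie": "The Truman Show", "release date": "1996-06-05", "domestic box office": 125618201, "worldwide box office": 264118201},
    {"movie": "Rogue One: A Star Wars Story", "release date": "2016-12-16", "domestic box office": 532177324, "worldwide box office": 1055135598},
    {"movie": "Iron Man", "release date": "2008-05-02", "domestic box office": 318604126, "worldwide box office": 585171547},
    {"movie": "Blade Runner", "release date": "1982-06-25", "domestic box office": 32656328, "worldwide box office": 39535837},
    {"movie": "Breakfast at Tiffany's", "release date": "1961-10-05", "domestic box office": 9551904, "worldwide box office": 9794721},
])

# Column names, in order
print(df.columns.tolist())
# ['movie', 'release date', 'domestic box office', 'worldwide box office']

# (number of rows, number of columns)
print(df.shape)  # (5, 4)

# Data type of each column: the two box office columns are integers,
# while "movie" and "release date" are stored as text
print(df.dtypes)
```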

DataFrames vs. Series

DataFrames and Series are both data structures for storing data in pandas, but they differ in shape. A Series is one-dimensional: a single sequence of values, each paired with an index label, much like a labeled list. A DataFrame is two-dimensional, like a table of rows and columns.

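To build a DataFrame from data you already have in Python, use the pandas DataFrame constructor below. The data parameter accepts two-dimensional structures such as a list of dictionaries (as in the movie example above), a dictionary of lists, or a 2D NumPy array. Files such as .csv and Excel spreadsheets are not passed to the constructor; instead, reader functions like pd.read_csv() and pd.read_excel() load them and return a DataFrame. Store the result in a variable so you can apply pandas operations to it later.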

df = pd.DataFrame(data = ...)

Making a DataFrame
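
As a rough illustration, the sketch below builds the same small table two ways: from a dictionary of lists passed to pd.DataFrame, and from CSV text read with pd.read_csv. Here io.StringIO stands in for a real .csv file so the example runs without one; with an actual file you would pass its path to pd.read_csv instead. The two-movie excerpt and variable names are just for illustration.

```python
import io

import pandas as pd

# A dictionary of lists: each key becomes a column name and each list
# holds that column's values (all lists must have the same length)
df_from_dict = pd.DataFrame({
    "movie": ["Iron Man", "Blade Runner"],
    "worldwide box office": [585171547, 39535837],
})

# CSV text standing in for a file on disk
csv_text = (
    "movie,worldwide box office\n"
    "Iron Man,585171547\n"
    "Blade Runner,39535837\n"
)

# read_csv parses the text (or a file path) and returns a DataFrame
df_from_csv = pd.read_csv(io.StringIO(csv_text))

print(df_from_dict)
print(df_from_csv)
```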

To build a Series, use the pandas Series constructor below. Here, the data parameter accepts one-dimensional structures such as a DataFrame column, a list, or a dictionary (whose keys become the Series index labels). After storing the Series in a variable, you can work with it and apply methods to it much as you would with a DataFrame.

series = pd.Series(data = ...)

Making a Series

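Note: data is the first parameter of both constructors, so if it is the only argument you pass, you don't have to write data = inside the parentheses. For example, pd.Series([1, 2, 3]) is the same as pd.Series(data = [1, 2, 3]).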
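
A short sketch of the Series constructor with a list and with a dictionary, plus the positional shortcut. The values come from the movie data; the variable names are only for illustration.

```python
import pandas as pd

# From a list: pandas assigns a default integer index 0, 1, 2, ...
domestic = pd.Series([532177324, 318604126, 32656328])
print(domestic.index.tolist())  # [0, 1, 2]

# From a dictionary: the keys become the index labels
worldwide = pd.Series({"Iron Man": 585171547, "Blade Runner": 39535837})
print(worldwide["Iron Man"])  # 585171547

# data is the first parameter, so both calls build the same Series
with_keyword = pd.Series(data=[1, 2, 3])
positional = pd.Series([1, 2, 3])
print(with_keyword.equals(positional))  # True
```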

Series within DataFrames

We can think of a DataFrame as a set of Series placed side by side to make a table. In this context, each column of a DataFrame is a Series: a one-dimensional sequence of values that shares the DataFrame's row index. To pull a Series out of a DataFrame, put the column name inside a single set of square brackets:

import pandas as pd

# Creates a DataFrame with "movie", "release date", "domestic box office", and "worldwide box office" columns
df = pd.DataFrame([
    {"movie": "The Truman Show", "release date": "1996-06-05", "domestic box office": 125618201, "worldwide box office": 264118201},
    {"movie": "Rogue One: A Star Wars Story", "release date": "2016-12-16", "domestic box office": 532177324, "worldwide box office": 1055135598},
    {"movie": "Iron Man", "release date": "2008-05-02", "domestic box office": 318604126, "worldwide box office": 585171547},
    {"movie": "Blade Runner", "release date": "1982-06-25", "domestic box office": 32656328, "worldwide box office": 39535837},
    {"movie": "Breakfast at Tiffany's", "release date": "1961-10-05", "domestic box office": 9551904, "worldwide box office": 9794721}
])

# A single set of brackets pulls out one column as a Series
worldboxoffice_series = df["worldwide box office"]
worldboxoffice_series

Output:

0     264118201
1    1055135598
2     585171547
3      39535837
4       9794721
Name: worldwide box office, dtype: int64
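
The Name line in that output hints that the Series remembers which column it came from. The sketch below, using a two-column excerpt of the movie data, checks that the result is a Series, that it keeps the column name and the DataFrame's row index, and that Series methods such as max() and idxmax() work on the whole column at once.

```python
import pandas as pd

# A two-column excerpt of the movie DataFrame from the example above
df = pd.DataFrame({
    "movie": ["The Truman Show", "Rogue One: A Star Wars Story", "Iron Man",
              "Blade Runner", "Breakfast at Tiffany's"],
    "worldwide box office": [264118201, 1055135598, 585171547, 39535837, 9794721],
})

worldboxoffice_series = df["worldwide box office"]

# One set of brackets returns a Series
print(type(worldboxoffice_series).__name__)  # Series

# The Series carries the column name and shares the DataFrame's row index
print(worldboxoffice_series.name)                    # worldwide box office
print(worldboxoffice_series.index.equals(df.index))  # True

# Series methods operate on every value in the column
print(worldboxoffice_series.max())  # 1055135598

# Because the index is shared, a row label found in the Series
# can be used to look up another column of the same row
top_row = worldboxoffice_series.idxmax()
print(df.loc[top_row, "movie"])  # Rogue One: A Star Wars Story
```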

Creating a DataFrame With a Subset of Columns

Sometimes, we want to select a certain group of columns from a DataFrame instead of looking at the whole thing.

Don't forget to store the result in a new variable, df2 for example, rather than assigning it back to df, so you don't overwrite your original DataFrame!

Single Column

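To create a new DataFrame with just one column, use double square brackets. The outer brackets select from the DataFrame, and the inner brackets hold a list of column names, here a list containing a single name. Because you pass a list, pandas keeps the data in the two-dimensional table format and returns a DataFrame. With only one set of brackets, you would get a one-dimensional Series instead.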

import pandas as pd

# Creates a DataFrame with "movie", "release date", "domestic box office", and "worldwide box office" columns
df = pd.DataFrame([
    {"movie": "The Truman Show", "release date": "1996-06-05", "domestic box office": 125618201, "worldwide box office": 264118201},
    {"movie": "Rogue One: A Star Wars Story", "release date": "2016-12-16", "domestic box office": 532177324, "worldwide box office": 1055135598},
    {"movie": "Iron Man", "release date": "2008-05-02", "domestic box office": 318604126, "worldwide box office": 585171547},
    {"movie": "Blade Runner", "release date": "1982-06-25", "domestic box office": 32656328, "worldwide box office": 39535837},
    {"movie": "Breakfast at Tiffany's", "release date": "1961-10-05", "domestic box office": 9551904, "worldwide box office": 9794721}
])

# Double brackets (a list with one column name) keep the result a DataFrame
df_release_date = df[["release date"]]
df_release_date

Output:

	release date
0	1996-06-05
1	2016-12-16
2	2008-05-02
3	1982-06-25
4	1961-10-05
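
To see the difference the second set of brackets makes, the sketch below selects the same column both ways from a small excerpt of the movie data and compares the type, number of dimensions (ndim), and shape of each result.

```python
import pandas as pd

# A small excerpt of the movie data
df = pd.DataFrame({
    "movie": ["Iron Man", "Blade Runner"],
    "release date": ["2008-05-02", "1982-06-25"],
})

as_series = df["release date"]   # one set of brackets: a single column label
as_frame = df[["release date"]]  # inner brackets: a list holding one label

print(type(as_series).__name__, as_series.ndim, as_series.shape)  # Series 1 (2,)
print(type(as_frame).__name__, as_frame.ndim, as_frame.shape)     # DataFrame 2 (2, 1)
```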

Multiple Columns

Creating a DataFrame with a subset of multiple columns works the same way. This time, the inner brackets hold every column name you want, separated by commas. The new DataFrame lists the columns in the order you write them.

import pandas as pd

# Creates a DataFrame with "movie", "release date", "domestic box office", and "worldwide box office" columns
df = pd.DataFrame([
    {"movie": "The Truman Show", "release date": "1996-06-05", "domestic box office": 125618201, "worldwide box office": 264118201},
    {"movie": "Rogue One: A Star Wars Story", "release date": "2016-12-16", "domestic box office": 532177324, "worldwide box office": 1055135598},
    {"movie": "Iron Man", "release date": "2008-05-02", "domestic box office": 318604126, "worldwide box office": 585171547},
    {"movie": "Blade Runner", "release date": "1982-06-25", "domestic box office": 32656328, "worldwide box office": 39535837},
    {"movie": "Breakfast at Tiffany's", "release date": "1961-10-05", "domestic box office": 9551904, "worldwide box office": 9794721}
])

# The inner list names the columns to keep, in the order they should appear
df_box_office_comparison = df[["movie", "domestic box office", "worldwide box office"]]
df_box_office_comparison

Output:

	movie	domestic box office	worldwide box office
0	The Truman Show	125618201	264118201
1	Rogue One: A Star Wars Story	532177324	1055135598
2	Iron Man	318604126	585171547
3	Blade Runner	32656328	39535837
4	Breakfast at Tiffany's	9551904	9794721
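
Two behaviors of multi-column selection are worth checking directly: the columns come back in the order you list them, and a name that isn't in the DataFrame raises a KeyError instead of being skipped. The sketch below uses a two-row excerpt of the movie data; budget is a made-up column name chosen only because it is absent. It also confirms that selecting a subset leaves the original DataFrame with all of its columns.

```python
import pandas as pd

# A two-row excerpt of the movie data
df = pd.DataFrame({
    "movie": ["Iron Man", "Blade Runner"],
    "release date": ["2008-05-02", "1982-06-25"],
    "domestic box office": [318604126, 32656328],
    "worldwide box office": [585171547, 39535837],
})

# Columns come back in the order they are listed, not their original order
df_reordered = df[["worldwide box office", "movie"]]
print(df_reordered.columns.tolist())  # ['worldwide box office', 'movie']

# Selecting a subset builds a new DataFrame; df still has all four columns
print(df.shape)  # (2, 4)

# "budget" is a hypothetical column that does not exist in df,
# so asking for it raises a KeyError
try:
    df[["movie", "budget"]]
except KeyError as err:
    print("KeyError:", err)
```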

Renaming Columns

It can be helpful to rename a column. You might want to have a name that's more self-explanatory, easier to remember, or easier to work with in code. This is especially true for datasets created by other people.

For example, you might decide that title is a clearer name for the movie column. The rename() method returns a new DataFrame rather than changing the original, so assign the result to a variable to save it. Here's how to rename a column with pandas:

import pandas as pd

# Creates a DataFrame with "movie", "release date", "domestic box office", and "worldwide box office" columns
df = pd.DataFrame([
    {"movie": "The Truman Show", "release date": "1996-06-05", "domestic box office": 125618201, "worldwide box office": 264118201},
    {"movie": "Rogue One: A Star Wars Story", "release date": "2016-12-16", "domestic box office": 532177324, "worldwide box office": 1055135598},
    {"movie": "Iron Man", "release date": "2008-05-02", "domestic box office": 318604126, "worldwide box office": 585171547},
    {"movie": "Blade Runner", "release date": "1982-06-25", "domestic box office": 32656328, "worldwide box office": 39535837},
    {"movie": "Breakfast at Tiffany's", "release date": "1961-10-05", "domestic box office": 9551904, "worldwide box office": 9794721}
])

# rename() returns a new DataFrame; the dictionary maps old column names to new ones
df_rename = df.rename(columns={"movie": "title"})
df_rename

Output:

	title	release date	domestic box office	worldwide box office
0	The Truman Show	1996-06-05	125618201	264118201
1	Rogue One: A Star Wars Story	2016-12-16	532177324	1055135598
2	Iron Man	2008-05-02	318604126	585171547
3	Blade Runner	1982-06-25	32656328	39535837
4	Breakfast at Tiffany's	1961-10-05	9551904	9794721

Take note of the curly braces inside the parentheses! The columns parameter takes a dictionary that maps each old column name to its new name. Apart from the renamed column, your new DataFrame is identical to the original.
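
The columns dictionary can hold more than one old-to-new pair, and pandas also accepts a function in its place, which it applies to every column name. The sketch below, using a one-row excerpt of the movie data, renames two columns at once, replaces spaces with underscores in every name, and confirms the original DataFrame keeps its names. It also shows the errors parameter: by default, a dictionary key that matches no column is silently ignored, which can hide typos. The key "title" in the last step is used only because no such column exists in df.

```python
import pandas as pd

# A one-row excerpt of the movie data
df = pd.DataFrame({
    "movie": ["Iron Man"],
    "release date": ["2008-05-02"],
    "domestic box office": [318604126],
    "worldwide box office": [585171547],
})

# One dictionary can rename several columns at once
df_short = df.rename(columns={"domestic box office": "domestic",
                              "worldwide box office": "worldwide"})
print(df_short.columns.tolist())
# ['movie', 'release date', 'domestic', 'worldwide']

# A function is applied to every column name; here spaces become underscores
df_snake = df.rename(columns=lambda name: name.replace(" ", "_"))
print(df_snake.columns.tolist())
# ['movie', 'release_date', 'domestic_box_office', 'worldwide_box_office']

# rename() returned new DataFrames; the original names are unchanged
print(df.columns.tolist())
# ['movie', 'release date', 'domestic box office', 'worldwide box office']

# errors="raise" reports keys that match no column instead of ignoring them
try:
    df.rename(columns={"title": "name"}, errors="raise")
except KeyError as err:
    print("KeyError:", err)
```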