Working With Columns and Series in a DataFrame


Manipulating columns — sometimes called "variables" — is a foundational data science skill.

The Movie Dataset

To explore columns and series, we'll use a DataFrame of five different movies, including information about their release date and how much money they made in US dollars.

Output:
                          movie  release date  domestic box office  worldwide box office
0               The Truman Show    1996-06-05            125618201             264118201
1  Rogue One: A Star Wars Story    2016-12-16            532177324            1055135598
2                      Iron Man    2008-05-02            318604126             585171547
3                  Blade Runner    1982-06-25             32656328              39535837
4        Breakfast at Tiffany's    1961-10-05              9551904               9794721
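
As a minimal sketch, a DataFrame like this one could be built by hand from a dictionary of lists. The values are taken from the outputs shown in this lesson, and the variable name df is an assumption that later examples reuse.

```python
import pandas as pd

# Each dictionary key becomes a column name; each list holds that column's values.
# Values are taken from the outputs shown in this lesson.
df = pd.DataFrame(
    {
        "movie": [
            "The Truman Show",
            "Rogue One: A Star Wars Story",
            "Iron Man",
            "Blade Runner",
            "Breakfast at Tiffany's",
        ],
        "release date": [
            "1996-06-05",
            "2016-12-16",
            "2008-05-02",
            "1982-06-25",
            "1961-10-05",
        ],
        "domestic box office": [125618201, 532177324, 318604126, 32656328, 9551904],
        "worldwide box office": [264118201, 1055135598, 585171547, 39535837, 9794721],
    }
)

print(df)
```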

DataFrames vs. Series

Both DataFrames and Series are data structures for storing data in pandas, but they differ in shape. A Series is one-dimensional, like a labeled list of values. A DataFrame, on the other hand, is two-dimensional, like a table of data.

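To build a DataFrame from data you already have in memory, use the pandas constructor below. The data parameter accepts two-dimensional structures such as a dictionary of lists, a list of rows, or a NumPy array. Data stored in .csv or Excel files is loaded with pd.read_csv or pd.read_excel instead, and both return a DataFrame. Either way, store the result in a variable so you can apply pandas operations to it later.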

df = pd.DataFrame(data = ...)
Making a DataFrame
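
The sketch below shows two common routes to a DataFrame, under a few assumptions: the CSV text is a small in-memory stand-in for a real file, and the names csv_text, from_csv, and from_rows are made up for illustration.

```python
import io

import pandas as pd

# Stand-in for a file on disk. A real call would pass a file path
# to pd.read_csv instead of an in-memory buffer.
csv_text = """movie,release date
Iron Man,2008-05-02
Blade Runner,1982-06-25
"""
from_csv = pd.read_csv(io.StringIO(csv_text))

# Data that is already in memory goes straight into the constructor.
from_rows = pd.DataFrame(
    data=[["Iron Man", "2008-05-02"], ["Blade Runner", "1982-06-25"]],
    columns=["movie", "release date"],
)

print(from_csv)
print(from_rows)
```

Both routes produce a two-column table with the same values; only the source of the data differs.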

To load existing data into a Series, use the pandas constructor below. Here, the data parameter accepts one-dimensional structures, including DataFrame columns, lists, and dictionaries. After storing the Series in a variable, you can work with it and apply many of the same functions you would use on a DataFrame.

series = pd.Series(data = ...)
Making a Series

Note: If data is your only argument, you don't have to write data = inside the parentheses.
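
Here is a small sketch of both styles, one with data = written out and one without. The variable names from_list and from_dict are illustrative only.

```python
import pandas as pd

# From a list, with the keyword written out.
# pandas assigns the default index labels 0, 1, 2.
from_list = pd.Series(data=[125618201, 532177324, 318604126])

# From a dictionary, without the keyword.
# The dictionary keys become the index labels.
from_dict = pd.Series({"Iron Man": 318604126, "Blade Runner": 32656328})

print(from_list)
print(from_dict)
```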

Series within DataFrames

We can think of a DataFrame as a bunch of Series put together to make a table. In this context, a Series is a single column of a DataFrame, but in list form. To pull a Series out of a DataFrame, use a single set of brackets around the column name:

Output:
0     264118201
1    1055135598
2     585171547
3      39535837
4       9794721
Name: worldwide box office, dtype: int64
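
Output like the above could come from code along these lines. This is a sketch that assumes the DataFrame is stored in a variable named df, rebuilt here with only two columns to keep the example short.

```python
import pandas as pd

# Trimmed-down copy of the movie data so the example runs on its own.
df = pd.DataFrame(
    {
        "movie": [
            "The Truman Show",
            "Rogue One: A Star Wars Story",
            "Iron Man",
            "Blade Runner",
            "Breakfast at Tiffany's",
        ],
        "worldwide box office": [264118201, 1055135598, 585171547, 39535837, 9794721],
    }
)

# A single set of brackets around the column name returns a Series.
worldwide = df["worldwide box office"]

print(type(worldwide).__name__)
print(worldwide)
```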

Creating a DataFrame With a Subset of Columns

Sometimes, we want to select a certain group of columns from a DataFrame instead of looking at the whole thing.

And don't forget to store the result in a new variable (df2, for example) so you don't overwrite your original DataFrame!

Single Column

The code for creating a new DataFrame with one column involves double square brackets. The two sets of brackets are important, as they keep the data in the two-dimensional table format. With a single set of brackets, we'd get a one-dimensional Series instead.

Output:
   release date
0    1996-06-05
1    2016-12-16
2    2008-05-02
3    1982-06-25
4    1961-10-05
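
The difference between one and two sets of brackets can be checked directly. The sketch below assumes the variable names df and df2 and uses a two-column copy of the movie data.

```python
import pandas as pd

# Trimmed-down copy of the movie data so the example runs on its own.
df = pd.DataFrame(
    {
        "movie": [
            "The Truman Show",
            "Rogue One: A Star Wars Story",
            "Iron Man",
            "Blade Runner",
            "Breakfast at Tiffany's",
        ],
        "release date": [
            "1996-06-05",
            "2016-12-16",
            "2008-05-02",
            "1982-06-25",
            "1961-10-05",
        ],
    }
)

# Double brackets: the inner pair is a list holding one column name,
# so the result is still a (one-column) DataFrame.
df2 = df[["release date"]]

# Single brackets: the result is a Series.
series = df["release date"]

print(type(df2).__name__, df2.shape)
print(type(series).__name__, series.shape)
```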

Multiple Columns

Creating a DataFrame with a subset of multiple columns is similar. This time, put every column name you want inside the inner brackets, separated by commas. The columns appear in the new DataFrame in the order you list them.

Output:
                          movie  domestic box office  worldwide box office
0               The Truman Show            125618201             264118201
1  Rogue One: A Star Wars Story            532177324            1055135598
2                      Iron Man            318604126             585171547
3                  Blade Runner             32656328              39535837
4        Breakfast at Tiffany's              9551904               9794721
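
A sketch of the multi-column selection, again assuming the variable names df and df2. Only the first three rows of the movie data are rebuilt here to keep the example short.

```python
import pandas as pd

# First three rows of the movie data, so the example runs on its own.
df = pd.DataFrame(
    {
        "movie": ["The Truman Show", "Rogue One: A Star Wars Story", "Iron Man"],
        "release date": ["1996-06-05", "2016-12-16", "2008-05-02"],
        "domestic box office": [125618201, 532177324, 318604126],
        "worldwide box office": [264118201, 1055135598, 585171547],
    }
)

# The inner brackets hold a list of the column names to keep.
df2 = df[["movie", "domestic box office", "worldwide box office"]]

print(df2)
# The original DataFrame still has all four columns.
print(list(df.columns))
```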

Renaming Columns

It can be helpful to rename a column. You might want to have a name that's more self-explanatory, easier to remember, or easier to work with in code. This is especially true for datasets created by other people.

For example, you might think that title is a better name for the movie column, just because it makes more sense to you. Remember to assign the result to a variable so your change is saved. Here's how to rename a column with pandas:

Output:
                          title  release date  domestic box office  worldwide box office
0               The Truman Show    1996-06-05            125618201             264118201
1  Rogue One: A Star Wars Story    2016-12-16            532177324            1055135598
2                      Iron Man    2008-05-02            318604126             585171547
3                  Blade Runner    1982-06-25             32656328              39535837
4        Breakfast at Tiffany's    1961-10-05              9551904               9794721

Take note of the curly braces inside the parentheses! Your new DataFrame will look exactly the same except for the name you changed.
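
One way to write the rename is with the DataFrame.rename method and its columns parameter, sketched below. The variable names df and df2 are assumptions, and the data is a two-column copy of the movie table.

```python
import pandas as pd

# Trimmed-down copy of the movie data so the example runs on its own.
df = pd.DataFrame(
    {
        "movie": ["The Truman Show", "Iron Man", "Blade Runner"],
        "release date": ["1996-06-05", "2008-05-02", "1982-06-25"],
    }
)

# The curly braces build a dictionary that maps each old name to its new name.
# rename returns a new DataFrame, so the result is stored in df2.
df2 = df.rename(columns={"movie": "title"})

print(list(df2.columns))
# The original DataFrame keeps its old column name.
print(list(df.columns))
```

To rename several columns at once, add more old-name to new-name pairs to the same dictionary.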