Manipulating columns — sometimes called "variables" — is a foundational data science skill.
The Movie Dataset
To explore columns and series, we'll use a DataFrame of five different movies, including information about their release date and how much money they made in US dollars.
Both DataFrames and Series are ways of storing data in pandas, but there are a few differences between them. Series are one-dimensional, like a labeled list of values. DataFrames, on the other hand, are two-dimensional, like tables of data.
To load existing data into a DataFrame, use the pandas function below. The data parameter accepts two-dimensional data that is already in memory, such as a dictionary of lists, a list of lists, or a NumPy array. Data stored in .csv or Excel files is read with pd.read_csv or pd.read_excel instead, both of which also return a DataFrame. Store the result in a variable so you can apply pandas operations to it later.
import pandas as pd
df = pd.DataFrame(data=...)
Making a DataFrame
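As a minimal sketch of the call above, the example below builds a small DataFrame from a dictionary of lists. The column names (movie, release_date, gross_usd) and every value are invented placeholders for illustration, not the actual movie dataset used in this lesson, and only three rows are shown to keep it short.

```python
import pandas as pd

# Placeholder data: titles, dates, and amounts are made up for illustration.
data = {
    "movie": ["Movie A", "Movie B", "Movie C"],
    "release_date": ["2001-05-18", "2010-07-16", "2019-11-22"],
    "gross_usd": [100_000_000, 250_000_000, 75_000_000],
}

# Each dictionary key becomes a column; each list holds that column's values.
df = pd.DataFrame(data=data)

print(df.shape)          # (number of rows, number of columns)
print(list(df.columns))  # the column names, in order
```

A dictionary of lists is only one option here; a list of lists with a separate columns argument would work as well.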
To load existing data into a Series, use the pandas function below. Here, the data parameter can be filled with any one-dimensional data structure, including DataFrame columns, lists, and dictionaries. After storing the Series in a variable, you can work with it and apply functions to it just like a DataFrame.
series = pd.Series(data=...)
Making a Series
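To make this concrete, here is a small sketch that builds one Series from a list and another from a dictionary. The values and labels are invented placeholders, not figures from the movie dataset.

```python
import pandas as pd

# From a list: pandas assigns the default index 0, 1, 2.
gross = pd.Series(data=[100_000_000, 250_000_000, 75_000_000])

# From a dictionary: the keys become the index labels.
gross_by_movie = pd.Series(data={"Movie A": 100_000_000, "Movie B": 250_000_000})

print(gross.max())                # largest value in the Series
print(gross_by_movie["Movie B"])  # look up a value by its label
```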
Note: If data is your only parameter, you don't have to write data = in the parentheses.
Series within DataFrames
We can think of a DataFrame as a bunch of Series put together to make a table. In this context, a Series is a single column of a DataFrame, but in list form. To pull a Series out of a DataFrame, use a single set of brackets around the column name, as in df["movie"].
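The sketch below shows single-bracket selection on a tiny stand-in DataFrame. The movie column name comes from this lesson; gross_usd and all of the values are assumptions made up for the example.

```python
import pandas as pd

# Stand-in data with placeholder values, not the lesson's real dataset.
df = pd.DataFrame({
    "movie": ["Movie A", "Movie B", "Movie C"],
    "gross_usd": [100_000_000, 250_000_000, 75_000_000],
})

# Single brackets around a column name return a one-dimensional Series.
movie_series = df["movie"]

print(type(movie_series))
print(movie_series.tolist())
```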
Sometimes, we want to select a certain group of columns from a DataFrame instead of looking at the whole thing.
And don't forget to store the result in a new variable (df2, for example) so you don't overwrite your original DataFrame!
Single Column
The code for creating a new DataFrame with one column involves double square brackets, as in df[["movie"]]. The two sets of brackets are important, as they keep the data in the two-dimensional table format. With a single set of brackets, we'd get a one-dimensional Series instead.
Creating a DataFrame with a subset of multiple columns is similar. This time, list every column you want inside the inner brackets, separating the column names with commas.
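Here is a short sketch of the difference between one and two sets of brackets. The DataFrame is a stand-in with placeholder values, and gross_usd is an assumed column name.

```python
import pandas as pd

# Stand-in data with placeholder values, not the lesson's real dataset.
df = pd.DataFrame({
    "movie": ["Movie A", "Movie B", "Movie C"],
    "gross_usd": [100_000_000, 250_000_000, 75_000_000],
})

# Double brackets keep the table format: the result is a one-column DataFrame.
df2 = df[["movie"]]

# Single brackets drop down to one dimension: the result is a Series.
movie_series = df["movie"]

print(df2.shape)           # two-dimensional: (rows, columns)
print(movie_series.shape)  # one-dimensional: (rows,)
```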
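As a sketch of multi-column selection, the example below keeps two of three columns. Apart from movie, the column names (release_date, gross_usd) and all of the values are assumptions invented for illustration.

```python
import pandas as pd

# Stand-in data with placeholder values, not the lesson's real dataset.
df = pd.DataFrame({
    "movie": ["Movie A", "Movie B", "Movie C"],
    "release_date": ["2001-05-18", "2010-07-16", "2019-11-22"],
    "gross_usd": [100_000_000, 250_000_000, 75_000_000],
})

# The inner brackets hold a list of column names; the outer brackets select them.
df2 = df[["movie", "gross_usd"]]

print(list(df2.columns))
```

The columns appear in the order you list them, so this is also a simple way to reorder columns.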
It can be helpful to rename a column. You might want to have a name that's more self-explanatory, easier to remember, or easier to work with in code. This is especially true for datasets created by other people.
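For example, you might think that title is a better name for the movie column, just because it makes more sense to you. Remember to assign the result to a variable to save it. In pandas, one way to rename a column is the rename method, which takes a dictionary mapping old column names to new ones: df2 = df.rename(columns={"movie": "title"}).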
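One way to do this is with the DataFrame rename method, sketched below on a stand-in DataFrame. The values and the gross_usd column are placeholders made up for the example; only the movie and title names come from this lesson.

```python
import pandas as pd

# Stand-in data with placeholder values, not the lesson's real dataset.
df = pd.DataFrame({
    "movie": ["Movie A", "Movie B", "Movie C"],
    "gross_usd": [100_000_000, 250_000_000, 75_000_000],
})

# Map the old column name to the new one; rename returns a new DataFrame.
df2 = df.rename(columns={"movie": "title"})

print(list(df2.columns))  # renamed copy
print(list(df.columns))   # the original is left unchanged
```

Because rename returns a new DataFrame by default, the renamed columns are lost unless the result is stored in a variable.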