Saving a DataFrame to a CSV file using Pandas

As you clean your data, filter it by category, and make other changes, you often end up with a new version of your original dataset that you want to share. An easy way to do this is to export the DataFrame to a CSV file, which can be done with the pandas to_csv method.

Saving a DataFrame as a CSV

For this example, we will use the GPA Dataset, a dataset that contains the GPA for courses at the University of Illinois over a 10 year period.

First, we will read in the dataset using pd.read_csv. A more detailed guide on how to create a DataFrame from a CSV is available separately. Let's say that for our assignment, we only want to look at the GPAs of Statistics classes for the past 10 years and save that dataset as a CSV. We store the filtered rows in the variable stat_gpas.

Then we can use the to_csv method to save that DataFrame as a file named STAT_GPA.csv. Usually, we do not want the index in the CSV file, so we set the index parameter to False. If we leave index unspecified, the CSV file will include an extra first column holding the index.

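import pandas as pd
# Read the full GPA dataset from its URL
df = pd.read_csv("https://waf.cs.illinois.edu/discovery/gpa.csv")
# Keep only the rows for Statistics courses
stat_gpas = df[df["Subject"] == "STAT"]
# Write the filtered rows to STAT_GPA.csv without the index column
stat_gpas.to_csv("STAT_GPA.csv", index=False)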

Saving the DataFrame to a CSV
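
To make the effect of the index parameter concrete, here is a minimal sketch. It assumes a tiny, made-up stand-in for stat_gpas (the values and the "Number" column are illustrative, not taken from the real dataset) and calls to_csv without a file name, which returns the CSV text as a string instead of writing a file.

```python
import pandas as pd

# Hypothetical stand-in for the filtered data; the values are made up.
# The index labels 5 and 9 mimic row labels left over after filtering.
stat_gpas = pd.DataFrame(
    {"Subject": ["STAT", "STAT"], "Number": [107, 400]},
    index=[5, 9],
)

# With no file name, to_csv returns the CSV contents as a string.
with_index = stat_gpas.to_csv()                # default: index is written
without_index = stat_gpas.to_csv(index=False)  # index is left out

print(with_index)
print(without_index)
```

In the first output, the header row starts with an empty column name and each row starts with its index label. In the second, only the Subject and Number columns appear.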

Saving the data in a different folder

The dataset is now saved in a file in the same folder as our Python notebook. This is because we used a relative path, which is interpreted relative to the current working folder. If we want to save the file in a different folder, we can specify exactly where on the computer the file should be placed by using an absolute path.

On Mac:
- User paths start with /Users/, followed by the name of the user. For example, if your user name is Emily, your home directory will be: /Users/Emily/
- To place our file in a different folder, we have to tell the computer which folders it is in. Maybe we want to put it in our Downloads folder so the file is easy to access. Then the path would look like this: /Users/Emily/Downloads/STAT_GPA.csv

On Windows:
- Paths usually start with a drive letter such as C:\, which is the root directory of that drive. To place our file in our Downloads folder, the path would look like this: C:\Users\Emily\Downloads\STAT_GPA.csv

We can then pass that path to the .to_csv method.

stat_gpas.to_csv("/Users/Emily/Downloads/STAT_GPA.csv", index=False)

Saving the DataFrame to a CSV
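
Typing an absolute path by hand ties the code to one computer and one operating system, and on Windows the backslashes in a path can be misread as escape characters inside an ordinary Python string. One possible alternative, sketched below, is to build the path with Python's standard pathlib module. The small DataFrame is a made-up stand-in for stat_gpas, and the sketch writes into a temporary folder so that it runs anywhere without touching your real Downloads folder.

```python
from pathlib import Path
import tempfile

import pandas as pd

# Hypothetical stand-in for the filtered data; the values are made up.
stat_gpas = pd.DataFrame({"Subject": ["STAT"], "Number": [107]})

# Path.home() is the current user's home directory, for example
# /Users/Emily on Mac or C:\Users\Emily on Windows.
downloads = Path.home() / "Downloads"
print(downloads / "STAT_GPA.csv")

# For this sketch we write to a temporary folder instead.
# Replace out_dir with downloads to save into the Downloads folder.
out_dir = Path(tempfile.mkdtemp()).resolve()
out_path = out_dir / "STAT_GPA.csv"

# to_csv accepts a Path object as well as a plain string.
stat_gpas.to_csv(out_path, index=False)
print(out_path.exists())
```

The / operator joins path pieces with the correct separator for the operating system the code is running on, so the same lines work on both Mac and Windows.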

Saving a CSV inside of a Zip file

To make a dataset easier to share, we can save the CSV inside of a zip file. A zip file is a single file containing one or more compressed files, so it takes up less space than the original files and is quicker and easier to share. Because the contents are compressed, the file is "zipped" when you save it and must be "unzipped" when you want to use it.

Continuing with the same dataset as above, we will create a zip file that will save and compress our stat_gpas dataset.

We first create a dictionary that declares the compression method and the name of the file that will be stored inside the zip file. In this example, we'll name this dictionary compression_opts. We set method to 'zip' and archive_name to the name of our new CSV file, 'STAT_GPA.csv'.

Then we can use the .to_csv function on our dataset. Again, we'll use a relative path to save a new zip file called out.zip to our current folder. To turn the CSV into a zip file, we set the compression parameter of the .to_csv function to be the dictionary we defined above with our compression method and files.

compression_opts = dict(method='zip', archive_name='STAT_GPA.csv')
stat_gpas.to_csv('out.zip', index=False, compression=compression_opts)

Saving the DataFrame to a Zip File
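
To check that the zip file holds what we expect, we can open it again. The sketch below is one way to do that, assuming a made-up stand-in for stat_gpas and a temporary folder for out.zip. It uses Python's standard zipfile module to list the archive's contents and pd.read_csv to load the data back; by default, read_csv infers the compression from the .zip extension, and it expects the archive to contain a single data file.

```python
import os
import tempfile
import zipfile

import pandas as pd

# Hypothetical stand-in for the filtered data; the values are made up.
stat_gpas = pd.DataFrame({"Subject": ["STAT", "STAT"], "Number": [107, 400]})

# Write out.zip into a temporary folder so the sketch runs anywhere.
zip_path = os.path.join(tempfile.mkdtemp(), "out.zip")
compression_opts = dict(method="zip", archive_name="STAT_GPA.csv")
stat_gpas.to_csv(zip_path, index=False, compression=compression_opts)

# List the files stored inside the archive.
with zipfile.ZipFile(zip_path) as archive:
    names = archive.namelist()
print(names)

# Read the CSV straight out of the zip file, without unzipping by hand.
restored = pd.read_csv(zip_path)
print(restored)
```

The name listed inside the archive is the archive_name from the dictionary, not the name of the zip file itself.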

Pandas Documentation

The full documentation for to_csv is available in the pandas documentation.