Saving a DataFrame to a CSV file using Pandas
As you clean and filter your data, you will often end up with a modified version of the original dataset that you want to share. An easy way to do that is to export the DataFrame to a CSV file, which you can do with the pandas to_csv method.
Saving a DataFrame as a CSV
For this example, we will use the GPA Dataset, a dataset that contains the GPA for courses at the University of Illinois over a 10-year period.
First, we read in the dataset using pd.read_csv. A more detailed guide on how to create a DataFrame from a CSV can be found here. Let's say that for our assignment, we only want to look at the GPAs of Statistics classes for the past 10 years and save that subset as a CSV. We store the filtered data in the variable stat_gpas.
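As a minimal sketch of the filtering step, the snippet below builds a tiny stand-in for the GPA Dataset so that it runs without the real file. The column names (Subject, Number, Average GPA) and all values are made up for illustration; with the real data you would load the DataFrame with pd.read_csv instead.

```python
import pandas as pd

# Hypothetical stand-in for the GPA Dataset; columns and values are invented.
# With the real file you would load it with pd.read_csv(...) instead.
df = pd.DataFrame({
    "Subject": ["STAT", "CS", "STAT", "MATH"],
    "Number": [107, 225, 400, 241],
    "Average GPA": [3.6, 3.2, 3.1, 2.9],
})

# Keep only the rows for Statistics courses.
stat_gpas = df[df["Subject"] == "STAT"]
print(stat_gpas)
```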
Then we can use the to_csv method to save that DataFrame as a file named STAT_GPA.csv. Usually we do not want the index in the CSV file, so we set the index parameter to False. If we leave index at its default value of True, the CSV file will include an extra column for the index.
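A short sketch of this step, assuming a small made-up stat_gpas DataFrame (the columns and values are invented for illustration). It writes STAT_GPA.csv to the current folder and then compares the header row with and without the index.

```python
import pandas as pd

# Hypothetical stand-in for stat_gpas; columns and values are invented.
stat_gpas = pd.DataFrame(
    {"Subject": ["STAT", "STAT"], "Number": [107, 400], "Average GPA": [3.6, 3.1]},
    index=[0, 2],  # row labels left over from filtering a larger DataFrame
)

# Write the file to the current folder, without the index column.
stat_gpas.to_csv("STAT_GPA.csv", index=False)

# Calling to_csv without a path returns the CSV text, which makes it easy
# to compare the header row with and without the index.
header_without_index = stat_gpas.to_csv(index=False).splitlines()[0]
header_with_index = stat_gpas.to_csv().splitlines()[0]
print(header_without_index)
print(header_with_index)
```

With the default index=True, the header row starts with an empty, unnamed column that holds the row labels.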
Saving the data in a different folder
The dataset is now saved in a file in the same folder as our Python notebook. This is because we used a relative path, which is interpreted relative to the current working directory (for a notebook, usually the folder the notebook is in). If we want to save the file in a different folder, we typically specify its exact location on the computer using an absolute path.
On Mac:
User paths start with /Users/, followed by the username of the account. For example, if your username is Emily, your home directory will be /Users/Emily/ (the root directory of the whole file system is /).
To place our file in a different folder, we have to tell the computer which folders lead to it. Maybe we want to put it in our Downloads folder so the file is easy to access. Then the path would look like this:
/Users/Emily/Downloads/STAT_GPA.csv
On Windows:
Absolute paths start with a drive letter, most commonly C:\ for the root directory of the main drive. To place our file in our Downloads folder, the path would look like this:
C:\Users\Emily\Downloads\STAT_GPA.csv
We can then pass that path to the to_csv method.
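One way to build such a path is with Python's pathlib module, sketched below. The stat_gpas data is made up, and the system temp folder is used as a stand-in target so the example runs on any machine; for a Downloads folder you might use Path.home() / "Downloads" instead, assuming that folder exists.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-in for stat_gpas; columns and values are invented.
stat_gpas = pd.DataFrame({"Subject": ["STAT", "STAT"], "Number": [107, 400]})

# Assumption: the temp folder stands in for the target folder so the sketch
# runs anywhere. For a Downloads folder you might use Path.home() / "Downloads".
target_folder = Path(tempfile.gettempdir()).resolve()
out_path = target_folder / "STAT_GPA.csv"

# to_csv accepts a Path object or a plain string. On Windows, write string
# paths as raw strings, e.g. r"C:\Users\Emily\Downloads\STAT_GPA.csv",
# so the backslashes are not treated as escape sequences.
stat_gpas.to_csv(out_path, index=False)
print(out_path)
```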
Saving a CSV inside of a Zip file
For an easier way to export, we can save the CSV inside a zip file instead of as a plain CSV. A zip file is a single file that contains one or more compressed files, so it usually takes up less space than the originals, which makes sharing quicker and easier. Because the contents are compressed, the file is "zipped" when you save it and must be "unzipped" before its contents can be used.
Continuing with the same dataset as above, we will create a zip file that will save and compress our stat_gpas dataset.
We first have to declare, in a dictionary, the compression method and the name of the file that will be saved inside the zip file. In this example, we'll name this dictionary compression_opts. We set method to 'zip' and archive_name to the name of our new CSV file, 'STAT_GPA.csv'.
Then we can call the to_csv method on our dataset. Again, we'll use a relative path to save a new zip file called out.zip to our current folder. To write the CSV into a zip file, we set the compression parameter of to_csv to the dictionary we defined above, which holds our compression method and file name.
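Put together, the two steps might look like the sketch below. The stat_gpas data is made up for illustration; the dictionary keys method and archive_name are the ones described above. The final lines use Python's zipfile module and pd.read_csv to check the result.

```python
import zipfile

import pandas as pd

# Hypothetical stand-in for stat_gpas; columns and values are invented.
stat_gpas = pd.DataFrame({"Subject": ["STAT", "STAT"], "Number": [107, 400]})

# Compression method, plus the name the CSV will have inside the archive.
compression_opts = dict(method="zip", archive_name="STAT_GPA.csv")

# Write out.zip to the current folder (relative path).
stat_gpas.to_csv("out.zip", index=False, compression=compression_opts)

# Check what ended up inside the archive.
with zipfile.ZipFile("out.zip") as archive:
    names = archive.namelist()
print(names)

# pandas can read the single zipped CSV back without unzipping it by hand.
round_trip = pd.read_csv("out.zip")
print(round_trip)
```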
Pandas Documentation
The full documentation for to_csv is available in the pandas documentation.