Creating a DataFrame from a CSV file using Pandas


Many datasets are distributed as comma-separated values (CSV) files, which use the file extension .csv. The pd.read_csv function can read a CSV file from two common kinds of location: a URL or a file on your own computer.

Reading in a CSV file from a URL

Many publicly available datasets can be accessed directly through a URL. One example is the course catalog dataset from the University of Illinois (one of the many datasets provided on this site).

To load this dataset, pass the URL to the read_csv function as a string.

import pandas as pd

# Pass the URL of the CSV file as a string; pandas downloads and parses it
df = pd.read_csv("https://waf.cs.illinois.edu/discovery/course-catalog.csv")
     | Year | Term | YearTerm | Subject | Number | Name | Description | Credit Hours | Section Info | Degree Attributes
0    | 2019 | Fall | 2019-fa | AAS | 100 | Intro Asian American Studies | Interdisciplinary introduction to the basic co... | 3 hours. | NaN | Social & Beh Sci - Soc Sci, and Cultural Studi...
1    | 2019 | Fall | 2019-fa | AAS | 105 | Introduction to Arab American Studies | Interdisciplinary introduction to the basic co... | 3 hours. | NaN | Cultural Studies - US Minority course.
2    | 2019 | Fall | 2019-fa | AAS | 120 | Intro to Asian Am Pop Culture | Introductory understanding of the way U.S. pop... | 3 hours. | NaN | Cultural Studies - US Minority course.
3    | 2019 | Fall | 2019-fa | AAS | 199 | Undergraduate Open Seminar | May be repeated to a maximum of 6 hours. | 1 TO 5 hours. | NaN | NaN
4    | 2019 | Fall | 2019-fa | AAS | 200 | U.S. Race and Empire | Invites students to examine histories and narr... | 3 hours. | Same as LLS 200. | Humanities - Hist & Phil, and Cultural Studies...
...  | ... | ... | ... | ... | ... | ... | ... | ... | ... | ...
8588 | 2019 | Fall | 2019-fa | ZULU | 406 | Advanced Zulu II | Continuation of Zulu 405 with increased emphas... | 3 hours. | NaN | NaN
Loading a CSV file from a URL
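Once read_csv returns, df is an ordinary DataFrame, so it is worth checking what was loaded before going further. The sketch below does this offline: instead of the URL, it reads a short excerpt from an in-memory string wrapped in io.StringIO, which read_csv accepts in place of a path or URL. The excerpt (four columns, three rows copied by hand from the output above) is an assumption for illustration; the real file has more columns and rows. Empty cells come back as NaN, just like the Section Info column above.

```python
import io

import pandas as pd

# Hand-typed excerpt of the course catalog (assumption: only four columns
# and three rows; the real course-catalog.csv has many more).
csv_text = """Subject,Number,Name,Section Info
AAS,100,Intro Asian American Studies,
AAS,105,Introduction to Arab American Studies,
AAS,200,U.S. Race and Empire,Same as LLS 200.
"""

# read_csv accepts any file-like object, not only a path or URL.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)             # prints (3, 4): (rows, columns)
print(df.columns.tolist())  # column names taken from the header row
print(df.head())            # first five rows (here, all three)

# Empty cells are parsed as NaN (missing values).
print(df["Section Info"].isna().sum())  # prints 2
```

With the real dataset, the same df.shape, df.columns, and df.head() calls work unchanged on the DataFrame returned from the URL.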

Reading in a CSV file from your local computer

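To read a CSV file from your computer, the simplest approach is to put the file in the same folder as the Python file (or notebook) you are working with. Strictly speaking, pandas resolves a bare file name relative to the current working directory, which is usually, but not always, the folder that contains your script or notebook.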

[Figure: the CSV file and the Python file in the same folder]

We can see that both files are in the Data Science Discovery folder. We can then pass the file name, as a string, to the read_csv function.

import pandas as pd

# A bare file name is looked up in the current working directory
df = pd.read_csv("course-catalog.csv")
     | Year | Term | YearTerm | Subject | Number | Name | Description | Credit Hours | Section Info | Degree Attributes
0    | 2019 | Fall | 2019-fa | AAS | 100 | Intro Asian American Studies | Interdisciplinary introduction to the basic co... | 3 hours. | NaN | Social & Beh Sci - Soc Sci, and Cultural Studi...
1    | 2019 | Fall | 2019-fa | AAS | 105 | Introduction to Arab American Studies | Interdisciplinary introduction to the basic co... | 3 hours. | NaN | Cultural Studies - US Minority course.
2    | 2019 | Fall | 2019-fa | AAS | 120 | Intro to Asian Am Pop Culture | Introductory understanding of the way U.S. pop... | 3 hours. | NaN | Cultural Studies - US Minority course.
3    | 2019 | Fall | 2019-fa | AAS | 199 | Undergraduate Open Seminar | May be repeated to a maximum of 6 hours. | 1 TO 5 hours. | NaN | NaN
4    | 2019 | Fall | 2019-fa | AAS | 200 | U.S. Race and Empire | Invites students to examine histories and narr... | 3 hours. | Same as LLS 200. | Humanities - Hist & Phil, and Cultural Studies...
...  | ... | ... | ... | ... | ... | ... | ... | ... | ... | ...
8588 | 2019 | Fall | 2019-fa | ZULU | 406 | Advanced Zulu II | Continuation of Zulu 405 with increased emphas... | 3 hours. | NaN | NaN
Loading a CSV locally
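If the CSV file is not in the working folder, you can pass a path instead of a bare file name. The sketch below is self-contained: it creates a temporary folder with a hypothetical data subfolder and a tiny courses.csv file (both made up for illustration), then reads the file back twice, once by full path and once by a path relative to the working directory. It uses pathlib to build the paths so the same code works on Windows, macOS, and Linux, and prints os.getcwd(), the folder that bare names such as "course-catalog.csv" are resolved against.

```python
import os
import tempfile
from pathlib import Path

import pandas as pd

# Bare file names such as "course-catalog.csv" are looked up in this folder.
print("Current working directory:", os.getcwd())

# Hypothetical setup: a temporary folder containing data/courses.csv,
# so the example runs anywhere without the real dataset.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "data"
    data_dir.mkdir()
    csv_path = data_dir / "courses.csv"
    csv_path.write_text("Subject,Number\nAAS,100\nZULU,406\n")

    # 1) A full path: works no matter what the working directory is.
    df = pd.read_csv(csv_path)  # read_csv accepts a pathlib.Path or a str
    print(df)

    # 2) A relative path: resolved against the current working directory,
    #    so we temporarily change into tmp to make it valid.
    original_cwd = os.getcwd()
    os.chdir(tmp)
    try:
        df_relative = pd.read_csv(Path("data") / "courses.csv")
    finally:
        os.chdir(original_cwd)  # always restore the original working directory

    print(df.equals(df_relative))  # both reads load the same table
```

In everyday use you would skip the temporary-folder setup and simply write something like pd.read_csv("data/course-catalog.csv") if the file sat in a data subfolder of your working directory.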

Pandas Documentation

The full documentation for read_csv is available in the pandas documentation.
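As a small taste of what the documentation covers, the sketch below uses three optional read_csv parameters that are handy with larger files such as the course catalog: usecols (load only some columns), nrows (load only the first rows), and dtype (set a column's type). The in-memory data is a hand-typed excerpt standing in for the real file, an assumption for illustration; with the real dataset you would pass the URL or file name as the first argument instead.

```python
import io

import pandas as pd

# Hand-typed excerpt standing in for the real course-catalog.csv (assumption).
csv_text = """Year,Term,Subject,Number,Name
2019,Fall,AAS,100,Intro Asian American Studies
2019,Fall,AAS,105,Introduction to Arab American Studies
2019,Fall,AAS,120,Intro to Asian Am Pop Culture
2019,Fall,ZULU,406,Advanced Zulu II
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["Subject", "Number", "Name"],  # skip the Year and Term columns
    nrows=2,                                # read only the first two data rows
    dtype={"Number": str},                  # keep course numbers as text, not integers
)

print(df)
```

Note that usecols returns the selected columns in the order they appear in the file, not in the order you list them.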