Creating a DataFrame from a CSV file using Pandas


Many datasets are provided in the comma-separated values file format (file extension .csv). The pd.read_csv function can read a CSV file in two primary ways: from a URL or from a file on your local computer.

Reading in a CSV file from a URL

Sometimes, as with many publicly available datasets, you can access the dataset with a URL. One example of this is the course catalog dataset from The University of Illinois (one of the many datasets provided on this site).

To load this dataset, input the URL as a string to the read_csv function.

Output:

|  | Year | Term | YearTerm | Subject | Number | Name | Description | Credit Hours | Section Info | Degree Attributes |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2019 | Fall | 2019-fa | AAS | 100 | Intro Asian American Studies | Interdisciplinary introduction to the basic co... | 3 hours. | NaN | Social & Beh Sci - Soc Sci, and Cultural Studi... |
| 1 | 2019 | Fall | 2019-fa | AAS | 105 | Introduction to Arab American Studies | Interdisciplinary introduction to the basic co... | 3 hours. | NaN | Cultural Studies - US Minority course. |
| 2 | 2019 | Fall | 2019-fa | AAS | 120 | Intro to Asian Am Pop Culture | Introductory understanding of the way U.S. pop... | 3 hours. | NaN | Cultural Studies - US Minority course. |
| 3 | 2019 | Fall | 2019-fa | AAS | 199 | Undergraduate Open Seminar | May be repeated to a maximum of 6 hours. | 1 TO 5 hours. | NaN | NaN |
| 4 | 2019 | Fall | 2019-fa | AAS | 200 | U.S. Race and Empire | Invites students to examine histories and narr... | 3 hours. | Same as LLS 200. | Humanities - Hist & Phil, and Cultural Studies... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 8588 | 2019 | Fall | 2019-fa | ZULU | 406 | Advanced Zulu II | Continuation of Zulu 405 with increased emphas... | 3 hours. | NaN | NaN |
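
As a minimal sketch of passing a URL string to read_csv: the address of the course catalog dataset is not reproduced here, so the example below uses a stand-in. To stay runnable without a network connection, it writes a two-row CSV (rows and a few columns copied from the output above) to a temporary folder and reads it back through a file:// URL. The file name and the use of a file:// URL are assumptions made for this demo; an https:// URL would be passed to pd.read_csv in the same way.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Stand-in data: two rows taken from the output above, with only a few columns.
csv_text = (
    "Year,Term,Subject,Number,Name\n"
    "2019,Fall,AAS,100,Intro Asian American Studies\n"
    "2019,Fall,AAS,105,Introduction to Arab American Studies\n"
)

with tempfile.TemporaryDirectory() as folder:
    # Hypothetical file name, used only for this demo.
    csv_path = Path(folder) / "course-catalog-sample.csv"
    csv_path.write_text(csv_text, encoding="utf-8")

    # Path.as_uri() turns the absolute path into a file:// URL string.
    url = csv_path.as_uri()

    # read_csv takes the URL as a plain string.
    df_url = pd.read_csv(url)

print(df_url)
```

The returned object is an ordinary DataFrame, so everything that works on a locally read file works here too.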

Reading in a CSV file from your local computer

To read a CSV file from your computer, first make sure the file is in the same folder as the Python file you are working with, so that the file name alone is enough to locate it:

[Figure: the CSV file and the Python file in the same folder]

Both of our files are in the Data Science Discovery folder, so we can pass the name of the file, in quotes, to the read_csv function.

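import pandas as pd

# Read the CSV file that sits next to this Python file into a DataFrame
df = pd.read_csv("course-catalog.csv")
df  # in a notebook, ending a cell with the variable name displays the DataFrame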
Output:

|  | Year | Term | YearTerm | Subject | Number | Name | Description | Credit Hours | Section Info | Degree Attributes |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2019 | Fall | 2019-fa | AAS | 100 | Intro Asian American Studies | Interdisciplinary introduction to the basic co... | 3 hours. | NaN | Social & Beh Sci - Soc Sci, and Cultural Studi... |
| 1 | 2019 | Fall | 2019-fa | AAS | 105 | Introduction to Arab American Studies | Interdisciplinary introduction to the basic co... | 3 hours. | NaN | Cultural Studies - US Minority course. |
| 2 | 2019 | Fall | 2019-fa | AAS | 120 | Intro to Asian Am Pop Culture | Introductory understanding of the way U.S. pop... | 3 hours. | NaN | Cultural Studies - US Minority course. |
| 3 | 2019 | Fall | 2019-fa | AAS | 199 | Undergraduate Open Seminar | May be repeated to a maximum of 6 hours. | 1 TO 5 hours. | NaN | NaN |
| 4 | 2019 | Fall | 2019-fa | AAS | 200 | U.S. Race and Empire | Invites students to examine histories and narr... | 3 hours. | Same as LLS 200. | Humanities - Hist & Phil, and Cultural Studies... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 8588 | 2019 | Fall | 2019-fa | ZULU | 406 | Advanced Zulu II | Continuation of Zulu 405 with increased emphas... | 3 hours. | NaN | NaN |
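
If read_csv cannot find the file, the usual cause is that the file is not in the folder Python is looking in. The sketch below shows one way to check for the file before reading it. The helper name read_local_csv is hypothetical (it is not part of pandas), and the demo writes a small stand-in file to a temporary folder so that it runs anywhere.

```python
import tempfile
from pathlib import Path

import pandas as pd


def read_local_csv(file_name, folder=None):
    """Read a CSV by name from `folder` (default: current working directory).

    Hypothetical helper for illustration; it is not part of pandas.
    """
    folder = Path.cwd() if folder is None else Path(folder)
    csv_path = folder / file_name
    if not csv_path.is_file():
        # Fail early with a message that shows where the file was expected.
        raise FileNotFoundError(f"{file_name} was not found in {folder}")
    return pd.read_csv(csv_path)


# Demo with a stand-in file (two rows from the output above, three columns).
with tempfile.TemporaryDirectory() as folder:
    sample = Path(folder) / "course-catalog.csv"
    sample.write_text(
        "Subject,Number,Name\n"
        "AAS,100,Intro Asian American Studies\n"
        "ZULU,406,Advanced Zulu II\n",
        encoding="utf-8",
    )
    df_local = read_local_csv("course-catalog.csv", folder=folder)

    # A file that is not in the folder produces a descriptive error.
    try:
        read_local_csv("missing.csv", folder=folder)
        missing_message = ""
    except FileNotFoundError as err:
        missing_message = str(err)

print(df_local)
print(missing_message)
```

Calling read_local_csv("course-catalog.csv") with no folder argument looks in the current working directory, which matches the behavior of passing the bare file name to pd.read_csv.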

Pandas Documentation

The full documentation for read_csv is available in the pandas documentation.
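
Among the options described there, usecols and nrows are two that can help when taking a first look at a large file. The sketch below is only an illustration: the data is a three-row stand-in (rows from the output above, a few columns) held in memory with io.StringIO rather than the real file.

```python
import io

import pandas as pd

# Stand-in CSV text; read_csv also accepts file-like objects such as StringIO.
csv_text = (
    "Year,Term,Subject,Number,Name\n"
    "2019,Fall,AAS,100,Intro Asian American Studies\n"
    "2019,Fall,AAS,105,Introduction to Arab American Studies\n"
    "2019,Fall,AAS,120,Intro to Asian Am Pop Culture\n"
)

# usecols keeps only the listed columns; nrows stops after that many data rows.
preview = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["Subject", "Number", "Name"],
    nrows=2,
)

print(preview)
```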