Creating a DataFrame from a CSV file using Pandas
Many datasets are provided in the comma-separated values (CSV) format (file extension `.csv`). The `pd.read_csv` function can read a CSV file from several kinds of sources. This section covers the two most common: a URL and a file on your local computer.
Reading in a CSV file from a URL
Many publicly available datasets can be accessed directly through a URL. One example is the University of Illinois course catalog dataset (one of the many datasets provided on this site).
To load this dataset, pass the URL as a string to the `read_csv` function.
```python
import pandas as pd
# Pass the URL as a string; read_csv downloads and parses the file
df = pd.read_csv("https://waf.cs.illinois.edu/discovery/course-catalog.csv")
```
| | Year | Term | YearTerm | Subject | Number | Name | Description | Credit Hours | Section Info | Degree Attributes |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2019 | Fall | 2019-fa | AAS | 100 | Intro Asian American Studies | Interdisciplinary introduction to the basic co... | 3 hours. | NaN | Social & Beh Sci - Soc Sci, and Cultural Studi... |
| 1 | 2019 | Fall | 2019-fa | AAS | 105 | Introduction to Arab American Studies | Interdisciplinary introduction to the basic co... | 3 hours. | NaN | Cultural Studies - US Minority course. |
| 2 | 2019 | Fall | 2019-fa | AAS | 120 | Intro to Asian Am Pop Culture | Introductory understanding of the way U.S. pop... | 3 hours. | NaN | Cultural Studies - US Minority course. |
| 3 | 2019 | Fall | 2019-fa | AAS | 199 | Undergraduate Open Seminar | May be repeated to a maximum of 6 hours. | 1 TO 5 hours. | NaN | NaN |
| 4 | 2019 | Fall | 2019-fa | AAS | 200 | U.S. Race and Empire | Invites students to examine histories and narr... | 3 hours. | Same as LLS 200. | Humanities - Hist & Phil, and Cultural Studies... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 8588 | 2019 | Fall | 2019-fa | ZULU | 406 | Advanced Zulu II | Continuation of Zulu 405 with increased emphas... | 3 hours. | NaN | NaN |
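Reading from the URL requires a network connection. As an offline sketch, note that `read_csv` also accepts any file-like object, so a few rows copied from the table above (only a subset of its columns) can be parsed from an in-memory string wrapped in `io.StringIO`. The variable name `csv_text` is illustrative, not part of the original example.

```python
import io

import pandas as pd

# A few rows copied from the course catalog table (subset of columns only)
csv_text = """Year,Term,YearTerm,Subject,Number,Name,Credit Hours
2019,Fall,2019-fa,AAS,100,Intro Asian American Studies,3 hours.
2019,Fall,2019-fa,AAS,105,Introduction to Arab American Studies,3 hours.
2019,Fall,2019-fa,AAS,199,Undergraduate Open Seminar,1 TO 5 hours.
"""

# io.StringIO makes the string behave like an open file for read_csv
df = pd.read_csv(io.StringIO(csv_text))

# The first line becomes the column names; each later line becomes a row
print(df.shape)  # (3, 7)
print(df.head())
```

The header line supplies the column names, and columns that contain only digits, such as `Year` and `Number`, are parsed as integers rather than strings.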
Reading in a CSV file from your local computer
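To read a CSV file from your computer, first make sure the file is in the same folder as the Python file you are working with. (Strictly speaking, a bare file name is looked up relative to the current working directory, which is usually the folder you run your script or notebook from.) In this example, both the Python file and `course-catalog.csv` are in the Data Science Discovery folder, so we can pass the file name in quotes to the `read_csv` function.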
```python
import pandas as pd
# The file name alone works because the CSV is in the working directory
df = pd.read_csv("course-catalog.csv")
```
| | Year | Term | YearTerm | Subject | Number | Name | Description | Credit Hours | Section Info | Degree Attributes |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2019 | Fall | 2019-fa | AAS | 100 | Intro Asian American Studies | Interdisciplinary introduction to the basic co... | 3 hours. | NaN | Social & Beh Sci - Soc Sci, and Cultural Studi... |
| 1 | 2019 | Fall | 2019-fa | AAS | 105 | Introduction to Arab American Studies | Interdisciplinary introduction to the basic co... | 3 hours. | NaN | Cultural Studies - US Minority course. |
| 2 | 2019 | Fall | 2019-fa | AAS | 120 | Intro to Asian Am Pop Culture | Introductory understanding of the way U.S. pop... | 3 hours. | NaN | Cultural Studies - US Minority course. |
| 3 | 2019 | Fall | 2019-fa | AAS | 199 | Undergraduate Open Seminar | May be repeated to a maximum of 6 hours. | 1 TO 5 hours. | NaN | NaN |
| 4 | 2019 | Fall | 2019-fa | AAS | 200 | U.S. Race and Empire | Invites students to examine histories and narr... | 3 hours. | Same as LLS 200. | Humanities - Hist & Phil, and Cultural Studies... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 8588 | 2019 | Fall | 2019-fa | ZULU | 406 | Advanced Zulu II | Continuation of Zulu 405 with increased emphas... | 3 hours. | NaN | NaN |
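If reading by file name alone raises `FileNotFoundError`, the usual cause is that Python's current working directory is not the folder holding the CSV. The sketch below builds a hypothetical stand-in `course-catalog.csv` in a temporary folder (its two rows are illustrative, not the full catalog) and shows that a full path works from anywhere, while a bare file name only works when the working directory contains the file.

```python
import os
import tempfile
from pathlib import Path

import pandas as pd

# Remember the current working directory so it can be restored afterwards
original_cwd = os.getcwd()

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    # Hypothetical stand-in for course-catalog.csv with two illustrative rows
    csv_path = folder / "course-catalog.csv"
    csv_path.write_text(
        "Subject,Number,Name\n"
        "AAS,100,Intro Asian American Studies\n"
        "AAS,105,Introduction to Arab American Studies\n"
    )

    # A full path works regardless of the current working directory
    df_abs = pd.read_csv(csv_path)

    # A bare file name is resolved against the current working directory,
    # so it only works after changing into the folder that holds the file
    os.chdir(folder)
    try:
        df_rel = pd.read_csv("course-catalog.csv")
    finally:
        os.chdir(original_cwd)  # restore before the temporary folder is deleted

print(df_abs.equals(df_rel))  # True
print(df_rel.shape)           # (2, 3)
```

Passing a full path (a string or a `pathlib.Path`) avoids depending on where Python was started, which is often the more robust choice in scripts.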
Pandas Documentation
The full documentation for `read_csv` is available in the pandas documentation.