Berkeley's 1973 Graduate Admissions Dataset

The "Berkeley Dataset" contains all 12,763 applicants to UC-Berkeley's graduate programs in Fall 1973. UC-Berkeley researchers published the data as part of an analysis of possible gender bias in admissions, and it has since become a classic example of Simpson's Paradox.

Dataset Format: Well-formatted CSV with column headers as the first row
Dataset Size: 12,763 rows × 4 columns
CSV File Location: https://waf.cs.illinois.edu/discovery/berkeley.csv
Dataset Variables:
- Year : number ➜ The application year (always 1973 in this dataset)
- Major : string ➜ An anonymized major code (either A, B, C, D, E, F, or Other). The specific majors are unknown, except that A-F are the six majors with the most applicants in Fall 1973
- Gender : string ➜ Applicant self-reported gender (either M or F)
- Admission : string ➜ Admission decision (either Rejected or Accepted)
Research Paper: "Sex Bias in Graduate Admissions: Data from Berkeley" by P. J. Bickel, E. A. Hammel, and J. W. O'Connell (1975)

Using the Berkeley Dataset in Python

The dataset can be loaded using the pandas library in Python:

import pandas as pd
df = pd.read_csv("https://waf.cs.illinois.edu/discovery/berkeley.csv")
df

	Year	Major	Gender	Admission
0	1973	C	F	Rejected
1	1973	B	M	Accepted
2	1973	Other	F	Accepted
3	1973	Other	M	Accepted
4	1973	Other	M	Rejected
...	...	...	...	...
12758	1973	Other	M	Accepted
12759	1973	D	M	Accepted
12760	1973	Other	F	Rejected
12761	1973	Other	M	Rejected
12762	1973	Other	M	Accepted

The full Berkeley Dataset stored in a DataFrame (12,763 rows).
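
Because the dataset is best known as an example of Simpson's Paradox, a natural next step is to compare acceptance rates at different levels of grouping. The following is a minimal sketch, not part of the original page: the helper name `admission_rate` and the temporary `Accepted` column are hypothetical, and the code assumes only the column names and values documented above.

```python
import pandas as pd

def admission_rate(df: pd.DataFrame, by) -> pd.Series:
    """Return the fraction of applicants with Admission == "Accepted"
    within each group defined by `by` (a column name or list of names).

    `admission_rate` and the temporary `Accepted` column are hypothetical
    names introduced for this illustration.
    """
    # Mark accepted applicants as True/False; the mean of a boolean
    # column within a group is that group's acceptance rate.
    flagged = df.assign(Accepted=df["Admission"].eq("Accepted"))
    return flagged.groupby(by)["Accepted"].mean()

# Example usage, assuming `df` was loaded with pd.read_csv as shown earlier:
#   admission_rate(df, "Gender")                              # overall rate per gender
#   admission_rate(df, ["Major", "Gender"]).unstack("Gender") # rate per gender within each major
```

Comparing the overall rates by gender with the rates by gender within each major is one way to see whether the aggregate pattern persists once applicants are split by major, which is the comparison at the heart of Simpson's Paradox.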
