Perception of Probability Words Dataset
The "Probability Words Dataset" is a survey of primarily undergraduate students at the University of Illinois about how they perceive words that describe the probability of rain on a given day.
- Dataset Format: Well-formatted CSV with column headers as the first row
- Dataset Size: 75 rows × 17 columns
- CSV File Location: https://waf.cs.illinois.edu/discovery/words.csv
- Dataset Variables:
  - Almost Certain: number ➜ The respondent's perception of the probability when they are told it is "almost certain it will rain tomorrow".
  - Highly Likely: number ➜ The respondent's perception of the probability when they are told it is "highly likely it will rain tomorrow".
  - Very Good Chance: number ➜ The respondent's perception of the probability when they are told there is a "very good chance it will rain tomorrow".
  - Probable: number ➜ The respondent's perception of the probability when they are told it is "probable it will rain tomorrow".
  - Likely: number ➜ The respondent's perception of the probability when they are told it is "likely it will rain tomorrow".
  - We Believe: number ➜ The respondent's perception of the probability when they are told that "we believe it will rain tomorrow".
  - Probably: number ➜ The respondent's perception of the probability when they are told it "probably will rain tomorrow".
  - Better than Even: number ➜ The respondent's perception of the probability when they are told it is "better than even it will rain tomorrow".
  - About Even: number ➜ The respondent's perception of the probability when they are told it is "about even it will rain tomorrow".
  - We Doubt: number ➜ The respondent's perception of the probability when they are told that "we doubt it will rain tomorrow".
  - Improbable: number ➜ The respondent's perception of the probability when they are told it is "improbable it will rain tomorrow".
  - Unlikely: number ➜ The respondent's perception of the probability when they are told it is "unlikely it will rain tomorrow".
  - Probably Not: number ➜ The respondent's perception of the probability when they are told it will "probably not rain tomorrow".
  - Little Chance: number ➜ The respondent's perception of the probability when they are told there is "little chance it will rain tomorrow".
  - Almost No Chance: number ➜ The respondent's perception of the probability when they are told there is "almost no chance it will rain tomorrow".
  - Highly Unlikely: number ➜ The respondent's perception of the probability when they are told it is "highly unlikely it will rain tomorrow".
  - Chances are Slight: number ➜ The respondent's perception of the probability when they are told that "chances are slight it will rain tomorrow".
- Data Collection: When the survey was presented to respondents, the phrases appeared in a random order to limit possible ordering bias.
Using the Probability Words Dataset in Python
The dataset can be loaded using the pandas library in Python:
import pandas as pd

# Read the CSV directly from its URL; the first row supplies the column headers.
df = pd.read_csv("https://waf.cs.illinois.edu/discovery/words.csv")
df

| | Almost Certain | Highly Likely | Very Good Chance | Probable | Likely | We Believe | Probably | Better than Even | About Even | We Doubt | Improbable | Unlikely | Probably Not | Little Chance | Almost No Chance | Highly Unlikely | Chances are Slight |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 90 | 90 | 80 | 70 | 70 | 70.0 | 70 | 60 | 50 | 20.0 | 20 | 20 | 20 | 20 | 10.0 | 10.0 | 10 |
| 1 | 10 | 90 | 90 | 65 | 75 | 85.0 | 70 | 60 | 50 | 5.0 | 5 | 15 | 10 | 20 | 5.0 | 5.0 | 5 |
| 2 | 90 | 90 | 75 | 70 | 70 | 60.0 | 70 | 60 | 50 | 30.0 | 5 | 10 | 30 | 20 | 2.0 | 10.0 | 30 |
| 3 | 95 | 90 | 85 | 70 | 85 | 90.0 | 75 | 60 | 50 | 10.0 | 2 | 20 | 5 | 15 | 1.0 | 5.0 | 12 |
| 4 | 95 | 90 | 85 | 50 | 65 | 75.0 | 50 | 65 | 50 | 65.0 | 0 | 30 | 25 | 5 | 5.0 | 10.0 | 20 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 70 | 95 | 90 | 55 | 80 | 50 | 85.0 | 60 | 80 | 50 | 5.0 | 2 | 10 | 10 | 45 | 1.0 | 15.0 | 5 |
| 71 | 95 | 90 | 80 | 60 | 60 | 75.0 | 70 | 60 | 50 | 20.0 | 20 | 30 | 30 | 10 | 5.0 | 15.0 | 15 |
| 72 | 100 | 90 | 90 | 80 | 80 | 50.0 | 80 | 60 | 50 | 5.0 | 30 | 10 | 70 | 30 | 5.0 | 40.0 | 30 |
| 73 | 75 | 10 | 90 | 50 | 75 | 50.0 | 50 | 75 | 50 | 25.0 | 0 | 25 | 25 | 5 | 5.0 | 5.0 | 30 |
| 74 | 90 | 70 | 70 | 20 | 95 | 50.0 | 65 | 35 | 50 | 30.0 | 0 | 10 | 10 | 8 | 2.0 | 5.0 | 10 |
The full Probability Words Dataset stored in a DataFrame (75 rows).
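
Once the data is in a DataFrame, one natural first question is how the phrases rank against each other. The sketch below is illustrative only: it uses the first five rows of three columns copied from the preview table so that it runs without a network connection, and `rank_phrases` is a hypothetical helper name, not part of the dataset or of pandas. With the full dataset, the same function could be applied to `df` instead of `sample`.

```python
import pandas as pd

# First five rows of three columns, copied from the preview table.
# This is a small offline sample, not the full 75-row dataset.
sample = pd.DataFrame({
    "Almost Certain": [90, 10, 90, 95, 95],
    "About Even": [50, 50, 50, 50, 50],
    "We Doubt": [20.0, 5.0, 30.0, 10.0, 65.0],
})

def rank_phrases(frame):
    """Return the median response for each phrase, highest first.

    pandas skips missing values by default when computing the median.
    """
    return frame.median(numeric_only=True).sort_values(ascending=False)

ranking = rank_phrases(sample)
print(ranking)
```

The median is used here rather than the mean because a single unusual response (such as the 10 under "Almost Certain" in row 1) pulls the mean much further than the median.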
Pages Using the Probability Words Dataset
- Video Walk-Through & Worksheet: Learn Page: Calculating Quartiles and Outliers
- Video Walk-Through & Worksheet: Learn Page: Histograms, Bar Charts, and Box Plots
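
As a rough idea of the kind of calculation the quartiles and outliers page works with, here is a minimal sketch that finds quartiles and flags outliers in one column. It assumes the common 1.5 × IQR rule, which may differ from the definition used on the Learn Page, and `iqr_outliers` is a hypothetical helper name. The values are the first five "Almost Certain" responses from the preview table, so the results describe this small sample only, not the full dataset.

```python
import pandas as pd

# First five "Almost Certain" responses from the preview table (sample only).
almost_certain = pd.Series([90, 10, 90, 95, 95], name="Almost Certain")

def iqr_outliers(series, k=1.5):
    """Return (Q1, Q3, outliers) using the k * IQR fence rule.

    Assumption: outliers are values below Q1 - k*IQR or above Q3 + k*IQR.
    Quartiles use pandas' default linear interpolation.
    """
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    outliers = series[(series < lower_fence) | (series > upper_fence)]
    return q1, q3, outliers

q1, q3, outliers = iqr_outliers(almost_certain)
print("Q1:", q1, "Q3:", q3)
print("Outliers:", outliers.tolist())
```

On this five-row sample the only flagged value is the response of 10, which sits far below the other four responses.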