Perception of Probability Words Dataset


The "Probability Words Dataset" contains responses from a survey of primarily undergraduate students at The University of Illinois about how they perceive words that describe the probability of rain on a given day.

  • Dataset Format: Well-formatted CSV with column headers as the first row
  • Dataset Size: 75 rows × 17 columns
  • CSV File Location: https://waf.cs.illinois.edu/discovery/words.csv
  • Dataset Variables:
    • Almost Certain : number ➜ The respondent's perception of the probability when they are told it is "almost certain it will rain tomorrow".
    • Highly Likely : number ➜ The respondent's perception of the probability when they are told it is "highly likely it will rain tomorrow".
    • Very Good Chance : number ➜ The respondent's perception of the probability when they are told there is a "very good chance it will rain tomorrow".
    • Probable : number ➜ The respondent's perception of the probability when they are told it is "probable it will rain tomorrow".
    • Likely : number ➜ The respondent's perception of the probability when they are told it is "likely it will rain tomorrow".
    • We Believe : number ➜ The respondent's perception of the probability when they are told that "we believe it will rain tomorrow".
    • Probably : number ➜ The respondent's perception of the probability when they are told it "probably will rain tomorrow".
    • Better than Even : number ➜ The respondent's perception of the probability when they are told it is "better than even it will rain tomorrow".
    • About Even : number ➜ The respondent's perception of the probability when they are told it is "about even it will rain tomorrow".
    • We Doubt : number ➜ The respondent's perception of the probability when they are told that "we doubt it will rain tomorrow".
    • Improbable : number ➜ The respondent's perception of the probability when they are told it is "improbable it will rain tomorrow".
    • Unlikely : number ➜ The respondent's perception of the probability when they are told it is "unlikely it will rain tomorrow".
    • Probably Not : number ➜ The respondent's perception of the probability when they are told it will "probably not rain tomorrow".
    • Little Chance : number ➜ The respondent's perception of the probability when they are told there is "little chance it will rain tomorrow".
    • Almost No Chance : number ➜ The respondent's perception of the probability when they are told there is "almost no chance it will rain tomorrow".
    • Highly Unlikely : number ➜ The respondent's perception of the probability when they are told it is "highly unlikely it will rain tomorrow".
    • Chances are Slight : number ➜ The respondent's perception of the probability when they are told that "chances are slight it will rain tomorrow".
  • Data Collection: When the survey was presented to respondents, the phrases appeared in a random order as a measure to reduce possible bias from the order of the phrases.
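
The random ordering described under Data Collection can be illustrated with a short sketch. This is not the survey's actual implementation, which the source does not describe. The helper name phrase_order_for_respondent and the use of a per-respondent seed are assumptions made for illustration only. The phrase list is taken from the dataset's column names.

```python
import random

# The 17 phrases, taken from the dataset's column names
PHRASES = [
    "Almost Certain", "Highly Likely", "Very Good Chance", "Probable",
    "Likely", "We Believe", "Probably", "Better than Even", "About Even",
    "We Doubt", "Improbable", "Unlikely", "Probably Not", "Little Chance",
    "Almost No Chance", "Highly Unlikely", "Chances are Slight",
]


def phrase_order_for_respondent(seed=None):
    """Return all phrases in a random order (hypothetical helper)."""
    rng = random.Random(seed)
    # sample() with k equal to the list length returns a shuffled copy,
    # so the original PHRASES list is left unchanged
    return rng.sample(PHRASES, k=len(PHRASES))


# Each respondent sees the same phrases, but in a different order
for respondent_id in range(3):
    first_three = phrase_order_for_respondent(seed=respondent_id)[:3]
    print(respondent_id, first_three)
```

Shuffling per respondent means that no phrase is consistently shown first or last, so any effect of position is spread across all phrases rather than attached to one of them.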

Using the Probability Words Dataset in Python

The dataset can be loaded using the pandas library in Python:

import pandas as pd
df = pd.read_csv("https://waf.cs.illinois.edu/discovery/words.csv")
df
    Almost Certain  Highly Likely  Very Good Chance  Probable  Likely  We Believe  Probably  Better than Even  About Even  We Doubt  Improbable  Unlikely  Probably Not  Little Chance  Almost No Chance  Highly Unlikely  Chances are Slight
 0              90             90                80        70      70        70.0        70                60          50      20.0          20        20            20             20              10.0             10.0                  10
 1              10             90                90        65      75        85.0        70                60          50       5.0           5        15            10             20               5.0              5.0                   5
 2              90             90                75        70      70        60.0        70                60          50      30.0           5        10            30             20               2.0             10.0                  30
 3              95             90                85        70      85        90.0        75                60          50      10.0           2        20             5             15               1.0              5.0                  12
 4              95             90                85        50      65        75.0        50                65          50      65.0           0        30            25              5               5.0             10.0                  20
..             ...            ...               ...       ...     ...         ...       ...               ...         ...       ...         ...       ...           ...            ...               ...              ...                 ...
70              95             90                55        80      50        85.0        60                80          50       5.0           2        10            10             45               1.0             15.0                   5
71              95             90                80        60      60        75.0        70                60          50      20.0          20        30            30             10               5.0             15.0                  15
72             100             90                90        80      80        50.0        80                60          50       5.0          30        10            70             30               5.0             40.0                  30
73              75             10                90        50      75        50.0        50                75          50      25.0           0        25            25              5               5.0              5.0                  30
74              90             70                70        20      95        50.0        65                35          50      30.0           0        10            10              8               2.0              5.0                  10

The full Probability Words Dataset stored in a DataFrame (75 rows).
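
Once the data is in a DataFrame, a natural next step is to compare how respondents rate each phrase. The following is a minimal sketch, assuming every column is numeric as listed in the Dataset Variables. The function name summarize_phrases is hypothetical, and the three-column sample uses made-up values so that the example runs without a network connection. It is not a subset of the real data.

```python
import pandas as pd


def summarize_phrases(df):
    """Return Q1, median, Q3 and IQR per phrase column, highest median first."""
    # quantile() returns one row per requested quantile; transpose so that
    # each phrase becomes a row
    summary = df.quantile([0.25, 0.5, 0.75]).T
    summary.columns = ["q1", "median", "q3"]
    summary["iqr"] = summary["q3"] - summary["q1"]
    return summary.sort_values("median", ascending=False)


# Made-up sample that reuses three of the dataset's column names
sample = pd.DataFrame({
    "Almost Certain": [90, 95, 100, 85],
    "About Even": [50, 50, 45, 55],
    "Highly Unlikely": [10, 5, 15, 10],
})

summary = summarize_phrases(sample)
print(summary)

# With the real dataset, pass the loaded DataFrame instead:
# df = pd.read_csv("https://waf.cs.illinois.edu/discovery/words.csv")
# print(summarize_phrases(df))
```

Sorting by median places the phrases on a rough scale from most to least certain, and the IQR column gives a sense of how much respondents disagree about each phrase.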

Pages Using the Probability Words Dataset

  1. Video Walk-Through & Worksheet: Learn Page: Calculating Quartiles and Outliers
  2. Video Walk-Through & Worksheet: Learn Page: Histograms, Bar Charts, and Box Plots