Deadline Extended
The deadline for lab_intro is extended until Jan. 31 so that everyone can get their computer set up and anyone who added the course late can complete this lab.
The first half of this lab will be spent getting you all set up for the semester – you will only need to do this once. :)
To begin to do Data Science, you need a few basic tools installed on your computer. All of these tools are free, open-source and industry standard. We have prepared guides based on what type of computer you have:
Python 3
If you find you only have Python 2.x, and not Python 3.x, you can install Python 3 from python.org. You will need at least Python 3.7, and you should grab the latest version.
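If you are unsure which version you have, a quick check like the one below can tell you. This is a minimal sketch, not part of the official setup: the helper name `meets_minimum` is made up for illustration, and the 3.7 minimum comes from the requirement stated above.

```python
import sys

# Minimum version required for this course (taken from the text above).
MINIMUM = (3, 7)


def meets_minimum(version, minimum=MINIMUM):
    """Return True if `version` (a tuple such as (3, 11, 4)) is at least `minimum`.

    Only the major and minor numbers are compared.
    """
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = sys.version_info
    print(f"You are running Python {current.major}.{current.minor}.{current.micro}")
    if meets_minimum(current):
        print("This version is new enough.")
    else:
        print("This version is too old; install a newer Python 3.")
```

If the script itself will not start, that usually means the command you typed does not point at a Python 3 install.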
Part 1b: Creating your STAT 107 git repository
When working in Data Science, you will want to store all of your code and data together, in the cloud, in a “repository”. For Discovery, we will be using an Illinois-hosted repository called GitHub Enterprise.
Part 1c: Set up your Python notebook
In Data Science, all of our programming will be done in “notebooks”. Your Python installation will need a few libraries in order to run the notebooks. Using your command line (either by going to your system’s command line interface or by pressing Ctrl + ` in Visual Studio Code), run the following:
py -m pip install pandas
This will take a bit. If you are prompted to confirm that you want to install all of the packages, press [Enter] (the option [y]/n shows that y is the default when you choose no option).
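Once the install finishes, you may want to confirm that Python can actually find the library. The snippet below is a small sketch of one way to do that; `is_installed` is a hypothetical helper name, and it only checks that a top-level module can be located, not that it works correctly.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate a top-level module with this name."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_installed("pandas"):
        print("pandas is installed.")
    else:
        print("pandas was not found; try the install command again.")
```

Run it with the same Python you used for the install, since a different Python on your computer may have a different set of libraries.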
Other Commands
If the command above tells you that py is not found, you can try these alternatives:
- python3 -m pip install pandas (particularly on Macs/OS X)
- pip install pandas (particularly if you have installed Python before this course)
- python -m pip install pandas (has worked for a few people when nothing else has worked)
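When more than one Python is installed, these commands can each point at a different copy, so the library may land somewhere your notebook does not look. As a rough sketch (an illustration only, not part of the course setup), the script below prints which Python is running it and the matching install command; `pip_install_command` is a hypothetical helper name.

```python
import sys


def pip_install_command(package):
    """Build the pip command that installs `package` into the Python running this script."""
    return [sys.executable, "-m", "pip", "install", package]


if __name__ == "__main__":
    # sys.executable is the full path of the interpreter running this script.
    print("This script is running under:", sys.executable)
    print("To install into this exact Python, run:")
    print(" ".join(pip_install_command("pandas")))
```

The script only prints the command; it does not install anything on its own.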
Part 2: Complete the “Lab: Introduction” Notebook
Using your command line, navigate to your stat107 repository (cd Desktop -> cd stat107 -> cd [NETID], unless you’re already there) and fetch the notebook from our release repository by running the following two git commands:
git fetch release
git merge release/lab_intro -m "Merging initial files"
ONLY IF you get an error related to unrelated histories, use:
git merge release/lab_intro --allow-unrelated-histories -m "Merging initial files"
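After the merge, your repository should contain the notebook for this lab. If you want to double-check before opening anything, a sketch like the following looks for it. It assumes the notebook is at lab_intro/lab_intro.ipynb inside your repository folder, and `find_notebook` is a made-up helper name.

```python
from pathlib import Path


def find_notebook(repo_dir="."):
    """Return the path to lab_intro/lab_intro.ipynb inside repo_dir, or None if it is missing."""
    # Assumed location: a lab_intro folder directly inside the repository.
    candidate = Path(repo_dir) / "lab_intro" / "lab_intro.ipynb"
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    notebook = find_notebook()
    if notebook is None:
        print("Notebook not found; check that both git commands ran without errors.")
    else:
        print("Found notebook at:", notebook)
```

Run it from inside your [NETID] folder so that the default repo_dir of "." points at your repository.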
Open the lab_intro folder inside of Visual Studio Code by going into Visual Studio Code and choosing File -> Open Folder, then:
- Open up the lab_intro.ipynb notebook
- Follow the instructions inside of the notebook