Introduction to Data Science

Data scientists use powerful tools to help learn about data. In this first lab, you will set up your account and computer for Data Science Discovery and begin to play around with your very first Python notebook!

Source Branch: lab_intro
Due Date: Committed and pushed to git before August 30, 2021 at 11:59pm

Part 1: Software and Tools for Data Science

The first half of this lab will be spent getting you all set up for the semester – you will only need to do this once. :)

Part 1a: Installing Software Tools

To begin to do Data Science, you need a few basic tools installed on your computer. All of these tools are free, open-source and industry standard. We have prepared guides based on what type of computer you have:

Python 3

If you find you only have Python 2.x, and not Python 3.x, you can install Python 3 from python.org. You will need at least Python 3.7, and you should grab the latest version.
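If you are not sure which version you have, a short script along these lines can tell you. This is only a sketch: the function name meets_minimum and the file name check_version.py are made up for illustration, and the 3.7 minimum is taken from the requirement above.

```python
import sys

# The minimum version stated in this lab (major, minor).
MINIMUM_VERSION = (3, 7)


def meets_minimum(version_info=sys.version_info, minimum=MINIMUM_VERSION):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


# Show the version of the Python that is running this script.
print("Python", sys.version.split()[0])

if meets_minimum():
    print("This Python is new enough for the course.")
else:
    print("Please install Python 3.7 or newer.")
```

Save it as check_version.py and run it the same way you run Python on your computer (for example, py check_version.py or python3 check_version.py).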

Part 1b: Creating your STAT 107 git repository

When working in Data Science, you will want to store all of your code and data together, in the cloud, in a “repository”. For Discovery, we will be using an Illinois-hosted repository service called GitHub Enterprise.

Part 1c: Set up your Python notebook

In Data Science, all of our programming will be done in “notebooks”. Your Python install will need a few libraries in order to run the notebooks. Using your command line (either by going to your system’s command line interface or by pressing Ctrl + ` in Visual Studio Code), run the following:

py -m pip install pandas

This will take a bit. If you are asked to confirm, press [Enter] to install all of the packages (a prompt of [y]/n means that y is the default when you press [Enter] without typing anything).

Other Commands

If the command above tells you that py is not found, you can try these alternatives:

  • python3 -m pip install pandas (particularly on Macs/OS X)
  • pip install pandas (particularly if you have installed Python before this course)
  • python -m pip install pandas (has worked for a few people when nothing else has worked)
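Whichever command worked for you, you can check that the install succeeded from Python itself. The sketch below is one way to do that; the helper name is_installed is made up for illustration and is not part of pandas or of the course materials.

```python
import importlib.util


def is_installed(module_name):
    """Return True if this Python interpreter can find `module_name`."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("pandas"):
    import pandas

    # The exact version number will depend on when you installed it.
    print("pandas", pandas.__version__, "is installed")
else:
    print("pandas was not found; try one of the install commands listed above")
```

Run the check with the same Python command you used for pip (py, python3, or python). If you have more than one Python installed, a different command may be looking at a different install.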

Part 2: Complete the “Lab: Introduction” Notebook

Using your command line, navigate to your stat107 repository (cd Desktop -> cd stat107 -> cd [NETID], unless you’re already there) and fetch the notebook from our release repository by running the following two git commands:

git fetch release
git merge release/lab_intro -m "Merging initial files"

ONLY IF you get an error related to unrelated histories, use:

git merge release/lab_intro --allow-unrelated-histories -m "Merging initial files" 

Open the lab_intro folder inside of Visual Studio Code by going into Visual Studio Code and choosing File -> Open Folder, then:

  • Open up the lab_intro.ipynb notebook
  • Follow the instructions inside of the notebook

Submitting Your Work

When you are done working, you should always submit your work (even if you're not quite finished). We will always grade the latest push you made before the due date (and ignore everything else), so submitting multiple times is okay and encouraged!

Inside of Visual Studio Code:

  • Click File -> Save All to ensure your notebook is saved.

Then, press Ctrl + ` to open a terminal inside of Visual Studio Code and run:

git add -A
git commit -m "submission (or any message here)"
git push
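If you would like a quick local check that nothing was left out of your commit, the sketch below asks git for its short status and reports whether any files are still uncommitted. The function names are made up for illustration, and the check assumes you run it from inside your repository folder. It only covers the commit step; it cannot tell you whether your push reached the server.

```python
import subprocess


def has_uncommitted_changes(porcelain_output):
    """Return True if `git status --porcelain` output lists any file."""
    return any(line.strip() for line in porcelain_output.splitlines())


def check_working_tree():
    """Ask git about the current folder and describe what it found."""
    try:
        result = subprocess.run(
            ["git", "status", "--porcelain"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        # git is missing, or this folder is not inside a repository.
        return "Could not run git here; are you inside your repository folder?"
    if has_uncommitted_changes(result.stdout):
        return "You still have changes that are not committed."
    return "Nothing left to commit."


print(check_working_tree())
```

The --porcelain option prints one line per changed file and nothing at all when everything is committed, which is why empty output is treated as "all clear".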

You can verify your submission was made by visiting the web interface to GitHub: