Set Up Your System for Data Science

As you begin your journey as a Data Scientist, it is important to get familiar with tools on your own system in addition to tools in your web browser. There are several major advantages to running code locally (on your own computer) rather than in your web browser:

  • You can use datasets on your computer (without needing to upload them online),
  • You are not restricted to the compute time or memory limits on web-based notebooks,
  • and more!

We will present small examples in class using online notebook environments in our web browsers (e.g. using Google Colab), but labs and projects will be done on your own machine. (We strongly recommend using Visual Studio (VS) Code, and this guide will set you up with VS Code, but any code editor that can edit a Python notebook can be used.)

Step 1: Install Python

Python is a free and widely used programming language and it is easy to install.

Using the Default Install

The strongly preferred way of installing Python is to install it through your operating system. To do that:

Step 1a: Open a command prompt terminal

  • Windows: In your Start Menu, search for "Command Prompt" and open the Command Prompt app
  • OS X: Using Finder, search for "Terminal" and open the Terminal app

Step 1b: Run python3

In the command prompt terminal, type python3 and press [Enter].

If you do not have python3 installed, your operating system will prompt you to install it.

  • On Windows, this opens Python in the Microsoft Store. Click "Get" to install it (it's free).
  • On OS X, you will be prompted to allow OS X to install it on your system.
  • Once python3 is installed, repeat this step. :)

If you already have python3 installed, you will see something similar to:

Python 3.11.4 (tags/v3.11.4:d2340ef, Jun  7 2023, 05:45:37) [MSC v.1934 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> _

  • This means you now have Python installed! 🎉
  • Type exit() and press [Enter] to exit. We'll cover more Python later.
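To double-check the version from inside Python itself, a small sketch like the following can be saved to a file or run later in a notebook. The helper names describe_python and is_python3 are made up for this guide (they are not part of Python); only the standard sys module is used.

```python
import sys


def describe_python(version_info=sys.version_info):
    """Return a short label such as 'Python 3.11.4' (hypothetical helper)."""
    major, minor, micro = version_info[:3]
    return f"Python {major}.{minor}.{micro}"


def is_python3(version_info=sys.version_info):
    """Return True when the major version is 3 or higher (hypothetical helper)."""
    return version_info[0] >= 3


# Report the interpreter that is running this code.
print(describe_python())
print("Python 3 detected:", is_python3())
```

The version printed here should match the first line shown when python3 starts.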

Manual Download

The default instructions above are strongly preferred because they lead to fewer setup errors. However, if you are unable to install python3 by running python3 from the terminal, you can manually install it:

Step 2: Install Visual Studio Code

The current best free tool for editing Python notebooks is Visual Studio Code by Microsoft. VS Code is free and open source! It is an industry-standard tool used by millions of programmers daily:

Step 3: Installing the Python and Jupyter Plugins in Visual Studio Code

Once you have installed VS Code, you will need the Python and Jupyter plugins to run Python notebooks. Both plugins are built by Microsoft and are free and open source. If you are opening VS Code for the first time, you may have to click through some setup options first:

  • In Visual Studio Code, find the left panel and select the icon with multiple square boxes that is labeled "Extensions" if you hover over it.
Extensions button in Visual Studio Code (on the left side of the VS Code interface)
  • In the "Extensions" window, search for python and find the Python plugin by Microsoft and install it:
Python Extension for Visual Studio Code
  • In the same "Extensions" window, search for jupyter and find the Jupyter plugin by Microsoft and install it as well:
Jupyter Extension for Visual Studio Code

Confirming Configuration

Once you have the Python and Jupyter extensions installed, you can confirm that everything is set up and ready to do data science!

  • In Visual Studio Code, press Ctrl (Cmd on a Mac) + Shift + P to open the "command palette". Start to type jupyter and then select Jupyter: Create New Blank Notebook.

  • In your notebook, type the following code into the first cell:

    3 + 4
  • Press Shift + Enter to run the cell (or press the run button).

    • If you are asked to select a "kernel" or "runtime environment", select the highest version of Python (e.g., python3.11).
    • You may be prompted to install ipykernel or other Python libraries. Allow Visual Studio Code to install them.
    • ✔️ If you see the answer 7, Visual Studio Code is all set for running Python!
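If you are unsure which Python the notebook picked as its kernel, a cell along these lines can report it. This is a minimal sketch; kernel_summary is a hypothetical helper name chosen for this guide, and it relies only on the standard sys module.

```python
import sys


def kernel_summary():
    """Describe the interpreter running this notebook (hypothetical helper)."""
    return {
        "executable": sys.executable,       # full path to the Python program in use
        "version": sys.version.split()[0],  # version string, e.g. "3.11.4"
    }


summary = kernel_summary()
print("Running:", summary["executable"])
print("Version:", summary["version"])
```

If the version shown is not the one you expected, you can pick a different kernel from the kernel selector in the notebook.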

Step 4: Installing pandas

Finally, you'll need to install the pandas library to do data science!

  • In Visual Studio Code, press Ctrl + ` (the backtick key, which shares a key with ~) to open the integrated terminal.
  • In the terminal, type the following:
    • python3 -m pip install pandas
    • If the above fails, try: pip3 install pandas
    • If that also fails, try: pip install pandas
    • If that fails too, try: python -m pip install pandas
    • Finally, if all of the above fail, try: py -m pip install pandas
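The fallbacks above exist because the name of the Python command differs between systems. As an alternative that is not part of the steps above, pip can be started from a notebook cell using the exact interpreter the notebook is running on. The sketch below assumes pip is available for that interpreter; pip_install_command and install are hypothetical helper names made up for this guide.

```python
import subprocess
import sys


def pip_install_command(package):
    """Build a pip command that targets the currently running interpreter."""
    return [sys.executable, "-m", "pip", "install", package]


def install(package):
    """Run pip for the given package; raises CalledProcessError if pip fails."""
    subprocess.check_call(pip_install_command(package))


# Show the command without running it. Call install("pandas") to actually install.
print(" ".join(pip_install_command("pandas")))
```

Building the command from sys.executable avoids guessing whether the system calls Python python3, python, or py.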

You can do the following to confirm that the pandas library was properly installed:

  • In the same notebook, enter the following code in a new cell:

    import pandas as pd
  • Press Shift + Enter to run the cell (or press the run button). If you're asked to choose a kernel, choose the latest version of Python (e.g., 3.11).

    • ✔️ If you see no error message, pandas is successfully installed!
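For a slightly more thorough check, a cell like the following looks for pandas without raising an error when it is missing, and then builds a tiny table if it is present. This is only a sketch: is_installed is a hypothetical helper name, and the column names and values are made-up sample data.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate the named module (hypothetical helper)."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("pandas"):
    import pandas as pd

    # A tiny made-up table with two columns, just to exercise the library.
    df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})
    print("pandas version:", pd.__version__)
    print("Sum of column x:", df["x"].sum())  # adds 1 + 2 + 3
else:
    print("pandas was not found; try the install commands in Step 4 again.")
```

If the cell reports that pandas was not found even though the install succeeded, the notebook kernel may be using a different Python than the terminal.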