Set Up Your System for Data Science


As you begin your journey as a Data Scientist, it is important to get familiar with tools on your own system in addition to tools in your web browser. There are several major advantages to running code locally (on your own computer) rather than in your web browser:

  • You can use datasets on your computer (without needing to upload them online),
  • You are not restricted to the compute time or memory limits on web-based notebooks,
  • and more!

We will present small examples in class using online notebook environments in our web browsers (e.g. using Google Colab), but labs and projects will be done on your own machine. (We strongly recommend using Visual Studio (VS) Code, and this guide will set you up with VS Code, but any code editor that can edit a Python notebook can be used.)

Step 1: Ensure your Operating System is Up to Date

Throughout this semester, we will use Python packages. Much like apps on your phone, Python packages generally support only recent versions of Windows and macOS, so make sure your operating system is up to date.

  • macOS: On the top-left of your screen, click the "Apple Icon" and then click "About This Mac".

    • In the "About This Mac" window, look for the line that starts with "macOS".
    • As of Fall 2024, the latest version of macOS is Sonoma 14. If you do not have Sonoma 14 (or a larger number), you will need to update your Mac to Sonoma. Follow Apple's guide to update macOS.
  • Windows: Any version of Windows 10 or Windows 11 is supported by Python.
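
Once Python is installed (Step 2), the same check can also be done from code. The following is a minimal sketch using Python's standard platform module; the helper name describe_os is made up for this illustration, and the exact strings reported vary between systems and Python versions.

```python
import platform


def describe_os():
    """Return a short, human-readable description of the operating system."""
    system = platform.system()  # "Darwin" on macOS, "Windows" on Windows
    if system == "Darwin":
        # mac_ver() returns a tuple whose first item is the macOS version string.
        return f"macOS {platform.mac_ver()[0]}"
    if system == "Windows":
        return f"Windows {platform.release()} (build {platform.version()})"
    # Any other system (for example Linux) falls through to a generic description.
    return f"{system} {platform.release()}"


print(describe_os())
```

Compare the printed version with the versions listed above; if it is older, update your operating system before continuing.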

Step 2: Install Python

Python is a free and widely used programming language and it is easy to install.

The strongly preferred way of installing Python is to install it through your operating system. To do that:

Step 2a: Open a command prompt terminal

  • macOS: Using Finder, search "Terminal" and open the Terminal app
  • Windows: In your Start Menu, search "Command Prompt" and open the Command Prompt app

Step 2b: Run python3

Inside of the command prompt terminal, type python3 followed by the [Enter] key.

If you do not have python3 installed, your operating system will prompt you to install it.

  • On macOS, you will be prompted to allow the system to install it.
  • On Windows, the Microsoft Store will open to the Python page. Click "Get" to install it (it's free).
  • Install python3 and then repeat this step. :)

If you already have python3 installed, you will see something similar to:

Python 3.12.5 (tags/v3.12.5:d2340ef, Aug  6 2024, 05:45:37) [MSC v.1934 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> _
  • This means you now have Python installed! 🎉
  • Type exit() and press [Enter] to exit. We'll cover more Python later.
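
Before exiting, you can also ask Python to report its own version from the >>> prompt. The sketch below is only an illustration: the helper python_is_recent and the (3, 8) threshold are assumptions made for this example, not requirements stated in this guide.

```python
import sys

# Assumption for illustration only: treat Python 3.8 or newer as "recent".
MINIMUM = (3, 8)


def python_is_recent(version_info=sys.version_info, minimum=MINIMUM):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


print(sys.version)  # the same version text shown in the startup banner
print("Recent enough:", python_is_recent())
```

If the version printed here differs from the one you expected, you may have more than one copy of Python installed.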

Alternative: Manual Download

The default instructions provided above are strongly preferred because they lead to fewer setup errors. However, if you are unable to install python3 by running python3 from the terminal, you can install it manually by downloading the installer for your operating system from the official Python website (python.org).

Step 3: Install Visual Studio Code

The most widely used free tool for editing Python notebooks is Visual Studio Code by Microsoft. Visual Studio Code (or simply "VS Code") is free and open source! It is an industry-standard tool used by millions of programmers daily. Download the installer for your operating system from the official VS Code website (code.visualstudio.com).

Step 4: Installing the Python and Jupyter Plugins in Visual Studio Code

Once you have installed VS Code, you will need the python and jupyter plugins to run Python notebooks. Both of these plugins are built by Microsoft and are also free and open-source.

  • Launch VS Code (you may have to click through some set up options if you are opening VS Code for the first time).

  • In Visual Studio Code, find the left panel and select the icon with multiple square boxes that is labeled "Extensions" if you hover over it.

[Image: the Extensions button on the left side of the VS Code interface]
  • In the "Extensions" window, search for python and find the Python plugin by Microsoft and install it:
[Image: Python extension for Visual Studio Code]
  • In the same "Extensions" window, search for jupyter and find the Jupyter plugin by Microsoft and install it as well:
[Image: Jupyter extension for Visual Studio Code]

Confirming Configuration

Once you have the python and jupyter extensions installed, you will now confirm that everything is set up and ready to do data science!

  • In Visual Studio Code, press Ctrl (Cmd on a Mac) + Shift + P to open the "command palette". Start to type jupyter and then select Jupyter: Create New Blank Notebook.

  • In your notebook, copy the following code:

    3 + 4
  • Press Shift + Enter to run the cell (or press the run button).

    • If you are asked to select a "kernel" or "runtime environment", select the version of Python with the largest number (e.g., python3.12).
    • You may be prompted to install ipykernel or other Python libraries. Allow Visual Studio Code to install them.
    • ✔️ If you see the answer 7, Visual Studio Code is all set for running Python!
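
If you are unsure which Python the notebook actually selected, a second cell can report it. This is a minimal sketch using only the standard library; the function name kernel_summary is made up for this example.

```python
import sys


def kernel_summary():
    """Describe the Python interpreter that is running this cell."""
    major, minor, micro = sys.version_info[:3]
    return {
        "executable": sys.executable,  # path to the interpreter in use
        "version": f"{major}.{minor}.{micro}",
    }


print(kernel_summary())
```

The version shown should match the kernel you picked; if it does not, select the kernel again from the notebook's kernel picker.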

Step 5: Installing pandas

Finally, you'll need to install the pandas library to do data science!

  • Inside of Visual Studio Code, press Ctrl + ` (the backtick key, which shares a key with ~ on most US keyboards) to open the integrated terminal.
  • In the terminal, run the following command:
    • python3 -m pip install pandas
    • If the above fails, try: pip3 install pandas
    • If that also fails, try: pip install pandas
    • If all of the above fail, try: python -m pip install pandas
    • Finally, if all of the above fail, try: py -m pip install pandas
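
Which of these commands works depends on which launcher names exist on your system. The sketch below, meant to be run in a notebook cell, lists the launchers named above that are on your PATH and builds (without running) a pip command tied to the interpreter the notebook is using. The helper names are made up for this illustration.

```python
import shutil
import sys

# The command names tried in the list above, in the same order.
CANDIDATES = ["python3", "pip3", "pip", "python", "py"]


def available_launchers(names=CANDIDATES):
    """Map each command name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def pip_command_for_current_python(package):
    """Build (but do not run) a pip command for the running interpreter."""
    return [sys.executable, "-m", "pip", "install", package]


for name, path in available_launchers().items():
    print(f"{name:8} -> {path or 'not found'}")

print(" ".join(pip_command_for_current_python("pandas")))
```

Running pip through a specific interpreter (the "-m pip" form) is one way to make sure the package is installed for the same Python that your notebook uses.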

You can do the following to confirm that the pandas library was properly installed:

  • In the same notebook, copy the following code into a new cell:

    import pandas as pd
  • Press Shift + Enter to run the cell (or press the run button). If you're asked to choose a kernel, choose the latest version (e.g., 3.12).

    • ✔️ If you see no error message, pandas is successfully installed!
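
For a slightly stronger check than a bare import, the following sketch first asks Python whether it can locate pandas and, if so, builds a tiny table. The helper is_installed and the sample numbers are made up for this illustration.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate the module without importing it."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("pandas"):
    import pandas as pd

    # A tiny made-up table, only to confirm that pandas works end to end.
    df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})
    print("pandas", pd.__version__)
    print(df["x"] + df["y"])
else:
    print("pandas was not found for this Python; revisit Step 5.")
```

If the cell reports that pandas was not found even though the install succeeded, the notebook kernel is probably a different Python from the one used in the terminal.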