Exploring COVID-19 Data from GitHub


MicroProject Overview

Since before COVID-19 was detected in the United States, the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University has provided daily updates of COVID-19 case data as clean, structured CSV files on GitHub as a free public service to the world.

In this MicroProject, you will learn how to find a dataset on GitHub and use it for Data Science analysis! At the end of the MicroProject, you will check whether the Pareto Principle applies to confirmed cases of COVID-19. Let's nerd out!

Data Science Skills

In this MicroProject, you will find the raw COVID-19 CSV data that Johns Hopkins University publishes on GitHub, explore it, and strengthen the following Data Science skills:

  • Importing data directly from GitHub with pd.read_csv
  • Grouping rows that share the same value together using df.groupby
  • Finding the cumulative sum of data using df.cumsum
  • Checking if the Pareto Principle (roughly 80% of outcomes come from 20% of causes) applies to confirmed cases of COVID-19
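
The skills listed above can be sketched together in a few lines of pandas. This is only a minimal illustration, not the notebook's solution: the sample rows are made up, and the column names `Country_Region` and `Confirmed` are assumptions about the CSV layout, so check the header of the file you actually load. The sketch reads from an in-memory string so it runs without a network connection.

```python
import io

import pandas as pd

# Hypothetical sample data shaped like a daily case report.
# The column names "Country_Region" and "Confirmed" are assumptions;
# the country names and counts are invented for illustration.
SAMPLE_CSV = """Province_State,Country_Region,Confirmed
,Aland,820
North,Borduria,60
South,Borduria,30
,Cascadia,50
,Duloc,25
,Elbonia,15
"""

# pd.read_csv accepts a URL string as well as a file or buffer, so the
# link to a raw CSV file on GitHub can be passed in place of this buffer.
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Group rows that share a country and total their confirmed cases.
by_country = df.groupby("Country_Region")["Confirmed"].sum()

# Sort from most to fewest cases, then compute the running share of the total.
ranked = by_country.sort_values(ascending=False)
cumulative_share = ranked.cumsum() / ranked.sum()

# Pareto check: do the top 20% of countries hold at least 80% of the cases?
top_n = max(1, round(0.2 * len(ranked)))
top_share = cumulative_share.iloc[top_n - 1]
pareto_holds = top_share >= 0.8

print(f"Top {top_n} of {len(ranked)} countries hold {top_share:.0%} of cases")
print("Pareto Principle holds:", pareto_holds)
```

Sorting before calling `cumsum` matters here: the cumulative share only answers the Pareto question when the largest groups come first.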


MicroProject in Visual Studio Code
JHU CSSE's GitHub Data on COVID-19
DataFrame loaded with COVID-19 data from GitHub
Analysis of the Pareto Principle

First Time Doing a MicroProject?

Each MicroProject starts with a notebook that we provide to you to get started! You will need to configure a git repository to connect to our `microprojects` remote where we release the starter notebook.


Fetch the Initial Files

In your terminal, navigate to your GitHub repository and merge the initial files by running the following commands:

git fetch microprojects
git merge microprojects/microproject-covid-data-from-github --allow-unrelated-histories -m "Merging initial files"

Complete the Notebook

If the commands above were successful, you have merged in the initial files to start on the MicroProject.

  • Find the new microproject-covid-data-from-github folder.
  • Open microproject-covid-data-from-github.ipynb and complete the MicroProject!

Commit and Grade Your Notebook

Once you have finished your notebook, you must use the built-in GitHub Action to perform automated grading of your MicroProject notebook! You will need to commit your work and then manually run the GitHub Action.

Commit Your Work

To commit your notebook, run the standard git commands in your terminal:

git add -u
git commit -m "microproject completed"
git push

Grade Your Notebook

To grade your notebook, run the grading workflow from your GitHub repository's web page:

  • Visit your GitHub repository in your browser
  • Click on the "Actions" tab
  • Under "Workflows", find the workflow for this microproject
  • Click "Run Workflow" in the blue box, and then click the green "Run Workflow" button
  • After about 10 seconds, you should see a new job that has started running
    • You can click on the job to watch it run in real-time
    • It will take ~1 minute to run and grade
  • Once the run is complete, the autograding summary will be available!