Final Project: You and Data Science
Due: Last Day of Lecture (Wednesday, May 4th at 11:59pm)
Throughout this semester, you have grown into an amazing Data Scientist! You are analyzing datasets in Python, performing advanced statistical tests, and finding the answers to complex questions using data. You have seen dozens of datasets we have provided. For the final project, we want you to teach us something – we want to learn about something you are passionate about!
For this final project in Data Science Discovery, you will use Data Science to explore something you are passionate about or interested in learning more about. At the end, you will write a small paper telling us about what you found and teaching us something! We only have a few minimal requirements:
- You must use a non-trivial dataset. The dataset must have at least 200 data points (this could be 20 rows with 10 columns, 50 rows with 4 columns, etc).
- You must do some analysis using Python. You will turn in your code. You must do something, but it could be anything.
- You must submit a paper/report that provides a summary of what you found and teach us about your passion/interest. The paper must be at least 1 page (and single-spaced), but up to half of that page can be figures/graphs. Full details below.
With students from so many different majors in Data Science Discovery, we are excited for everything we are going to learn from you! :)
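If you want to check the 200 data point requirement, one simple way is to multiply the number of rows by the number of columns. The sketch below is only an illustration: the helper name count_data_points and the small example table are made up, and it assumes every cell (even an empty one) counts as a data point.

```python
import pandas as pd

MIN_DATA_POINTS = 200  # minimum from the project requirements


def count_data_points(df: pd.DataFrame) -> int:
    """Return rows x columns (hypothetical helper; counts every cell, even empty ones)."""
    n_rows, n_cols = df.shape
    return int(n_rows * n_cols)


# Made-up example table: 50 rows and 4 columns.
example = pd.DataFrame({
    "game": range(50),
    "points": [10] * 50,
    "assists": [3] * 50,
    "minutes": [28.5] * 50,
})

total = count_data_points(example)
print(total, total >= MIN_DATA_POINTS)
```

With your own data, you would pass your own DataFrame to the helper in place of the made-up example table.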
Setting Up Your Project Workspace
To complete this project, there is no starter code – you are building it from scratch! However, we do want to check out your work, so we need you to place it in a specific spot in your stat107 directory so you can turn it in and so that we can find it:
- In your stat107 directory, navigate into the folder that contains all of your labs, extra credit notebook, etc.
- Create a new directory called project2.
- Complete all of your project work within that new project2 directory. You’ll turn in (commit + push) your project2 directory and we can check out your code that way!
Dataset
Our hope is that you will use a dataset you are passionate about. It can be anything – it can be a dataset from another class (eg: any data you were given in Excel), it can be a dataset you found online, or it can be a dataset you gather yourself. Some ideas include:
- A dataset about a hobby you’re interested in (eg: vacation destinations, best beaches, fashion trends, instagram, etc)
- A dataset about something you enjoy doing or watching (eg: swimming, volleyball, Rocket League, Illini Football, etc)
- A dataset about a topic related to your major (economics, communications, political science, etc)
- Any dataset that means something to you.
Online Data Sources
The best data is data that you personally care about, which may be data from a club you’re part of or data about something you’re passionate about that you already have available.
If you have no datasets at all, here are several websites that many people use as sources for datasets:
- kaggle.com is a well-known, free data science resource that contains millions of datasets.
- sports-reference.com contains data about all major professional sports (including MLB, NBA, NFL, NHL, NCAAF, NCAABB, Soccer/Football); fangraphs.com also allows you to query custom subsets of MLB data.
- https://oracleselixir.com/tools/downloads contains data about professional-level League of Legends matches.
- …if you have other data sources, let us know and we’ll add them to this list for everyone! :)
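Once you have a file from one of these sources (or your own spreadsheet), loading it into Python might look something like the sketch below. The helper name load_dataset is made up for illustration, and it assumes your file is either a CSV or an Excel workbook; other formats would need a different pandas reader.

```python
from pathlib import Path

import pandas as pd


def load_dataset(path: str) -> pd.DataFrame:
    """Load a CSV or Excel file into a DataFrame (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path)
    if suffix in {".xlsx", ".xls"}:
        # Reading Excel files needs an extra engine installed (e.g. openpyxl for .xlsx).
        return pd.read_excel(path)
    raise ValueError(f"Unsupported file type: {suffix}")
```

For example, if you saved a file named my_data.csv (a placeholder name) inside your project folder, load_dataset("my_data.csv") would return it as a DataFrame that you can then explore.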
Project Report
The major deliverable for this project is a small paper or report on what you found. We want to learn something from you about your interest/passion, so tell us a story about what you discovered!
The only requirements are:
- Your report must be at least one page. (It can be more, use enough space to tell us what amazing things you found.)
- Your report must be single spaced. (The default settings in Word or Google Docs are great, and line spacing of up to 1.15 is fine; the real world is not double-spaced.)
- Your font size should not be greater than 12. (The default size in most applications is 11, which seems great.)
- Feel free to include images, diagrams, figures, etc! The only requirement is that we want at least half a page of text in your report (you can have 3 pages of diagrams so long as there’s at least half a page of text somewhere in it all.)
- Your submission must be in PDF format.
Your audience is going to be Prof. Wade, Prof. Karle, and/or your lab TA. You do not need to explain Python or Data Science to us, but you should not assume we know anything about your specific interest/passion.
Submission
When you are ready to submit, there are two things you will submit.
Submission: Part 1 - Dataset and Code
For your code, you will turn in your project2 folder just like you have done for all of your other projects.
git add -A
git commit -m "submission (or any message here)"
git push
Submission: Part 2 - Project Report
Your project report will be submitted online by the last day of class at 11:59pm!
- Upload your report to Canvas under the “Project 2” assignment.
- This URL should directly take you to the assignment: https://canvas.illinois.edu/courses/17847/assignments/451078
We can’t wait to read your project! :)