# Data Science Guides

These DISCOVERY guides are short, solution-focused examples of common tasks in Data Science. We create several new guides each week, so there is always something new!

## Guides Using pandas DataFrames

### DataFrame Fundamentals

**DataFrame Indexing: .loc[] vs .iloc[]**

The `.loc[]` and `.iloc[]` indexers are commonly used to select certain groups of rows (and columns) of a pandas DataFrame: `.loc[]` selects by label and `.iloc[]` by integer position.

**Retrieve a Single Value From A DataFrame**

Common functions used to retrieve single values from a DataFrame include `at`, `iat`, `loc`, and `iloc`.

**Working With Columns and Series in a DataFrame**

Manipulating columns (sometimes called "variables") is a foundational data science skill.

**Run a Custom Function on Every Row in a DataFrame**

We can use the `apply` function with `axis=1` to run a function on every row of a DataFrame.

**Finding Descriptive Statistics for Columns in a DataFrame**

A great way to familiarize ourselves with a new dataset is to look at descriptive statistics (sometimes known as summary statistics) for all applicable variables.

**Finding Quantiles of a Column in a DataFrame**

We can find many different quantiles for a set of numbers using the `quantile` function of a DataFrame.

**Using Previous Observations when Computing Values in a DataFrame**

When you're analyzing data reported on a regular basis (e.g., daily cases or monthly reports), it is common to need the values from one or more previous observations in your calculation. The expression `df.column.shift(1)` returns the value of `column` from one observation earlier.
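Several of the fundamentals above, from `.loc[]`/`.iloc[]` indexing to `shift`, can be sketched on one small table. This is a minimal illustration, not code from the individual guides; the daily case counts and the `cases`, `doubled`, `previous`, and `change` column names are made up for the example.

```python
import pandas as pd

# Hypothetical daily case counts, indexed by a text label
df = pd.DataFrame(
    {"cases": [10, 12, 15, 11, 20]},
    index=["mon", "tue", "wed", "thu", "fri"],
)

# .loc[] selects by label (end label included), .iloc[] by position (end excluded)
first_two_by_label = df.loc["mon":"tue"]
first_two_by_position = df.iloc[0:2]

# .at[] and .iat[] retrieve a single value by label and by position
wed_cases = df.at["wed", "cases"]
fri_cases = df.iat[4, 0]

# apply with axis=1 runs the function once per row
df["doubled"] = df.apply(lambda row: row["cases"] * 2, axis=1)

# Summary statistics and a quantile (0.5 is the median)
summary = df["cases"].describe()
median_cases = df["cases"].quantile(0.5)

# shift(1) gives the previous observation's value (NaN for the first row)
df["previous"] = df.cases.shift(1)
df["change"] = df.cases - df.previous

print(df)
```

Because the first row has no earlier observation, `previous` and `change` are `NaN` there, which is usually what you want when computing day-over-day differences.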

### Reading and Importing Data into DataFrames

**Creating a DataFrame from an Excel file using Pandas**

Many datasets are provided in an Excel file format (file extension `.xlsx`). The `pd.read_excel` function provides two primary ways to read an Excel file.

**Creating a DataFrame from an HTML table using Pandas**

HTML tables can be found on many different websites and can contain useful data we may want to analyze.

**Creating a DataFrame from a Fixed-Width File using Pandas**

Some datasets are provided in a fixed-width file format (a common extension is `.txt`, but many others are used as well).

**Creating a DataFrame from a CSV file using Pandas**

Many datasets are provided in a comma-separated values (CSV) file format (file extension `.csv`). The `pd.read_csv` function provides two primary ways to read a CSV file.
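A minimal sketch of reading CSV and fixed-width text into DataFrames. To keep it self-contained, in-memory strings wrapped in `io.StringIO` stand in for real files (in practice you would pass a path or URL, such as the hypothetical `pd.read_csv("students.csv")`); the data is made up. `pd.read_excel` and `pd.read_html` follow the same pattern but need optional packages (for example `openpyxl` for `.xlsx` files, and `lxml` or BeautifulSoup with `html5lib` for HTML tables), so they are left out here.

```python
import io
import pandas as pd

# CSV: values separated by commas, first line is the header
csv_text = """name,major,credits
Ana,Statistics,15
Ben,Physics,12
"""
df_csv = pd.read_csv(io.StringIO(csv_text))

# Fixed-width: each column occupies a fixed range of character positions.
# colspecs lists (start, end) positions for each column, end excluded.
fwf_text = """name  credits
Ana   15
Ben   12
"""
df_fwf = pd.read_fwf(io.StringIO(fwf_text), colspecs=[(0, 6), (6, 13)])

print(df_csv)
print(df_fwf)
```

Passing explicit `colspecs` is optional (pandas can try to infer column boundaries), but spelling them out makes the layout of the file obvious to the reader.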

### Combining DataFrames

**Combining DataFrames by Merging**

A detailed guide, with examples, to combining DataFrames by matching the contents of their columns using `pd.merge`.

**Combining DataFrames by Concatenation**

Concatenation is a great way to combine DataFrames with identical columns. Concatenation does not look at the contents of the data at all; it only stacks the DataFrames end-to-end.

**Combining DataFrames by Joining**

A brief guide to combining DataFrames together in pandas with `join`.
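The three ways of combining DataFrames described above can be compared side by side on two tiny tables. This is a sketch with made-up data; the `netid`, `name`, and `grade` columns are hypothetical.

```python
import pandas as pd

students = pd.DataFrame({"netid": ["a1", "b2", "c3"], "name": ["Ana", "Ben", "Cy"]})
grades = pd.DataFrame({"netid": ["a1", "b2", "d4"], "grade": [93, 85, 77]})

# merge: match rows on the contents of a shared column
inner = pd.merge(students, grades, on="netid", how="inner")  # only netids in both
left = pd.merge(students, grades, on="netid", how="left")    # every student; NaN grade if none

# concat: stack DataFrames with the same columns end-to-end, ignoring contents
more_students = pd.DataFrame({"netid": ["e5"], "name": ["Dee"]})
all_students = pd.concat([students, more_students], ignore_index=True)

# join: combine on the index, so netid is moved into the index first
joined = students.set_index("netid").join(grades.set_index("netid"), how="left")

print(inner, left, all_students, joined, sep="\n\n")
```

`ignore_index=True` renumbers the stacked rows 0, 1, 2, ...; without it the original row labels are kept, which can produce duplicate index values.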

### Row Selection using DataFrames

**Select Rows From A DataFrame**

There are numerous ways to select rows from a DataFrame. One method is to select rows based on the contents of their columns. To do this, we can use conditions (boolean expressions).

**Finding Minimum and Maximum Values in a DataFrame Column**

It's often helpful to know a few specific values for each column (aka variable) in a DataFrame: mainly the highest value, the lowest value, and all unique values.

**Slice Objects and DataFrames**

When working with data from a pandas DataFrame, oftentimes we want to select a range of cells rather than specific ones. To do this, we can use slice objects.

**Selecting Rows that are IN and NOT IN a DataFrame**

The `.isin` function can be used to select rows from a DataFrame that are, or are not, in another DataFrame; negating the result with `~` selects the rows that are not.

**Selecting DataFrame Rows Based on String Contents**

When working with text, it is often useful to select rows that contain a specific string. The `.str.contains` function allows us to test each row's data to determine whether a specific string exists in the text.
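A minimal sketch of the row-selection techniques above (conditions, min/max/unique, slice objects, `isin`, and `str.contains`) on one small, made-up table; the column names and values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy", "Dee"],
    "major": ["Statistics", "Physics", "Data Science", "Statistics"],
    "credits": [15, 12, 18, 9],
})

# Conditions: a boolean expression keeps only the matching rows
full_time = df[df["credits"] >= 12]

# Lowest, highest, and unique values of a column
lowest = df["credits"].min()
highest = df["credits"].max()
majors = df["major"].unique()

# Slice objects: positions 1 and 2 (the end position 3 is excluded)
middle = df.iloc[slice(1, 3)]  # equivalent to df.iloc[1:3]

# isin / ~isin: rows whose name is (or is not) in another DataFrame
honors = pd.DataFrame({"name": ["Ben", "Dee"]})
in_honors = df[df["name"].isin(honors["name"])]
not_in_honors = df[~df["name"].isin(honors["name"])]

# str.contains: rows whose text includes a substring (case-sensitive by default)
science = df[df["major"].str.contains("Science")]
```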

### Modifying DataFrames

**Creating New Columns in a DataFrame**

There are two primary methods of creating new columns in a DataFrame: calculating a new column from data you already have, or using Python to create new data.

**Sorting a DataFrame Using Pandas**

The `sort_values` method of a DataFrame is used to sort a DataFrame by the data in a column.

**Removing Rows from a DataFrame**

The `drop` function is used to remove rows or columns from a pandas DataFrame.

**Removing Columns in a DataFrame**

To make the data less cluttered, you can remove a column from your DataFrame using pandas.

**Handling Missing Data in Pandas**

While it would be nice if our datasets always had the values we expect, that is not always the case. Oftentimes certain cells in a DataFrame will be empty or contain a value that we don't want.

**Grouping Data by column in a DataFrame**

The `groupby` method can be used to combine rows in a DataFrame into groups, which helps us better analyze large DataFrames.
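The modifications described above (new columns, sorting, dropping rows and columns, missing data, and grouping) can be sketched on one small DataFrame. The data, the `notes` placeholder column, and the flat per-credit `tuition` rate are all hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy", "Dee"],
    "major": ["Statistics", "Physics", "Statistics", "Physics"],
    "credits": [15, np.nan, 18, 9],  # Ben's value is missing
    "notes": ["", "", "", ""],
})

# New column calculated from existing data (hypothetical rate of 500 per credit)
df["tuition"] = df["credits"] * 500

# Sort by a column, largest first; NaN values are placed last by default
by_credits = df.sort_values("credits", ascending=False)

# Remove a row by its index label, and a column by its name
without_first = df.drop(index=0)
df = df.drop(columns=["notes"])

# Missing data: fill NaN values per column, or drop rows that contain any NaN
filled = df.fillna({"credits": 0, "tuition": 0})
complete_rows = df.dropna()

# groupby: combine rows sharing a major, then average each group (NaN is skipped)
mean_credits = df.groupby("major")["credits"].mean()
print(mean_credits)
```

Note that `drop` returns a new DataFrame rather than changing `df` in place, which is why the result is assigned back to `df`.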

### Data Visualization

**Creating Simple Data Visualizations in Python using matplotlib**

The matplotlib library in Python provides a simple way to create professional data visualizations. This guide explores the Python needed to create scatter plots, bar charts, pie charts, and line charts!
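A minimal sketch of the four chart types mentioned above, drawn as panels of one figure with `plt.subplots`. The monthly sales numbers and the `charts.png` filename are hypothetical, and the `Agg` backend line is only there so the script runs without a display (drop it in a notebook).

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; omit this line in a notebook
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr"]
sales = [120, 135, 90, 160]  # hypothetical values

# A 2 x 2 grid of axes, one per chart type
fig, axes = plt.subplots(2, 2, figsize=(8, 6))

axes[0, 0].scatter(range(len(sales)), sales)
axes[0, 0].set_title("Scatter plot")

axes[0, 1].bar(months, sales)
axes[0, 1].set_title("Bar chart")

axes[1, 0].pie(sales, labels=months, autopct="%1.0f%%")
axes[1, 0].set_title("Pie chart")

axes[1, 1].plot(months, sales, marker="o")
axes[1, 1].set_title("Line chart")

fig.tight_layout()
fig.savefig("charts.png")  # or plt.show() in an interactive session
```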

### Saving and Exporting DataFrames

**Saving a DataFrame to a CSV file using Pandas**

An easy way to save your dataset is to export it to a CSV file that can then be shared. This can be done with the pandas `to_csv` function.

**Saving a DataFrame to an Excel file using Pandas**

An easy way to save your dataset is to export it to an Excel file that can then be shared. This can be done with the pandas `to_excel` function.
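A minimal sketch of saving a DataFrame with `to_csv` and reading it back to check the result. It writes into a temporary folder so nothing is left behind; the `students.csv` filename and data are hypothetical. The `to_excel` call is shown only as a comment because writing `.xlsx` files needs an Excel engine such as `openpyxl` installed.

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame({"name": ["Ana", "Ben"], "credits": [15, 12]})

with tempfile.TemporaryDirectory() as folder:
    csv_path = os.path.join(folder, "students.csv")

    # index=False leaves out the row numbers, which are usually not wanted in the file
    df.to_csv(csv_path, index=False)

    # Reading the file back confirms what was written
    round_trip = pd.read_csv(csv_path)

    # The Excel equivalent (requires an engine such as openpyxl):
    # df.to_excel(os.path.join(folder, "students.xlsx"), index=False)

print(round_trip)
```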

## Guides Using Statistics

### Classic Problems

**Monty Hall Problem - Interactive Game and Three Intuitive Solutions**

An interactive version of the classic Monty Hall Problem and three intuitive solutions explained.
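A quick Monte Carlo simulation is one way to check the well-known answer that switching doors wins about 2/3 of the time. This is our own sketch, not necessarily one of the guide's three solutions; the function names `play` and `win_rate` are hypothetical.

```python
import random

def play(switch: bool, rng: random.Random) -> bool:
    """Play one round; return True if the contestant wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the car
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Switch to the one door that is neither picked nor opened
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

def win_rate(switch: bool, trials: int = 100_000, seed: int = 0) -> float:
    """Fraction of simulated rounds won with the given strategy."""
    rng = random.Random(seed)
    wins = sum(play(switch, rng) for _ in range(trials))
    return wins / trials

print(f"stay:   {win_rate(switch=False):.3f}")  # close to 1/3
print(f"switch: {win_rate(switch=True):.3f}")   # close to 2/3
```

Staying wins only when the first pick was the car (probability 1/3); switching wins in every other case, which is why the switching rate settles near 2/3.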

### Probability

**Seven Detailed Examples Using The Addition Rule**

Mathematical and Python examples of using the addition rule to calculate the probability that at least one of several events occurs.

**Python Functions for Bernoulli and Binomial Distribution**

Using functions from the `scipy.stats` library to represent Bernoulli and binomial distributions in Python.

**Six Detailed Examples Using The Multiplication Rule**

Mathematical and Python examples of using the multiplication rule to calculate the probability of multiple events occurring.
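The addition rule, the multiplication rule, and the binomial probability formula can each be checked in a few lines. This sketch uses exact fractions for a fair die and a hand-written binomial function (`binomial_pmf` is our own name, not a library function); `scipy.stats.binom.pmf(k, n, p)` should return the same value as `binomial_pmf(k, n, p)`.

```python
from fractions import Fraction
from math import comb

# One roll of a fair six-sided die
p_even = Fraction(3, 6)           # {2, 4, 6}
p_gt_4 = Fraction(2, 6)           # {5, 6}
p_even_and_gt_4 = Fraction(1, 6)  # {6}

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
p_even_or_gt_4 = p_even + p_gt_4 - p_even_and_gt_4  # 2/3, i.e. {2, 4, 5, 6}

# Multiplication rule for independent events: P(A and B) = P(A) * P(B)
p_two_sixes = Fraction(1, 6) * Fraction(1, 6)  # two rolls, both six: 1/36

# Binomial: probability of exactly k successes in n independent Bernoulli(p) trials
def binomial_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p_three_heads = binomial_pmf(3, 5, 0.5)  # 10 / 32 = 0.3125

print(p_even_or_gt_4, p_two_sixes, p_three_heads)
```

Subtracting `P(A and B)` in the addition rule avoids counting the outcome 6 twice, since it is both even and greater than 4.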

### Statistics with Python

**Calculating Standard Deviation in Python**

When we're presented with numerical data, we often find descriptive statistics to better understand it. One of these statistics is called the standard deviation, which measures the spread of our data around the mean (average).
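The standard deviation can be computed with either the population formula (divide the squared deviations by n) or the sample formula (divide by n − 1), and common libraries differ in their defaults. A minimal sketch on a small made-up dataset whose mean is 5 and whose squared deviations sum to 32:

```python
import statistics
import numpy as np
import pandas as pd

data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean is 5; squared deviations sum to 32

# Population standard deviation: sqrt(32 / 8) = 2.0
pop_sd = statistics.pstdev(data)

# Sample standard deviation: sqrt(32 / 7), about 2.138
sample_sd = statistics.stdev(data)

# NumPy defaults to the population formula (ddof=0),
# while pandas defaults to the sample formula (ddof=1)
np_sd = np.std(data)
pd_sd = pd.Series(data).std()

print(pop_sd, sample_sd, np_sd, pd_sd)
```

Passing `ddof=1` to `np.std` or `ddof=0` to `Series.std` makes the two libraries agree.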

## Other Guides

### System Setup

**Setup Your System for Data Science**

As you begin your journey as a Data Scientist, it is important to get familiar with tools on your own system in addition to tools in your web browser.

**Your System's Terminal**

Every operating system includes a command line interface (CLI), known as a terminal, that lets you interact with your computer using a keyboard. You can do everything you already do on a computer via the terminal, but you can also do a whole lot more!

**First Time Setup for MicroProjects**

A detailed guide to getting set up to start programming MicroProjects!