Overview of Machine Learning

Machine Learning is a broad area of Data Science that refers to any algorithm that uses data to improve its predictions or decisions. Thousands of machine learning algorithms exist, and they are used for everything from developing clothing patterns to self-driving cars.

Supervised vs. Unsupervised Learning

Machine learning algorithms are commonly classified into two broad categories:

Supervised Learning, algorithms that learn from data where the correct or "best" answer is provided to the algorithm. An example of supervised learning is an algorithm that identifies whether an image contains a dog or a cat. It learns to do this by "training" on a large set of images known to contain a dog and a large set of images known to contain a cat.
Unsupervised Learning, algorithms that learn from data where the correct or "best" answer is not provided. Three common categories of unsupervised learning:
- Clustering, algorithms that identify "clusters" of similar elements, both to group the data and to flag anomalies. If you have a set of images known to contain only dogs and cats, an unsupervised algorithm could identify two clusters that correspond to "dog images" and "cat images" (but it would not know which cluster held the dogs and which held the cats).
- Reinforcement Learning, algorithms that are given the rules of a game or environment and try many different sequences of actions to maximize a reward, such as the chance of winning. Some of the strongest programs for games like Chess and Go were developed by having the machine play against itself with no initial strategy, using only reinforcement learning to discover a strong strategy. (Reinforcement learning is often treated as a third category of its own, since the algorithm learns from a reward signal rather than from unlabeled data alone.)
- Component Analysis, algorithms that identify the variables (columns), or combinations of variables, in a large dataset that contribute most to the differences between the observations (the "most discriminating" ones). This is useful for reducing a large dataset to a smaller one that humans can more easily understand or analyze.
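
To make the supervised vs. unsupervised distinction concrete, here is a minimal sketch in plain Python. The animal measurements are made up for illustration, and the function names (`nearest_neighbor_predict`, `two_means`) are hypothetical helpers, not part of any library. The supervised function uses the "dog"/"cat" labels; the clustering function never sees them.

```python
# Toy data: (weight_kg, height_cm) with a known label. Values are invented.
labeled = [
    ((4.0, 25.0), "cat"),
    ((5.0, 23.0), "cat"),
    ((3.5, 24.0), "cat"),
    ((20.0, 55.0), "dog"),
    ((25.0, 60.0), "dog"),
    ((30.0, 58.0), "dog"),
]


def distance_sq(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_neighbor_predict(training, point):
    """Supervised: return the label of the closest labeled example."""
    _, label = min(training, key=lambda pair: distance_sq(pair[0], point))
    return label


def two_means(points, iterations=10):
    """Unsupervised: split points into two clusters (k-means with k=2)."""
    # Simplification: use the first and last points as the starting centers.
    centers = [points[0], points[-1]]
    clusters = [[], []]
    for _ in range(iterations):
        clusters = [[], []]
        for p in points:
            closer_to_first = distance_sq(p, centers[0]) <= distance_sq(p, centers[1])
            clusters[0 if closer_to_first else 1].append(p)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    return clusters


# Supervised: labels are available, so the answer is a named category.
prediction = nearest_neighbor_predict(labeled, (22.0, 57.0))

# Unsupervised: strip the labels; the result is two unnamed groups.
unlabeled = [features for features, _ in labeled]
clusters = two_means(unlabeled)

print(prediction)
print(clusters)
```

The clustering result separates the small animals from the large ones, but nothing in its output says which group is "dog" and which is "cat"; that naming only exists in the supervised version.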

Classification vs. Prediction

Most machine learning algorithms can be classified into two additional categories, based on the result of the algorithm:

Classification Algorithms are algorithms that assign data to categories or groups. All algorithms (supervised or unsupervised) that identify whether an image contains a dog or a cat are classification algorithms, since they predict the category of the result.
Prediction Algorithms are algorithms that predict a numeric value (this task is often called regression). For example, a machine learning algorithm used to create a weather forecast will predict a specific temperature for a given time of day.
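
The difference in output type can be sketched with the weather example. The readings below are invented, and `fit_line`, `predict_temperature`, `classify_temperature`, and the 60-degree threshold are all assumptions made for illustration. The prediction function returns a number; the classification function returns a category.

```python
# Hypothetical readings: hour of day and temperature in degrees F (made-up values).
hours = [6, 8, 10, 12, 14]
temps = [50.0, 54.0, 58.0, 62.0, 66.0]


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(hours, temps)


def predict_temperature(hour):
    """Prediction: the output is a numeric value."""
    return slope * hour + intercept


def classify_temperature(hour, threshold=60.0):
    """Classification: the output is one of a fixed set of categories."""
    return "warm" if predict_temperature(hour) >= threshold else "cool"


print(predict_temperature(11))
print(classify_temperature(11))
print(classify_temperature(9))
```

Thresholding a predicted number is only one simple way to obtain a category; many classification algorithms learn the category directly from the data.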

Example Algorithms

While there are thousands of machine learning algorithms, you can understand the differences between them by studying a simple example of each combination of the categories above (supervised or unsupervised, classification or prediction).