There are many types of machine learning systems, and they can be classified along several axes: whether they are trained with human supervision, whether they can learn incrementally, and whether they work by comparing new data points to known ones or by detecting patterns in the training data to build a predictive model. By the amount of supervision involved, there are four major categories: supervised, unsupervised, semisupervised, and reinforcement learning.
Supervised learning is when the training data you feed to the algorithm includes labels, that is, the desired answers. Classification is a typical use case for supervised learning. Another task is regression, where the system predicts a numeric target value from a set of features called predictors. In machine learning an attribute is a data type, while a feature is typically the attribute plus its value.
Logistic Regression is a commonly used algorithm for classification, despite its name. Other supervised learning algorithms include k-Nearest Neighbors, Linear Regression, Support Vector Machines (SVMs), Decision Trees, Random Forests, and Neural Networks.
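To make the classification case concrete, here is a minimal sketch of logistic regression trained with plain gradient descent, using only the Python standard library. The one-feature dataset is made up, and the function names (train_logistic_regression, predict) and hyperparameters are assumptions chosen for illustration, not part of any particular library.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(features, labels, learning_rate=0.1, epochs=5000):
    """Fit a weight and bias for single-feature logistic regression."""
    weight, bias = 0.0, 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(features, labels):
            error = sigmoid(weight * x + bias) - y  # predicted probability minus label
            grad_w += error * x
            grad_b += error
        # Step against the average gradient of the log loss.
        weight -= learning_rate * grad_w / n
        bias -= learning_rate * grad_b / n
    return weight, bias

def predict(weight, bias, x):
    """Return class 1 if the predicted probability is at least 0.5, else class 0."""
    return 1 if sigmoid(weight * x + bias) >= 0.5 else 0

# Made-up labeled training data: one feature per example, label 0 or 1.
features = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
labels = [0, 0, 0, 1, 1, 1]

weight, bias = train_logistic_regression(features, labels)
print(predict(weight, bias, 0.5), predict(weight, bias, 4.0))
```

The labels are what make this supervised: the gradient is computed from the gap between each prediction and the known answer.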
With unsupervised learning the training data is unlabeled and the system learns from that data without supervision.
Unsupervised algorithms can be grouped into three categories: Clustering (k-Means, Hierarchical Cluster Analysis (HCA), and Expectation Maximization), Visualization and dimensionality reduction (Principal Component Analysis (PCA), Kernel PCA, Locally-Linear Embedding (LLE), and t-distributed Stochastic Neighbor Embedding (t-SNE)), and Association rule learning (Apriori and Eclat).
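As an illustration of clustering, the following is a minimal k-Means sketch in plain Python. The points are made up, and the initial centroids are picked by hand so the run is deterministic; real implementations usually choose them randomly or with a smarter seeding scheme. The function name k_means is an assumption for this example.

```python
import math

def k_means(points, initial_centroids, max_iterations=100):
    """Group points around centroids; returns (assignments, centroids)."""
    centroids = [tuple(c) for c in initial_centroids]
    assignments = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_assignments = [
            min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # nothing changed, so the clusters are stable
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for i in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return assignments, centroids

# Made-up unlabeled data: two visually separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.0, 9.5)]
assignments, centroids = k_means(points, initial_centroids=[points[0], points[3]])
print(assignments)
```

No labels are supplied anywhere; the grouping comes only from distances between the points.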
Visualization algorithms are also unsupervised: they take complex, unlabeled data and output a 2D or 3D representation that can be plotted.
Dimensionality reduction aims to simplify the data without losing too much information. One way to accomplish this is to merge several correlated features into one.
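The idea of merging correlated features can be sketched with a two-feature version of PCA. This example is limited to 2D so that the principal axis can be found in closed form from the covariance terms; the data is made up and the function name first_principal_component is an assumption for illustration.

```python
import math

def first_principal_component(points):
    """Project 2D points onto the axis along which they vary the most."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    var_x = sum(x * x for x, _ in centered) / n
    var_y = sum(y * y for _, y in centered) / n
    cov_xy = sum(x * y for x, y in centered) / n
    # Orientation of the major axis of a 2x2 covariance matrix.
    angle = 0.5 * math.atan2(2 * cov_xy, var_x - var_y)
    direction = (math.cos(angle), math.sin(angle))
    # One merged feature per point replaces the original two.
    projected = [x * direction[0] + y * direction[1] for x, y in centered]
    retained = (sum(p * p for p in projected) / n) / (var_x + var_y)
    return direction, projected, retained

# Made-up data: the second feature is roughly twice the first.
points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]
direction, projected, retained = first_principal_component(points)
print(f"share of variance retained: {retained:.4f}")
```

Because the two features are strongly correlated, the single merged feature keeps almost all of the variance, which is the sense in which little information is lost.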
Another important unsupervised task is anomaly detection, which means identifying instances that look unusual compared with the rest of the data. One use is removing outliers from a dataset before feeding it into a learning algorithm.
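A very simple way to flag outliers is a z-score rule: mark any value that lies more than a chosen number of standard deviations from the mean. This is only one of many possible approaches; the readings, the threshold of 2.0, and the function name find_outliers are assumptions for this sketch.

```python
import statistics

def find_outliers(values, threshold=2.0):
    """Return the values whose z-score exceeds the threshold."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mean) / std > threshold]

# Made-up sensor readings with one obviously unusual value.
readings = [10, 11, 9, 10, 12, 11, 10, 50]
outliers = find_outliers(readings)
cleaned = [v for v in readings if v not in outliers]
print(outliers, cleaned)
```

On very small datasets a single extreme value inflates the standard deviation, so the threshold has to be chosen with the sample size in mind.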
Additionally, association rule learning entails digging into large amounts of data to discover interesting relationships between attributes.
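The sketch below does not implement Apriori or Eclat. It only computes support and confidence, the two measures commonly used to judge how strong a rule such as "bread implies butter" is. The transactions are made up and the function names are assumptions for illustration.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the share that also
    contain the consequent. Assumes the antecedent occurs at least once."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)

# Made-up shopping baskets.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
    {"bread", "butter"},
]

# Three of the four baskets with bread also contain butter.
print(support(transactions, {"bread", "butter"}))
print(confidence(transactions, {"bread"}, {"butter"}))
```

A rule-mining algorithm would search over many item combinations and keep only the rules whose support and confidence clear chosen minimums.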
Semisupervised learning deals with partially labeled data, typically a large amount of unlabeled data and a small amount of labeled data.
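Deep belief networks (DBNs), for example, are based on unsupervised components called restricted Boltzmann machines (RBMs) stacked on top of one another. The RBMs are trained in an unsupervised manner, and the whole system is then fine-tuned using supervised learning.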
With reinforcement learning, the learning system, called an agent, observes the environment, selects and performs actions, and gets rewards or penalties depending on the results of the actions taken. From there it learns the best strategy, called a policy, which determines what action to take in any given situation.
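The observe, act, reward loop can be sketched with tabular Q-learning, one well-known reinforcement learning method, on a toy environment. Everything here is an assumption made for illustration: a five-cell corridor where the agent starts in cell 0 and receives a reward of 1 on reaching cell 4, an agent that explores purely at random while learning, and the chosen values for alpha and gamma.

```python
import random

N_STATES = 5         # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)   # index 0 moves left, index 1 moves right

def step(state, action):
    """Environment: apply the move and return (next_state, reward)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Learn a table of action values from random exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            action_index = rng.randrange(2)  # explore uniformly at random
            next_state, reward = step(state, ACTIONS[action_index])
            # Move the estimate toward reward plus discounted future value.
            target = reward + gamma * max(q[next_state])
            q[state][action_index] += alpha * (target - q[state][action_index])
            state = next_state
    return q

q_table = q_learning()
# The policy picks, in each non-goal cell, the action with the higher value.
policy = ["left" if row[0] > row[1] else "right" for row in q_table[:-1]]
print(policy)
```

The learned table is what turns experience into a policy: after training, following the higher-valued action in every cell leads the agent to the goal.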
Online and Batch Learning
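With batch learning, the system cannot learn incrementally and must be trained on all available data. Training takes a lot of time and computing resources, so it is typically done offline; only once training is finished is the system put to work, and it then applies what it has learned without learning anything new. This is called offline learning.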
With online learning, you train the system incrementally by feeding it data sequentially, either one instance at a time or in small groups called mini-batches. The term online is a bit of a misnomer; think of it more as incremental learning.
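A minimal sketch of incremental training follows: a one-parameter linear model that takes a single gradient step per mini-batch and never sees the whole dataset at once. The class OnlineLinearModel, its learn_batch method, the mini_batches helper, and the synthetic stream (where y is exactly 2x) are all hypothetical names and data for this example.

```python
class OnlineLinearModel:
    """Learns y = weight * x one mini-batch at a time."""

    def __init__(self, learning_rate=0.05):
        self.weight = 0.0
        self.learning_rate = learning_rate

    def learn_batch(self, batch):
        # One gradient step on the squared error of this mini-batch only.
        gradient = sum((self.weight * x - y) * x for x, y in batch) / len(batch)
        self.weight -= self.learning_rate * gradient

def mini_batches(stream, size):
    """Group a (possibly endless) stream of examples into small batches."""
    batch = []
    for item in stream:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

# Synthetic stream of (x, y) pairs, generated lazily.
stream = ((x, 2.0 * x) for _ in range(20) for x in (1.0, 2.0, 3.0, 4.0))

model = OnlineLinearModel()
for batch in mini_batches(stream, size=2):
    model.learn_batch(batch)  # each batch can be discarded after this call
print(round(model.weight, 3))
```

Because each batch is used once and then dropped, memory use stays constant no matter how long the stream runs.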
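An important parameter of online systems is the learning rate, which determines how quickly they adapt to changing data. With a high learning rate the system adapts rapidly to new data but also tends to forget old data quickly; with a low learning rate it learns more slowly but is less sensitive to noise in the new data.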
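The effect of the learning rate can be seen with a deliberately tiny learner: a running estimate that moves a fixed fraction of the way toward each new value. The stream, whose level jumps from 10 to 20 near the end, and the function name track are assumptions made for this sketch.

```python
def track(stream, learning_rate):
    """Follow a stream of values with a running estimate."""
    estimate = stream[0]
    for value in stream[1:]:
        # Move a fraction of the way toward each new observation.
        estimate += learning_rate * (value - estimate)
    return estimate

# Made-up stream: the underlying level shifts from 10 to 20 for the last 5 values.
stream = [10.0] * 20 + [20.0] * 5

fast = track(stream, learning_rate=0.5)
slow = track(stream, learning_rate=0.05)
print(fast, slow)
```

After the shift, the estimate with the higher rate ends close to 20 while the one with the lower rate is still closer to 10. The same property means the higher rate would also follow a burst of bad values more readily.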
A major risk is that if an online system is fed bad data, its performance will decline.
Instance-Based Versus Model-Based Learning
Another way to categorize machine learning systems is by how they generalize. Since most tasks are about making predictions, the question is how a system, given a set of training examples, generalizes to examples it hasn’t seen before.
Instance-based learning is where a system memorizes the training examples and then generalizes to new cases by using a similarity measure to compare them with the memorized instances.
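k-Nearest Neighbors is a standard example of this approach. In the minimal sketch below, the "training" is nothing more than keeping the examples, and similarity is measured as Euclidean distance. The data, the labels "A" and "B", and the function name knn_predict are assumptions for illustration.

```python
import math
from collections import Counter

def knn_predict(training_examples, query, k=3):
    """Label a query point by majority vote among its k closest examples."""
    neighbors = sorted(
        training_examples,
        key=lambda example: math.dist(example[0], query),
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Memorized examples: (features, label) pairs, made up for this sketch.
training_examples = [
    ((1.0, 1.0), "A"), ((1.0, 2.0), "A"), ((2.0, 1.0), "A"),
    ((6.0, 6.0), "B"), ((7.0, 6.0), "B"), ((6.0, 7.0), "B"),
]

print(knn_predict(training_examples, (1.5, 1.5)))
print(knn_predict(training_examples, (6.5, 6.5)))
```

All of the work happens at prediction time, which is why the choice of similarity measure matters so much for instance-based methods.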
Model-based learning, on the other hand, builds a model from a set of examples and then uses that model to make predictions.
For example, a linear regression model can be used to find the best-fit line through a given dataset.
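For a single feature, the best-fit line in the least-squares sense has a closed-form solution, sketched below in plain Python. The data points are made up and the function name fit_line is an assumption for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up training examples that lie close to a straight line.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_line(xs, ys)
prediction = slope * 6.0 + intercept  # generalize to an unseen x
print(round(slope, 2), round(intercept, 2), round(prediction, 2))
```

Once the two parameters are learned, the training examples are no longer needed; the model alone produces predictions for new inputs.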