Machine learning (ML) is a field of computer science that seeks to get computers to carry out tasks without being explicitly programmed for them. It uses a range of techniques to create algorithms that learn from data sets and make predictions. It is used in data mining, a technique for discovering patterns and models in data sets where the relationships are previously unknown. Machine learning is also used in search engines, optimization problems, computer vision, and more. These concepts have been applied in Google’s self-driving car and in recommendation engines on sites like Amazon and Netflix.
Machine learning algorithms work by building a model from a training set. A training set is a data set, given as input to an algorithm, for which the correct outputs are already known. The ML algorithm builds the model as it reads the training set: it reads the next input, predicts the output, checks its prediction against the actual output, and adjusts the model accordingly.
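The predict, check, and adjust loop can be sketched with a perceptron, one of the simplest learning algorithms. This is a minimal illustration rather than a general-purpose implementation: the function names, the learning rate, the number of passes, and the toy training set (the logical AND of two inputs) are all assumptions made for the example.

```python
def train_perceptron(training_set, epochs=20, learning_rate=1.0):
    """Learn weights for a linear classifier from (inputs, label) pairs.

    Labels are 0 or 1. All names and defaults here are illustrative.
    """
    n_features = len(training_set[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        for inputs, actual in training_set:
            # Predict the output for the next input.
            activation = bias + sum(w * x for w, x in zip(weights, inputs))
            predicted = 1 if activation > 0 else 0
            # Check the prediction against the known output and adjust.
            error = actual - predicted
            bias += learning_rate * error
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
    return weights, bias


def predict(weights, bias, inputs):
    """Apply the learned model to one input."""
    activation = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if activation > 0 else 0


# Toy training set (an assumption for this sketch): logical AND.
training_set = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(training_set)
predictions = [predict(weights, bias, inputs) for inputs, _ in training_set]
```

When a prediction is already correct, the error is zero and the model is left unchanged, so the weights only move when the algorithm makes a mistake.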
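Machine learning is sometimes thought of as purely data mining, but the two differ in emphasis: data mining focuses on discovering previously unknown patterns in data, often using unsupervised learning, while machine learning focuses on making predictions from data. Machine learning is a key topic for computer scientists and for software engineers who work with data, such as “data engineers.”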
"A computer program is said to learn from experience \(E\) with respect to some class of tasks \(T\) and performance measure \(P\) if its performance at tasks in \(T\), as measured by \(P\), improves with experience \(E\)." - Tom M. Mitchell, computer scientist at Carnegie Mellon University.
There are two ways that computer scientists group machine learning algorithms. One way to group them is by the way that the algorithms learn. The other way to group them is by similarity in form or function (grouping similar algorithms together). For example, some algorithms are tree-based, others might be inspired by neural networks.
Supervised learning is the machine learning task of inferring a function from labeled data and then applying that function to new examples. For example, for an algorithm that detects whether a post is spam, the training set would include posts labeled "spam" and posts labeled "not spam" to teach the algorithm to recognize the difference. Because each input in the data set already has a correct answer associated with it, the algorithm learns by comparing its output with the correct answer and adjusting the model when it finds errors. Training continues until the algorithm's output falls within the desired accuracy range.
Widely used supervised learning algorithms include support vector machines, linear regression, logistic regression, \(k\)-Nearest-Neighbors, and neural networks. Supervised learning is commonly used in situations where historical data can be used to predict likely future events.
Classification is a key topic in supervised learning. Classification divides inputs into classes or groups. It’s up to the algorithm to create a model that assigns new inputs to one or more of these classes. For example, the algorithm would assign some emails to the “spam” class and others to the “not spam” class. Inputs can also be grouped without labels by using unsupervised learning techniques such as clustering.
Unsupervised learning uses input data that is unlabeled. Specifically, because the data is unlabeled, there is no error or reward to let the algorithm know if it is close or far away from the proper solution. Unsupervised learning is very important when using machine learning on problems where the answer is not known. The goal of unsupervised learning is to take in data and explore it to find some structure within the data.
A model is created from unsupervised learning by deducing structures and patterns in the input data.
Unsupervised learning is used in the following methods:
Clustering is a popular unsupervised learning method that assigns a set of observations to subsets, or clusters, so that data with similar characteristics are grouped together. K-means clustering is a popular way of clustering data.
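As a rough sketch of how K-means groups similar data, the example below alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. Using the first \(k\) points as starting centroids, the fixed iteration count, and the sample points are simplifying assumptions for this illustration; practical implementations usually choose starting centroids more carefully.

```python
import math


def k_means(points, k, iterations=10):
    """Cluster points (tuples of floats) into k groups.

    Assumption for this sketch: the first k points seed the centroids.
    """
    centroids = [points[i] for i in range(k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(
                    sum(coord) / len(members) for coord in zip(*members)
                )
    return centroids, assignments


# Hypothetical data: two loose groups of 2-D points.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0),
          (9.0, 9.5), (0.5, 1.5), (8.5, 9.0)]
centroids, assignments = k_means(points, k=2)
```

No labels are supplied anywhere: the grouping emerges only from the distances between the points.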
Anomaly detection, otherwise known as outlier detection, is the identification of data that does not conform to the rest of the dataset. This task does not require labeled data, as long as the majority of data points in the dataset are "normal" and the algorithm looks for data points that are the least similar to the rest of the data.
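One simple way to find the points that are least similar to the rest is to measure how many standard deviations each value lies from the mean. The sketch below is only one of many possible approaches; the threshold of 2.0 and the sample readings are assumptions chosen for illustration.

```python
import statistics


def find_outliers(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations
    from the mean. The threshold is an illustrative choice."""
    mean = statistics.fmean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        return []  # All values are identical, so nothing stands out.
    return [v for v in values if abs(v - mean) / spread > threshold]


# Hypothetical readings: mostly "normal" values plus one unusual one.
readings = [10, 11, 9, 10, 12, 10, 11, 50]
outliers = find_outliers(readings)
```

This works here because most of the readings are "normal"; if a large share of the data were unusual, the mean and standard deviation would no longer describe typical behavior.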
Some artificial neural networks use unsupervised learning, where the network must discover structure in unlabeled data. A classical theoretical basis for unsupervised neural networks is Hebbian theory, which describes how the connections between neurons adapt during learning.
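As a rough illustration, the simplest form of Hebb's rule strengthens the connection between two neurons in proportion to how strongly they are active together. The symbols below are notation assumed for this sketch: \(w_{ij}\) is the weight of the connection from neuron \(i\) to neuron \(j\), \(x_i\) and \(y_j\) are their activations, and \(\eta\) is a small positive learning rate.

\[ \Delta w_{ij} = \eta \, x_i \, y_j \]

The update depends only on the activity of the two connected neurons, so no labeled target output is needed.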
When choosing which algorithm to use, some considerations to keep in mind are the learning method and the training time. Depending on the type of data you have, one learning method may be better suited to the task than another. The training time of an algorithm is the amount of time it takes to train a model, and it varies from algorithm to algorithm. Training time is often closely tied to the accuracy of the algorithm.
Here are some common groups of machine learning algorithms:
Regression is a statistical method used to estimate relationships among variables. Usually, regression is concerned with how the typical value of the dependent variable changes when an independent variable is altered. Regression analysis estimates the conditional expectation of the dependent variable given the independent variables. In other words, it estimates the average value of the dependent variable when the independent variables are fixed.
Linear regression is a technique used to model the relationships between observed variables. The idea behind simple linear regression is to "fit" the observations of two variables into a linear relationship between them. Graphically, the task is to draw the line that is "best-fitting" or "closest" to the points \( (x_i,y_i),\) where \( x_i\) and \(y_i\) are observations of the two variables which are expected to depend linearly on each other.
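A minimal sketch of fitting such a line is shown below. It assumes that "closest" means minimizing the sum of squared vertical distances (ordinary least squares), which is the usual convention; the function name and the sample observations are made up for the example.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through
    the points (x_i, y_i)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations that lie exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
```

With noisy observations the points would not lie exactly on the line, and the same formulas would return the line with the smallest total squared error.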
Ridge regression is a commonly used regression technique for finding an approximate answer to a system of equations that has no unique solution. This type of problem is very common in machine learning tasks, where the "best" solution must be chosen using limited data. Ridge regression adds a penalty on large coefficients, which helps reduce overfitting; if the penalty is set too high, however, the model can underfit.
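The effect of the ridge penalty can be seen in a deliberately simplified case: one input variable and no intercept. In that case the penalized least-squares coefficient has a closed form, sketched below. The function name, the `penalty` parameter name, and the data are assumptions for this illustration.

```python
def ridge_coefficient(xs, ys, penalty):
    """Coefficient w minimizing sum((y - w*x)**2) + penalty * w**2
    for a single feature with no intercept (a simplified setting)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    # With penalty > 0 the denominator is always positive, so a
    # single well-defined answer always exists.
    return sxy / (sxx + penalty)


# Hypothetical data lying exactly on y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
unpenalized = ridge_coefficient(xs, ys, penalty=0.0)
shrunk = ridge_coefficient(xs, ys, penalty=14.0)
```

A penalty of zero recovers the ordinary least-squares coefficient, while a larger penalty pulls the coefficient toward zero. That shrinkage is what guards against overfitting, and also why too large a penalty can underfit.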
Instance-based learning, sometimes called memory-based learning, is a group of learning algorithms that compares new problem instances with instances that it has seen in training. By doing this, it constructs hypotheses directly from the training instances instead of performing explicit generalization. Instance-based learning is a type of lazy learning.
An advantage of instance-based learning is that it can adapt its model to previously unseen data. These algorithms can store a new instance or throw an old instance away according to what they learn from the data.
The \(k\)-Nearest-Neighbors (\(k\)-NN) algorithm is a popular instance-based algorithm. It works by finding the \(k\) training instances closest to a new input and using them to produce an output.
\(k\)-NN can be used in classification or regression machine learning tasks. In classification, the input is assigned the most common class among its \(k\) nearest neighbors; in regression, the output is typically the average of the neighbors' values. \(k\)-NN is used in fields such as computer vision and gene expression analysis.
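A minimal sketch of \(k\)-NN classification follows. It assumes Euclidean distance and a simple majority vote, and the labeled points are invented for the example.

```python
import math
from collections import Counter


def knn_classify(training_points, query, k=3):
    """Label `query` with the most common label among its k nearest
    training points. `training_points` holds (coordinates, label) pairs."""
    # Sort the stored instances by distance to the query; keep the k closest.
    neighbors = sorted(
        training_points, key=lambda item: math.dist(item[0], query)
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical labeled instances in two well-separated groups.
training_points = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
    ((6.0, 6.0), "B"), ((7.0, 5.5), "B"), ((6.5, 7.0), "B"),
]
label_near_a = knn_classify(training_points, (1.2, 1.4))
label_near_b = knn_classify(training_points, (6.4, 6.1))
```

There is no separate training step: the stored instances are the model, and all of the work happens when a new input is classified.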
Support vector machines are a type of kernel method used to perform binary classification on data. A support vector machine takes in a set of training examples, each already labeled with its class. The support vector machine algorithm then builds a model that predicts which of the two classes a new example belongs to.
Decision Tree Algorithms
Decision tree algorithms use a decision tree as a predictive model that maps observations about an item to a prediction about it. There are two main types of trees. Classification trees predict which class the input data belongs to. Regression trees are used where the predicted outcome is a real number, for example, the price of a stock or the number of visitors to a museum.
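The idea behind a regression tree can be sketched with the smallest possible tree, a single split (sometimes called a stump). The example below tries each candidate threshold and keeps the one that minimizes the squared error when each side predicts its own mean. The function names and the temperature and visitor figures are hypothetical.

```python
def fit_regression_stump(xs, ys):
    """Find the threshold on x that best splits the data into two groups,
    each predicted by its mean. Returns (threshold, left_mean, right_mean)."""
    best = None
    # Skip the smallest x so the left group is never empty.
    for threshold in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((y - left_mean) ** 2 for y in left)
                 + sum((y - right_mean) ** 2 for y in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return threshold, left_mean, right_mean


def predict_stump(stump, x):
    """Follow the single decision to a predicted value."""
    threshold, left_mean, right_mean = stump
    return left_mean if x < threshold else right_mean


# Hypothetical data: outdoor temperature vs. museum visitors that day.
temperatures = [5, 8, 12, 25, 28, 31]
visitors = [200, 220, 210, 90, 100, 80]
stump = fit_regression_stump(temperatures, visitors)
cool_day = predict_stump(stump, 10)
warm_day = predict_stump(stump, 30)
```

A full regression tree repeats this splitting step on each resulting group until some stopping condition is met.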
Bayesian algorithms apply Bayes’ theorem for problems such as classification and regression. As such, Bayesian algorithms use concepts from probability theory to make models. For example, a Bayesian network could help describe the probabilistic relationships between certain symptoms and certain diseases. It could be used to help predict how likely people are to have a given illness.
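As a small worked example of the symptom and disease case, Bayes’ theorem can be applied directly to one symptom and one illness. The probabilities below are invented purely for illustration, and the function and parameter names are assumptions of this sketch.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(disease | symptom) from Bayes' theorem.

    prior               -- P(disease)
    likelihood          -- P(symptom | disease)
    false_positive_rate -- P(symptom | no disease)
    """
    # Total probability of observing the symptom at all.
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: 1% of people have the illness, 90% of them show
# the symptom, and 5% of healthy people show it too.
p_disease_given_symptom = posterior(
    prior=0.01, likelihood=0.9, false_positive_rate=0.05
)
```

With these made-up numbers the result is roughly 0.15: even though the symptom is common among people with the illness, the illness itself is rare, so most people who show the symptom do not have it.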
Bayesian networks are directed acyclic graphs (DAGs) whose nodes represent random variables; these can be unknown parameters, observable characteristics, or latent variables. The edges in the DAG represent conditional dependencies. This means that nodes that are not connected represent variables that are conditionally independent of each other.
A common Bayesian algorithm is the naive Bayes classifier.
Artificial Neural Network Algorithms
These algorithms use artificial neural networks that are inspired by biological neural networks (structures in organisms’ brains). A neural network is essentially a large collection of connected nodes. Each node performs a simple operation on its inputs, and its output is determined by this operation together with a set of parameters that are specific to that node. By connecting these nodes together and carefully setting their parameters, very complex functions can be learned and calculated.
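To illustrate how connected nodes with well-chosen parameters can compute something no single node can, the sketch below wires three simple nodes into a network that computes exclusive-or (XOR). The weights and biases are set by hand rather than learned, and the step activation and all names are assumptions made for this example.

```python
def step(value):
    """Simple activation: fire (1) if the input is positive, else 0."""
    return 1 if value > 0 else 0


def node(inputs, weights, bias):
    """One node: a weighted sum of its inputs plus a bias, then an
    activation. The weights and bias are the node's parameters."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def xor_network(x1, x2):
    """Two hidden nodes feeding one output node (hand-set parameters)."""
    h_or = node((x1, x2), (1.0, 1.0), -0.5)    # fires if either input is 1
    h_and = node((x1, x2), (1.0, 1.0), -1.5)   # fires only if both are 1
    # Output fires when OR is on but AND is off.
    return node((h_or, h_and), (1.0, -1.0), -0.5)


outputs = [xor_network(a, b) for a in (0, 1) for b in (0, 1)]
```

In practice the parameters are not chosen by hand; a learning algorithm adjusts them from training data.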
Deep Learning Algorithms
Deep learning algorithms extend artificial neural network algorithms. They use many layers of artificial neurons, loosely modeled on the way the human brain processes things like light and sound into vision and hearing. Each layer learns a representation of the data at a different level of abstraction, and the layers can be trained with supervised or unsupervised learning.
Deep learning is used in computer vision and speech recognition technologies, including face recognition, facial expression recognition, and voice recognition and processing software. It is made possible by the large amount of computing resources available today.
According to the MIT Technology Review, "a Google deep-learning system that had been shown 10 million images from YouTube videos proved almost twice as good as any previous image recognition effort at identifying objects such as cats."
There is some debate about the differences between artificial intelligence and machine learning. Artificial intelligence is the broader effort to build systems that behave intelligently, sometimes using computational models, like neural networks, that are inspired by biological structures. Machine learning uses some of these methods (such as neural networks), but focuses more on applying statistics and probability theory to learn from data.
The focus of machine learning is to make predictive systems that learn from data whereas the goal of artificial intelligence is to make intelligent systems that may or may not learn from data.
- Machine learning. Retrieved July 12, 2016, from https://en.wikipedia.org/wiki/Machine_learning
- Brownlee, J. A Tour of Machine Learning Algorithms. Retrieved July 12, 2016, from http://machinelearningmastery.com/a-tour-of-machine-learning-algorithms/
- Machine Learning. Retrieved July 12, 2016, from http://www.sas.com/en_id/insights/analytics/machine-learning.html
- Rohrer, B. How to choose algorithms for Microsoft Azure Machine Learning. Retrieved July 12, 2016, from https://azure.microsoft.com/en-us/documentation/articles/machine-learning-algorithm-choice/g
- Regression analysis. Retrieved July 12, 2016, from https://en.wikipedia.org/wiki/Regression_analysis
- , R. File:Linear regression.svg. Retrieved July 12, 2016, from https://en.wikipedia.org/wiki/File:Linear_regression.svg
- Kernel method. Retrieved July 12, 2016, from https://en.wikipedia.org/wiki/Kernel_method
- Bayesian network. Retrieved July 12, 2016, from https://en.wikipedia.org/wiki/Bayesian_network
- , Z. File:Neural network bottleneck achitecture.svg. Retrieved July 12, 2016, from https://commons.wikimedia.org/wiki/File:Neural_network_bottleneck_achitecture.svg
- Hof, R. Deep Learning: With massive amounts of computational power, machines can now recognize objects and translate speech in real time. Artificial intelligence is finally getting smart. Retrieved July 12, 2016, from https://www.technologyreview.com/s/513696/deep-learning/