# Artificial Neural Networks

A quick dive into a cutting-edge computational method for learning.

# Computational Models of the Neuron

A neuron has many inputs but only one output, so it must "integrate" its inputs into one output (a single number). Recall that the inputs to a neuron are generally outputs from other neurons. What is the most natural way to represent these inputs to a single neuron in an artificial neural network (ANN)?

In our computational model of a neuron, the inputs, given by the vector $$\vec{x}$$, are “integrated” by taking the dot product of the inputs $$\vec{x}$$ and the weights $$\vec{w}$$ and adding the bias $$b$$:

$\vec{w} \cdot \vec{x} + b$

The dot product represents a "weighted sum" because it multiplies each input by its corresponding weight and then adds up the products: $$\vec{w} \cdot \vec{x} = \sum_i w_i x_i.$$
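To make the integration step concrete, here is a minimal Python sketch of $$\vec{w} \cdot \vec{x} + b$$ using plain lists. The function name `integrate` and the example numbers are hypothetical choices for illustration. They are not values taken from this chapter's figures.

```python
def integrate(x, w, b):
    """Return the weighted sum w . x + b for inputs x, weights w, and bias b."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    # Dot product: multiply each input by its weight, then sum the products.
    return sum(wi * xi for wi, xi in zip(w, x)) + b


# Hypothetical example values (illustrative only).
x = [1.0, 2.0, 3.0]
w = [0.5, -1.0, 2.0]
b = 0.5
print(integrate(x, w, b))  # 0.5 - 2.0 + 6.0 + 0.5 = 5.0
```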

A biological interpretation is that the inputs in $$\vec{x}$$ are the outputs of other neurons, the weights in $$\vec{w}$$ are the strengths of the connections with those neurons, and the bias $$b$$ shifts the threshold that the weighted input must surpass for the neuron to fire.

Given the inputs, weights, and bias shown above, what is the integration of these inputs according to the weighted sum $$\vec{w} \cdot \vec{x} + b?$$

An activation function $$H(v)$$ transforms the integration (the weighted sum) into a single output that determines whether the neuron fires. For example, we might take $$H(v)$$ to be the Heaviside step function:

$H(v) = \begin{cases} 1 & \mbox{if } v \ge 0, \\ 0 & \mbox{if } v \lt 0. \end{cases}$

Considering $$H(\vec{w} \cdot \vec{x} + b),$$ how does increasing the bias $$b$$ affect the likelihood of the neuron firing (all else equal), assuming that an output of 1 corresponds to firing?

When $$H(v)$$ is the Heaviside step function, the neuron modeled by $$H(\vec{w} \cdot \vec{x} + b)$$ fires when $$\vec{w} \cdot \vec{x} + b\ge 0.$$
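The firing rule can be sketched in a few lines of Python. This is a minimal sketch that assumes inputs and weights are given as lists. `heaviside` and `step_neuron` are hypothetical helper names. The example holds $$\vec{w}$$ and $$\vec{x}$$ fixed and varies only $$b$$, so you can see the output flip once $$\vec{w} \cdot \vec{x} + b$$ reaches 0.

```python
def heaviside(v):
    """Heaviside step function: 1 if v >= 0, else 0."""
    return 1 if v >= 0 else 0


def step_neuron(x, w, b):
    """Return 1 (fire) exactly when w . x + b >= 0, else 0."""
    v = sum(wi * xi for wi, xi in zip(w, x)) + b
    return heaviside(v)


# Hypothetical values: w . x = 1.0*(-1.0) + 1.0*0.5 = -0.5
w = [1.0, 1.0]
x = [-1.0, 0.5]
print(step_neuron(x, w, b=0.0))  # -0.5 + 0.0 = -0.5 < 0, so 0
print(step_neuron(x, w, b=0.5))  # -0.5 + 0.5 = 0.0 >= 0, so 1
print(step_neuron(x, w, b=1.0))  # -0.5 + 1.0 = 0.5 >= 0, so 1
```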

The set of inputs satisfying $$\vec{w} \cdot \vec{x} + b = 0$$ is a hyperplane called the decision boundary. It divides the input space into two regions according to whether an input would cause the neuron to fire. This model is known as a linear classifier because the boundary is defined by a linear combination of the inputs.
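For two-dimensional inputs $$\vec{x} = (x_1, x_2)$$ with weights $$\vec{w} = (w_1, w_2)$$, the decision boundary is an ordinary line. When $$w_2 \ne 0$$ it can be solved for explicitly as $$x_2 = -(w_1 x_1 + b)/w_2.$$ The sketch below reports which side of the boundary a point falls on and traces points along the line. The weights, bias, and helper names are hypothetical.

```python
def side_of_boundary(x, w, b):
    """Return 1, 0, or -1 for the sign of w . x + b (0 means on the boundary)."""
    v = sum(wi * xi for wi, xi in zip(w, x)) + b
    return (v > 0) - (v < 0)


def boundary_x2(x1, w, b):
    """For 2-D inputs, solve w[0]*x1 + w[1]*x2 + b = 0 for x2."""
    if w[1] == 0:
        raise ValueError("the boundary is a vertical line when w[1] == 0")
    return -(w[0] * x1 + b) / w[1]


# Hypothetical parameters: the boundary is the line x1 + 2*x2 = 4.
w, b = [1.0, 2.0], -4.0

print(side_of_boundary([3.0, 3.0], w, b))  # 1  (fires under a step activation)
print(side_of_boundary([0.0, 0.0], w, b))  # -1 (does not fire)
print(side_of_boundary([2.0, 1.0], w, b))  # 0  (on the boundary; H(0) = 1, so it fires)
print(boundary_x2(0.0, w, b))              # 2.0
print(boundary_x2(2.0, w, b))              # 1.0
```

Every point on one side of the line produces the same output, which is why a single neuron of this kind can only separate classes that a straight line (or, in higher dimensions, a hyperplane) can separate.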

The model above shows a decision boundary for predicting college admission based on the input $\vec{x} = \begin{pmatrix}\text{SAT score} \\ \text{GPA} \end{pmatrix}$ and the activation function $$H(\vec{w} \cdot \vec{x} + b)$$, where $$H(v)$$ is the Heaviside step function. Which of the following is a possible value for the weight vector, $$\vec{w}?$$

So far, we’ve considered an activation function $$H(v)$$ with binary outputs, inspired by a biological neuron. However, in ANNs we don’t need to restrict ourselves to binary functions. Continuous activation functions, such as the sigmoid function introduced below, avoid the abrupt jump of the step function and can model continuous values (e.g., a probability).

The power of ANNs is illustrated by the universal approximation theorem: ANNs using activation functions like these can approximate any continuous function (on a closed, bounded input domain) arbitrarily well, given some general requirements about the size and layout of the ANN.
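The theorem is an existence result. A common way to build intuition for it, though not a proof, is to hand-build a network with one hidden layer of sigmoid neurons. The sketch below assumes the sigmoid activation $$1/(1+e^{-v})$$ and uses hypothetical names and parameters (`bump_network`, `n_bumps`, `steepness`). Each pair of very steep hidden neurons switches on at one point and off at another, producing a "bump." A weighted sum of many bumps then traces out a target function, here $$\sin x$$ on $$[0, \pi]$$. The weights are set by hand rather than learned.

```python
import math


def sigmoid(v):
    """Logistic sigmoid, evaluated so math.exp never overflows for large |v|."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def bump_network(f, lo, hi, n_bumps, steepness):
    """Hand-build a one-hidden-layer sigmoid network approximating f on [lo, hi]."""
    width = (hi - lo) / n_bumps
    edges = [lo + i * width for i in range(n_bumps + 1)]
    # Output-layer weights: the height of each bump is f at the bump's midpoint.
    heights = [f((edges[i] + edges[i + 1]) / 2) for i in range(n_bumps)]

    def net(x):
        total = 0.0
        for i in range(n_bumps):
            # Two hidden neurons with weight `steepness` and bias -steepness*edge:
            # one turns on at the left edge, the other at the right edge.
            on = sigmoid(steepness * (x - edges[i]))
            off = sigmoid(steepness * (x - edges[i + 1]))
            total += heights[i] * (on - off)
        return total

    return net


approx = bump_network(math.sin, 0.0, math.pi, n_bumps=50, steepness=500.0)
samples = [i * math.pi / 200 for i in range(201)]
max_err = max(abs(approx(x) - math.sin(x)) for x in samples)
print(f"max error on samples: {max_err:.4f}")
```

Increasing `n_bumps` (and `steepness` along with it, so each bump stays sharp) shrinks the error. This mirrors the theorem's requirement that the hidden layer be large enough.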

Consider the activation function $$H(v) = \dfrac{1}{1+e^{-v}},$$ which is known as the sigmoid function. Given the inputs, weights, and bias shown above (which are the same as in an earlier question), what is the approximate output from this neuron (to two decimal places) after the integrated value of the inputs is evaluated by the activation function?
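Evaluating the sigmoid by hand is easy to get wrong. Here is a minimal Python sketch that assumes list inputs and uses the hypothetical helper names `sigmoid` and `sigmoid_neuron`. The example values are illustrative and are not the ones from the question above.

```python
import math


def sigmoid(v):
    """Logistic sigmoid 1 / (1 + e^(-v)); maps any real v into the interval (0, 1)."""
    # Branch on the sign of v so math.exp never receives a large positive
    # argument (math.exp raises OverflowError above roughly 709).
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def sigmoid_neuron(x, w, b):
    """Integrate the inputs as w . x + b, then apply the sigmoid activation."""
    v = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(v)


print(round(sigmoid(0.0), 2))    # 0.5
print(round(sigmoid(2.0), 2))    # 0.88
print(round(sigmoid(-2.0), 2))   # 0.12
print(sigmoid(-1000.0))          # 0.0 (no overflow)
# Hypothetical neuron: 0.5*1.0 + 0.25*2.0 + 1.0 = 2.0
print(round(sigmoid_neuron([1.0, 2.0], [0.5, 0.25], 1.0), 2))  # 0.88
```

Unlike the step function, the output changes gradually with the weighted sum. A sum just below 0 gives a value just below 0.5 rather than jumping from 0 to 1.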

We’ve now built up a basic computational model of a neuron. While one neuron might not seem powerful, connecting many of them in a clever way can yield a highly effective learning model, as the universal approximation theorem suggests.

The remainder of this course focuses on the methods used to construct and train ANNs, highlighting the intuition behind the models and their applications. Let’s dive in!
