To work with artificial neural networks, it's important to have a grasp of the basics of vectors, matrices, and optimization. These concepts are more than a mathematical convenience; they are also critical for writing performant neural network code.

For example, graphics processing units (GPUs) are highly optimized for parallel matrix operations. Converting data, models, and algorithms into matrix form takes effort, but that effort pays for itself many times over in reduced model training time.
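To make this concrete, here is a minimal sketch in Python with NumPy (the matrix size and timing approach are illustrative assumptions, not from the text). It computes the same matrix product twice: once with explicit loops, and once as a single vectorized call that dispatches to an optimized backend, which is the same principle that lets GPUs parallelize these operations.

```python
import time
import numpy as np

n = 200  # hypothetical size, chosen only for illustration
A = np.random.rand(n, n)
B = np.random.rand(n, n)

def matmul_loops(A, B):
    # Naive triple loop: one scalar multiply-add at a time.
    rows, inner, cols = A.shape[0], A.shape[1], B.shape[1]
    C = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            for p in range(inner):
                C[i, j] += A[i, p] * B[p, j]
    return C

start = time.perf_counter()
C_loops = matmul_loops(A, B)
print(f"loops:      {time.perf_counter() - start:.3f}s")

# Vectorized form: one call, handed off to optimized
# (and potentially massively parallel) linear algebra routines.
start = time.perf_counter()
C_vec = A @ B
print(f"vectorized: {time.perf_counter() - start:.3f}s")

assert np.allclose(C_loops, C_vec)  # same result, very different speed
```

On typical hardware the vectorized call is orders of magnitude faster, and the gap only widens once the work moves to a GPU.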

We'll also touch on **gradient descent**, a heuristic method that starts at a random point and iteratively moves in the direction that decreases the function we want to minimize. Colloquially, think of it as playing a game of “hot” and “cold” until the improvement becomes negligible.
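As a sketch of the idea, each step applies the update rule x ← x − η·f′(x), where η is a small learning rate, and stops once the steps shrink below a threshold. The quadratic function, learning rate, and stopping tolerance below are illustrative assumptions, not from the text:

```python
import random

def f(x):
    # Hypothetical function to minimize: f(x) = (x - 3)^2, minimum at x = 3.
    return (x - 3.0) ** 2

def grad_f(x):
    # Its derivative: f'(x) = 2(x - 3).
    return 2.0 * (x - 3.0)

def gradient_descent(x, lr=0.1, tol=1e-8, max_steps=10_000):
    for _ in range(max_steps):
        step = lr * grad_f(x)   # update rule: x <- x - lr * f'(x)
        x -= step
        if abs(step) < tol:     # "improvement becomes negligible"
            break
    return x

x0 = random.uniform(-10.0, 10.0)  # start at a random point
print(gradient_descent(x0))       # converges to approximately 3.0
```

Note that the loop never inspects f directly; following the negative derivative is what plays “hot” and “cold” on its behalf.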