Annealing
Simulated Annealing is a heuristic technique that is used to find the global optimal solution to a function. It is a probabilistic technique, similar to a MonteCarlo method. In fact, simluated annealing was adapted from the MetropolisHastings algorithm, a MonteCarlo method.
Other techniques, such as hill climbing, gradient descent, or a bruteforce search are used when finding a local maxima/minima is more important than finding a global maxima/minima. These techniques are faster than simulated annealing, but they don't guarantee that their results are globally optimal.
In the below graphic, the simulated annealing algorithm is used to find the global maximum for the graph. It chooses a random value and calculates its cost. Then, it chooses a neighboring solution and calculates its cost. If the new cost is better, the new solution is chosen, and even if its not better, the new solution is sometimes chosen. It stops when the desired cost is found.
Simulated annealing is a good probabilistic technique because it does not accidentally think a local extrema is a globsl extrema. So, it is especially useful when the search space has many local maximas/minimas and when the search function is nonlinear. Simulated annealing is used in many algorithmrelated problems such as the set cover problem, the maximum cut problem, or the independent set problem. It is also used to solve design problems such as water transportation and deisel engine design^{[2]}.
Overview of Annealing
Annealing is a heuristic. Heuristics are rules of thumb that can be used to find a good answer to a question. They are often used to solve large, nonlinear and nonconvex problems. Heuristics won't guarantee an optimal answer, but they can get very close and are often a good trade off between precision and computation. They often incorporate randomization. Annealing introduces just the right amount of randomness and probability to help it escape those local optimums.
There are two kinds of heuristics: construction and improvement. Construction heuristics find a feasible solution and improve upon it. Improvement heuristics start with some feasible solution and only try to improve it. Annealing is an improvement heuristic.
Annealing was made to mirror the cooling of atoms to a state of minimum energy. There is an analogy to cooling a system and bringing it to its lowest energy state and finding a global optimum for that system. If a liquid cools and anneals too quickly, then the material will solidify into a suboptimal configuration. However, if it is cooled slowly, then the crystals inside will solidify optimally into a state of minimum energy. This state of minimum energy is analogous to the minimum cost of the function being analyzed by simulated annealing.
Annealing was introduced in 1983 as a method of solving combinatorial optimization by applying statistical mechanics.
Algorithm
Annealing requires a few things to get started:
Annealing uses the following elements:
 Cost Function  this function is what is being minimized by annealing.
 Search Space  the space of possible solutions.
 Temperature  the "temperature" of the system, a function of which iteration the annealing process is on.
For example, take the traveling salesperson problem. In this problem, the goal is to find a path that a salesperson can take through various cities, visiting each city once, and returning to their original city. The cost function of a given tour (a path that ends where it began) will just be the length of that tour. The search space will be the set of all possible tours. The temperature of this process is up to the programmer, but it usually starts at 1 and decays, following some sort of cooling schedule. A typical cooling schedule might just multiply the temperature by some $\alpha$ (typically .99 or .8). The annealing process stops when the temperature reaches a certain point.
The annealing algorithm works as follows:
 Pick an initial, random solution, $x_0$.
 Calculate the cost of your solution, $c_{x_0}$.
 Pick a new solution that neighbors your current solution, $x_1$.
 Calculate the cost of the new solution, $c_{x_1}$.
 Compare the cost of your current and new solutions. If $c_{x_1} \lt c_{x_0}$, move to the new solution. Otherwise, move to the new solution with probability $\beta$.
 Update system temperature.
 Repeat steps 3  6 until the desired temperature is reached.
In relation to the gif in the introduction (where the highest point is being searched for), this is what is happening:
 A random point on the graph is found $x$.
 The "cost" of this point is something that is the inverse of its height, potentially $\frac{1}{x}$.
 Pick a random solution on either side of $x$, $x_1$.
 Calculate the cost of this new solution.
 If $\frac{1}{x_1} \lt \frac{1}{x}$, move to $x_1$. If not, move to $x_1$ with probability $\beta$.
 Update system temperature. In this case, it lowers temperature and at the same time lowers $\beta$. By lowering $\beta$, it decreases the chance that it chooses a suboptimal solution as time goes on.
 Repeat
The main appeal of this procedure is that in step 5, there's a chance that you might actually take the suboptimal choice between the two choices. This allows annealing to avoid potential pitfalls in local optimums. $\beta$, the chance that this process chooses the new solution suboptimally, is a function of the difference in cost of the two potential solutions and of the current system temperature.
In the case of the traveling salesperson problem, if a new path is much shorter than the other, annealing will favor switching to that and $\beta$ will be high. However, if they are similar, there's a more even chance for either. Furthermore, as the annealing process continues (and the temperature drops), annealing should be more focused on following the solution of lower cost, not of escaping local optimums  so, $\beta$ increases.
The idea of a neighboring solution is important, but what that means will vary from solution space to solution space. In the case of the traveling salesperson problem, there are many ways to define two neighboring solutions. For example, two neighboring solutions might be two paths that have the smallest possible section reversed. A poorly devised neighboring system (such as a random configuration) in the solution space can aversely affect the outcome of the annealing process. However, this can be mitigated by more computation.
This general process is very basic, and steps are often modified to achieve best results. For example, a slower cooling schedule for the temperature can be used that causes the annealing process to take longer. However, this will give it more time to escape local maximas and minimas. Furthermore, productionlevel implementations of annealing often compare multiple pairs of neighbors in every iteration.
Pseudocode
The following pseudocode is a more computer science oriented approach to understanding this algorithm.
1 2 3 4 5 6 7 8 9 10 11 12 

The general process of this algorithm is finding neighbors from the current state to move to. In general, optimal neighbors are chosen but sometimes less optimal neighbors are chosen to avoid local maximums and minimums. Here, beta is a boolean term that is chosen by the relationship between the two cost functions of the two candidate solutions, and T. It can be, however, whatever the programmer needs to make the if
statement on line 10
work
Python Implementation
The following code is a quick implementation of annealing. In this example, a set of possible solutions is generated and then, through annealing, the optimal solution is found.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 

What is the optimal solution for this annealing process. Try to figure it out from the code before you run it or check the answer.
The optimal answer is
401
. If you look inside thecost()
function, it's checking to see how close the candidate solution is to the number 401.
Applications
Annealing can be used in any optimization scenario. If the global optimum is not the goal, there are faster techniques, such as the hill climbing algorithm. However, when a global optimum is needed, and especially when the input signal is nonlinear with many local maximas/minimas, annealing is useful because it will do the best job of getting close to the optimum.
Annealing can be used to find an optimal solution for graphrelated problems such as the set cover problem, the maximum cut problem, or the independent set problem. Set cover, for example, can use many heuristics. Simulated annealing will produce a better output than a greedy heuristic, if that is important to an application. Coupling annealing with recursive backtracking will guarantee an optimal result, though it will be more computationally expensive.
Annealing is also frequently used in system design. Often, a system has a set of requirements and limit on cost. Annealing can help find the maximum value for a given cost limit. This is very similar to the knapsack problem in computer science. For example, a building might have a certain budget, but it still needs to reach a certain level of structural support. Annealing can find the global maximum for the related function.
References
 13, K. Simulated Annealing. Retrieved June 14, 2016, from https://en.wikipedia.org/wiki/Simulatedannealing
 Graff, . System Modeling, Analysis, and Optimization for Diesel Exhaust Aftertreatment Technologies. Retrieved June 28, 2016, from http://strategic.mit.edu/docs/SM21GraffC2006.pdf
 Weck, . Simulated Annealing. Retrieved June 28, 2016, from http://ocw.mit.edu/courses/engineeringsystemsdivision/esd77multidisciplinarysystemdesignoptimizationspring2010/lecturenotes/MITESD_77S10_lec10.pdf