Sometimes the simple questions can be revelatory and make one think about the possibilities we have in front of us to improve existing processes, systems and methods.

On this occasion, it was a simple question on Quora about gradient descent and the ubiquitous backpropagation algorithm used in neural networks and deep learning. The content of my answer follows.

The process of computing weights and biases which minimize the error from a neural network could be any optimization algorithm which is good enough for the job. Gradient descent (especially the versions of the algorithm which use momentum and RMS propagation) are especially effective and have well implemented matrix algebra formulations in languages like C and Python, which makes them used often. Equally though, a genetic algorithm or simulated annealing algorithm (which are more complex and computationally intensive) may be used for finding such weights and biases on each iteration. Indeed, such methods have been and are being researched extensively.

Backpropagation is defined by four equations that help calculate new weights and biases to update a neural network.

The first of these equations helps calculate the error at the output layer. The second helps calculate the error in a given layer based on the error in the next layer. The third and fourth equations help calculate the rate of change of the loss function C with variations in the weights and biases.

Therefore the algorithm itself can be written out as follows :

We then use the gradient of the cost function to compute the new values of w and b, based on things like the learning rate and regularization parameters as applicable.

Why gradient descent? Since the process of backpropagation is iterative (we go from steps 1 – 5 and back again), for each update, we can get better and better versions of the weights w and biases b that are able to reduce the error between the target and the result produced by the network. The following animation (source: Data Blog) probably gives you an idea (the red areas are higher values of Cost, and blue means lower values).