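The chain rule is the standard method for differentiating a composite function. (It is not strictly the only one, since you can always fall back on the limit definition of the derivative, but it is the general-purpose tool.)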

A composite function is a function whose input is the output of another function, e.g. $f(g(x))$. It can be nested arbitrarily deep, e.g. $f(g(h(x)))$.
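To make the idea of nesting concrete, here is a minimal sketch in Python. The helper `compose` and the functions `square` and `add_one` are made-up names for illustration, not anything from a library.

```python
import math

def compose(*funcs):
    """Return a function that applies funcs right to left,
    i.e. compose(f, g, h)(x) == f(g(h(x)))."""
    def composed(x):
        for func in reversed(funcs):
            x = func(x)
        return x
    return composed

def square(x):
    return x * x

def add_one(x):
    return x + 1.0

# Two levels deep: sqrt(x + 1)
two_level = compose(math.sqrt, add_one)

# Three levels deep: sqrt(x^2 + 1)
three_level = compose(math.sqrt, add_one, square)

print(two_level(3.0))    # sqrt(3 + 1)
print(three_level(0.0))  # sqrt(0^2 + 1)
```

Each extra function passed to `compose` adds one more layer of nesting.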

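If $z$ depends on $y$ and $y$ depends on $x$, and you know:

  • rate of change of $z$ wrt $y$, i.e. $dz/dy$
  • rate of change of $y$ wrt $x$, i.e. $dy/dx$

Then you also know:

  • rate of change of $z$ wrt $x$, i.e. $dz/dx$

Here's the rule:

$$\frac{dz}{dx} = \frac{dz}{dy} \cdot \frac{dy}{dx}$$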
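As a quick sanity check, the chain rule can be compared against a numerical derivative. The sketch below assumes an example of my own choosing, an inner function y = x^2 and an outer function z = sin(y). The function names are illustrative only.

```python
import math

def g(x):
    return x ** 2        # inner function: y = g(x)

def f(y):
    return math.sin(y)   # outer function: z = f(y)

def dg_dx(x):
    return 2 * x         # dy/dx

def df_dy(y):
    return math.cos(y)   # dz/dy

def chain_rule_derivative(x):
    # dz/dx = dz/dy (evaluated at y = g(x)) times dy/dx
    return df_dy(g(x)) * dg_dx(x)

def numerical_derivative(func, x, h=1e-6):
    # Central difference approximation of func'(x)
    return (func(x + h) - func(x - h)) / (2 * h)

x0 = 1.5
analytic = chain_rule_derivative(x0)
numeric = numerical_derivative(lambda x: f(g(x)), x0)

print(analytic, numeric)  # the two values should agree to several decimal places
```

Note that dz/dy has to be evaluated at y = g(x), not at x itself. That is the step most often gotten wrong when applying the rule by hand.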

The following image might be helpful also:

The chain rule is used in backpropagation in neural network training (training = finding the weights and biases of the neural network).

Backpropagation refers to computing the gradient of the loss with respect to the weights/biases of the last layer of the neural network first, and then working backwards from there, layer by layer.

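Since the output of each layer is a function of its own weights/biases and of the output of the layer directly upstream from it, the loss is a composite function of every weight and bias in the network. You can therefore use the chain rule to start from the most downstream layer and work backwards, reusing each layer's gradient to get the gradients for the layer before it. Those gradients are what a method such as gradient descent then uses to update the weights/biases.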
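Here is a rough sketch of that backward pass for the smallest network I could think of: one input, one hidden unit with a sigmoid activation, one linear output, and a squared-error loss. All of these choices (the architecture, the loss, the parameter values) are assumptions made for illustration. Real networks use vectors and matrices, but the chain-rule bookkeeping is the same.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def forward(x, w1, b1, w2, b2):
    a1 = w1 * x + b1       # hidden layer pre-activation
    h = sigmoid(a1)        # hidden layer output
    y_hat = w2 * h + b2    # output layer (linear)
    return a1, h, y_hat

def loss_fn(y_hat, y):
    return 0.5 * (y_hat - y) ** 2

def backward(x, y, w1, b1, w2, b2):
    a1, h, y_hat = forward(x, w1, b1, w2, b2)

    # Start at the loss and the last layer.
    dL_dyhat = y_hat - y
    dL_dw2 = dL_dyhat * h          # dL/dw2 = dL/dy_hat * dy_hat/dw2
    dL_db2 = dL_dyhat              # dy_hat/db2 = 1

    # Work backwards into the hidden layer, reusing dL_dyhat.
    dL_dh = dL_dyhat * w2          # dy_hat/dh = w2
    dL_da1 = dL_dh * h * (1 - h)   # derivative of sigmoid is h * (1 - h)
    dL_dw1 = dL_da1 * x            # da1/dw1 = x
    dL_db1 = dL_da1                # da1/db1 = 1

    return {"w1": dL_dw1, "b1": dL_db1, "w2": dL_dw2, "b2": dL_db2}

# Arbitrary example values (assumptions, not from any real model).
params = {"w1": 0.5, "b1": -0.3, "w2": 1.2, "b2": 0.1}
x, y = 2.0, 1.0
grads = backward(x, y, **params)

def numeric_grad(name, h=1e-6):
    # Finite-difference check: nudge one parameter and watch the loss.
    up = dict(params)
    up[name] += h
    down = dict(params)
    down[name] -= h
    loss_up = loss_fn(forward(x, **up)[2], y)
    loss_down = loss_fn(forward(x, **down)[2], y)
    return (loss_up - loss_down) / (2 * h)

for name in params:
    print(name, grads[name], numeric_grad(name))
```

The key thing to notice is that `dL_dyhat` is computed once and reused for every gradient upstream of it. That reuse is why working from the last layer backwards is efficient.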