The Chain Rule: Derivatives of Nested Functions
If f wraps around g, the rate of change of the whole is the rate of f times the rate of g. The chain rule is this simple multiplication — and it's the engine behind every neural network training algorithm.