To understand Gradient Descent, imagine you are standing high on a mountain (a point of high error) and you want to get to the very bottom of the valley (the point of minimum error).
The catch? You are blindfolded. You can only feel the slope of the ground under your feet to decide which way to move.
You start by picking a random value for your parameters, $\theta$ (the weights and bias). At this stage, your model is essentially guessing blindly.
The Result: The value of your cost function will likely be very high, because a blind guess is almost certainly far from the true values.
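To make the "blind guess" concrete, here is a minimal sketch in Python. The linear model, the toy dataset (generated from $y = 2x + 1$), and the mean-squared-error cost are all assumptions chosen for illustration:

```python
import random

# Hypothetical setup: a linear model y = w*x + b fit to a toy dataset
# whose true relationship is y = 2x + 1.
data = [(x, 2 * x + 1) for x in range(10)]

def cost(w, b):
    """Mean squared error of the model's predictions over the dataset."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

random.seed(0)                                   # reproducible "random" start
w = random.uniform(-10, 10)                      # blind guess for the weight
b = random.uniform(-10, 10)                      # blind guess for the bias
print(cost(w, b))   # a large number, far above cost(2, 1) == 0
```

With the true parameters, `cost(2, 1)` is exactly zero; the random starting point lands far above that, which is the "top of the mountain."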
You feel the ground around you to see which way is "downhill." Mathematically, this is called taking the Derivative of the cost function.
The "Gradient" is just a more general word for the slope: with several parameters, it is the vector of partial derivatives, which points in the direction of steepest ascent.
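"Feeling the slope" can be sketched numerically. The one-parameter toy cost $J(\theta) = (\theta - 3)^2$ and the finite-difference helper below are illustrative assumptions, not anything defined in the text:

```python
def cost(theta):
    # Toy cost: J(theta) = (theta - 3)^2, minimized at theta = 3.
    return (theta - 3) ** 2

def numerical_gradient(f, theta, eps=1e-6):
    # Central-difference approximation of the slope of f at theta.
    return (f(theta + eps) - f(theta - eps)) / (2 * eps)

theta = 0.0
slope = numerical_gradient(cost, theta)
print(slope)  # close to the exact derivative 2*(theta - 3) = -6
```

The slope is negative, so "downhill" is in the positive $\theta$ direction; moving against the gradient is exactly the direction you want.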
Once you know the direction, you take a step. But how big should that step be? This is determined by the Learning Rate ($\alpha$).
After your step, you update your position. The formula for this update is:

$$\theta := \theta - \alpha \cdot \frac{\partial}{\partial \theta} J(\theta)$$

where $J(\theta)$ is the cost function. (We subtract because the gradient points uphill, and we want to go down the slope, not up.)
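A single update step might look like the following sketch. The toy cost $J(\theta) = (\theta - 3)^2$, whose exact derivative is $2(\theta - 3)$, is an assumption for illustration:

```python
def gradient(theta):
    # Exact derivative of the toy cost J(theta) = (theta - 3)^2.
    return 2 * (theta - 3)

alpha = 0.1    # learning rate: the size of each step
theta = 0.0    # current position

theta = theta - alpha * gradient(theta)  # subtract to move downhill
print(theta)   # about 0.6: one step closer to the minimum at 3
```

Notice the effect of $\alpha$: a smaller value would have moved $\theta$ less, and a very large one could overshoot the valley entirely.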
You repeat this cycle (feel the slope, take a step, update $\theta$) thousands of times. With every iteration, the value of your cost function gets smaller, and your predictions get closer to the actual values.
Eventually, the ground becomes flat. The slope becomes zero. No matter which way you move, you aren't going "down" anymore.
This is called Convergence. You have found the optimal $\theta$!
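The whole loop, from the starting guess to convergence, can be sketched as follows. The toy cost $J(\theta) = (\theta - 3)^2$, minimized at $\theta = 3$, is an assumption for illustration:

```python
def gradient(theta):
    # Exact derivative of the toy cost J(theta) = (theta - 3)^2.
    return 2 * (theta - 3)

theta = 0.0    # random-ish starting guess
alpha = 0.1    # learning rate

for _ in range(1000):
    step = alpha * gradient(theta)
    theta -= step
    if abs(step) < 1e-9:   # slope is (near) zero: the ground is flat
        break              # convergence reached

print(theta)   # very close to 3.0, the optimal value
```

Under these assumptions, the loop stops well before 1000 iterations because each step shrinks the distance to the minimum by a constant factor, so the slope decays toward zero.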