When a model has too many features or its weights ($\theta$) grow too large, it starts to "memorize" the training data instead of learning the underlying pattern. This is Overfitting.
Regularization addresses this by adding a penalty term to the Cost Function that punishes large weights.
Ridge (L2 regularization) adds a penalty proportional to the sum of the squared coefficients.
The Logic: If a weight ($\theta$) tries to become very large, the Cost Function "explodes." To keep the Cost low, Gradient Descent is forced to keep the weights small and evenly distributed.
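Assuming mean squared error as the base cost, the Ridge objective can be written as (the bias term $\theta_0$ is conventionally left out of the penalty):

$$J(\theta) = \text{MSE}(\theta) + \alpha \sum_{j=1}^{n} \theta_j^2$$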
Lasso (Least Absolute Shrinkage and Selection Operator, or L1 regularization) adds a penalty proportional to the sum of the absolute values of the coefficients.
The Logic: Because Lasso uses absolute values (the sharp "V" shape at zero), it has a unique property: it can force weights to become exactly zero, effectively performing automatic feature selection.
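With the same MSE base cost, the Lasso objective differs only in the penalty term:

$$J(\theta) = \text{MSE}(\theta) + \alpha \sum_{j=1}^{n} |\theta_j|$$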
$\alpha$ is the Regularization Strength: $\alpha = 0$ recovers plain linear regression (no penalty), while a very large $\alpha$ shrinks all weights toward zero and underfits.
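A minimal sketch of the effect of $\alpha$, using scikit-learn's `Ridge` on synthetic data (the data and the specific $\alpha$ values are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic regression problem: 3 features with known true weights.
rng = np.random.default_rng(42)
X = rng.normal(size=(50, 3))
y = X @ np.array([4.0, -2.0, 1.0]) + rng.normal(scale=0.5, size=50)

# As alpha grows, the learned weight vector is pulled toward zero.
for alpha in [0.01, 1.0, 100.0]:
    model = Ridge(alpha=alpha).fit(X, y)
    print(f"alpha={alpha:>6}: ||theta|| = {np.linalg.norm(model.coef_):.3f}")
```

The printed norm of the weight vector decreases monotonically as $\alpha$ increases, which is the shrinkage described above.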
Use Ridge when you have many features that all contribute a little bit to the result. It handles multicollinearity (correlated features) very well.
Use Lasso when you suspect only a few features are actually important. It will help you "clean" your model by driving the noise weights to exactly zero.
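The contrast can be seen side by side. A sketch, assuming a synthetic dataset where only 2 of 5 features carry signal (feature count, noise scale, and $\alpha$ values are illustrative choices):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
# Only the first two features matter; the other three are pure noise.
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(scale=0.1, size=100)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

print("Ridge:", np.round(ridge.coef_, 3))  # noise weights shrunk, but nonzero
print("Lasso:", np.round(lasso.coef_, 3))  # noise weights exactly 0
```

Ridge shrinks the three noise weights but never zeroes them; Lasso sets them to exactly zero, keeping only the two informative features.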