Regression Models

1. Simple Linear Regression

Simple Linear Regression models the relationship between a single independent variable (Input, $x$) and a dependent variable (Output, $y$) using a straight line.

📄 View Example: Predicting Salary based on Experience

Linear Engine (Degree 1)

Interactive demo: click to add points; the fitted equation (y = wx + b) and its MSE update live.
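The same line-fitting the demo performs can be sketched in a few lines of NumPy. The salary-vs-experience numbers below are invented purely for illustration:

```python
import numpy as np

# Toy data: years of experience vs. salary in $1000s (made-up values).
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([35, 42, 51, 58, 66], dtype=float)

# Fit y = w1*x + b by least squares (a degree-1 polynomial fit).
w1, b = np.polyfit(x, y, deg=1)

# Mean squared error of the fitted line.
y_hat = w1 * x + b
mse = np.mean((y - y_hat) ** 2)

print(f"y = {w1:.2f}x + {b:.2f}, MSE = {mse:.2f}")
```

Minimizing the MSE is exactly what "fitting" means here: least squares picks the w1 and b that make it as small as possible for the given points.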

2. Polynomial Regression

What if the data isn't straight? Polynomial Regression fits a curve by adding powers of $x$ ($x^2, x^3...$).

📄 View Example: Trajectory of a Ball

This is still "Linear Regression" technically, because it is linear in the parameters (coefficients), even though it produces a curved line.

Polynomial Engine (Non-Linear)

Interactive demo: try adding points in a "U" or "S" shape and adjust the Degree; the fitted curve and its MSE update live.
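A minimal sketch of a degree-2 fit on U-shaped toy data (values invented to lie roughly on y = x²). Note the fit is solved by the same ordinary least squares as before, because the model is linear in its coefficients:

```python
import numpy as np

# U-shaped toy data, roughly y = x^2 (made-up values).
x = np.array([-2, -1, 0, 1, 2], dtype=float)
y = np.array([4.1, 0.9, 0.2, 1.1, 3.9])

# Degree-2 fit: y = w2*x^2 + w1*x + b. Adding the x^2 column keeps
# the problem *linear in the parameters*, so least squares still works.
w2, w1, b = np.polyfit(x, y, deg=2)
y_hat = np.polyval([w2, w1, b], x)
mse = np.mean((y - y_hat) ** 2)
print(f"y = {w2:.2f}x^2 + {w1:.2f}x + {b:.2f}, MSE = {mse:.3f}")
```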

The Danger of Overfitting

Try setting the Degree to 6 with only a few points. Notice how the curve goes wild trying to hit every single point? That is Overfitting. The model is memorizing the noise instead of learning the pattern.
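The overfitting effect is easy to reproduce outside the demo. With 7 noisy points generated from a true straight line, a degree-6 polynomial can pass through every point, driving the training MSE to essentially zero while wiggling wildly between them (the seed and noise level below are arbitrary choices for the sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 7)
y = 2 * x + 1 + rng.normal(0, 0.1, size=x.size)  # true pattern is a line

results = {}
for deg in (1, 6):
    coeffs = np.polyfit(x, y, deg)
    # Training MSE: error on the very points the model was fitted to.
    results[deg] = np.mean((np.polyval(coeffs, x) - y) ** 2)
    print(f"degree {deg}: training MSE = {results[deg]:.6f}")
```

The degree-6 fit "wins" on training MSE only because it has memorized the noise; a low training error alone says nothing about how the curve behaves between or beyond the points.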

3. Multiple Linear Regression

While Simple/Poly Regression uses one input ($x$), Multiple Linear Regression uses multiple independent variables ($x_1, x_2, ...$).

Equation Comparison

Simple (Line):
y = b + w₁x

Polynomial (Curve):
y = b + w₁x + w₂x² + w₃x³

Multiple (Plane):
y = b + w₁x₁ + w₂x₂ + w₃x₃

Key Difference

In Multiple Regression, we fit a Hyperplane. We cannot visualize this easily on a 2D screen, but the math (minimizing squared errors) remains exactly the same.

4. Handling Categorical Data (Dummy Variables)

Linear Regression performs math on numbers. But real data often contains text or categories like "City", "Color", or "Yes/No". To use these in our model, we must translate them into numbers using Dummy Variables.

The Technique: One-Hot Encoding

Instead of assigning random numbers (Red=1, Blue=2, Green=3) which would confuse the model into thinking Green > Blue, we create separate "Switch" columns for each category.

❌ Bad Approach (Label Encoding)

Color   Value
Red     1
Blue    2
Green   3

Error: Model assumes Green is "3x more" than Red.

✅ Good Approach (Dummy Vars)

Is_Red   Is_Blue   Is_Green
1        0         0
0        1         0
0        0         1

Result: Each category is treated independently.

Note on the "Dummy Variable Trap":
Technically, if you have 3 categories, you only need 2 dummy variables (e.g., if it's not Red and not Blue, it must be Green). Including all 3 can cause Multicollinearity (Perfect Correlation).
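A quick sketch using pandas (assuming pandas is available): get_dummies builds the switch columns, and drop_first=True keeps only 2 of the 3, sidestepping the dummy variable trap:

```python
import pandas as pd

df = pd.DataFrame({"Color": ["Red", "Blue", "Green", "Blue"]})

# One-hot encode the Color column. drop_first=True drops one category
# (here the alphabetically first, Blue) to avoid perfect multicollinearity:
# a row of all zeros then means "Blue".
dummies = pd.get_dummies(df["Color"], prefix="Is", drop_first=True).astype(int)
print(dummies)
```

With three categories you get two columns (Is_Green, Is_Red); the dropped category becomes the baseline that the other coefficients are measured against.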