A company wants to predict the salary of a new employee based on their years of experience. We have data from 5 current employees. We will use Simple Linear Regression to find the best-fitting line: $y = mx + b$.
| Employee | Years of Experience ($x$) | Salary in \$1000s ($y$) |
|---|---|---|
| A | 1 | 30 |
| B | 2 | 35 |
| C | 3 | 50 |
| D | 4 | 60 |
| E | 5 | 75 |
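We need to find the slope ($m$) and y-intercept ($b$) that minimize the sum of squared errors between the observed salaries and the salaries predicted by the line: $$ \sum_{i=1}^{n} \left( y_i - (m x_i + b) \right)^2 $$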
The formula for slope $m$ is: $$ m = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2} $$
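From the table, $n = 5$, $\sum x = 15$, $\sum y = 250$, $\sum xy = 30 + 70 + 150 + 240 + 375 = 865$, and $\sum x^2 = 1 + 4 + 9 + 16 + 25 = 55$. Plugging these into the formula for $m$: $$ m = \frac{5(865) - (15)(250)}{5(55) - (15)^2} $$ $$ m = \frac{4325 - 3750}{275 - 225} = \frac{575}{50} = 11.5 $$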
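The slope calculation can be checked with a few lines of Python. This is a minimal sketch that uses only the five data points from the table; the variable names (`sum_xy`, `sum_x2`, and so on) are chosen here for illustration.

```python
# Data from the table: years of experience (x) and salary in $1000s (y).
x = [1, 2, 3, 4, 5]
y = [30, 35, 50, 60, 75]

n = len(x)                                     # 5 data points
sum_x = sum(x)                                 # 15
sum_y = sum(y)                                 # 250
sum_xy = sum(xi * yi for xi, yi in zip(x, y))  # 865
sum_x2 = sum(xi ** 2 for xi in x)              # 55

# Slope formula for simple linear regression.
m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
print(m)  # 11.5
```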
Next, we find the intercept $b$ using $b = \bar{y} - m\bar{x}$, where $\bar{x}$ and $\bar{y}$ are the means of $x$ and $y$: $$ \bar{y} = 250 / 5 = 50 $$ $$ \bar{x} = 15 / 5 = 3 $$ $$ b = 50 - 11.5(3) = 50 - 34.5 = 15.5 $$
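The intercept step can be sketched the same way. The snippet below assumes the slope $m = 11.5$ found above and uses `statistics.mean` from the Python standard library; the names `x_bar` and `y_bar` are illustrative.

```python
from statistics import mean

# Data from the table: years of experience (x) and salary in $1000s (y).
x = [1, 2, 3, 4, 5]
y = [30, 35, 50, 60, 75]

m = 11.5          # slope found above

x_bar = mean(x)   # 3
y_bar = mean(y)   # 50

# The least-squares line passes through the point (x_bar, y_bar).
b = y_bar - m * x_bar
print(b)  # 15.5
```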
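The fitted line, with salary in \$1000s and experience in years, is therefore: $$ \text{Salary} = 11.5 \times \text{Experience} + 15.5 $$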
This means that each additional year of experience is associated with a predicted salary increase of \$11,500, starting from a baseline of \$15,500 at zero years of experience.
If we hire someone with 6 years of experience: $$ y = 11.5(6) + 15.5 $$ $$ y = 69 + 15.5 = 84.5 $$
We would predict a salary of \$84,500.
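As a small sketch, the prediction step can be wrapped in a function. The name `predict_salary` and its default arguments are assumptions made for this example; the coefficients are the ones derived above.

```python
def predict_salary(years, m=11.5, b=15.5):
    """Return the predicted salary in $1000s for the given years of experience."""
    return m * years + b

print(predict_salary(6))  # 84.5, i.e. $84,500
```

Note that 6 years lies just outside the observed range of 1 to 5 years, so this prediction is an extrapolation from the fitted line.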