Linear Regression Deep Dive
Build predictive models using least squares regression. Learn to create, interpret, and validate regression models.
Introduction
Linear regression is a statistical method for modeling the relationship between a dependent variable (Y) and one or more independent variables (X). It is used both to predict new values of Y and to understand how the variables are related.
Simple linear regression uses one predictor variable, while multiple regression uses several. This guide focuses on simple linear regression, the foundation for all regression analysis.
The Regression Equation
Linear Regression Formula

ŷ = a + bx

ŷ (y-hat): the predicted value of Y
a: the Y-intercept (the predicted value of Y when X = 0)
b: the slope (the change in Y per one-unit increase in X)
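In code, the model is nothing more than a linear function of X. A minimal sketch in Python (a and b here are placeholders for coefficients estimated from data):

def y_hat(x, a, b):
    # Regression prediction: ŷ = a + b·x
    return a + b * x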
Least Squares Method
The least squares method finds the line that minimizes the sum of squared differences between observed and predicted values.
Slope and Intercept Formulas
Slope (b):
b = Σ[(xᵢ - x̄)(yᵢ - ȳ)] / Σ(xᵢ - x̄)²
Intercept (a):
a = ȳ - b × x̄
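These formulas translate directly into a few lines of Python. A minimal sketch using NumPy (the data arrays below are hypothetical, invented only to illustrate the calculation):

import numpy as np

# Hypothetical data: years of experience (x) and salary in dollars (y).
x = np.array([1, 2, 3, 5, 7, 10], dtype=float)
y = np.array([38000, 41000, 42500, 47000, 52000, 60000], dtype=float)

x_bar, y_bar = x.mean(), y.mean()

# Slope: b = Σ[(xᵢ - x̄)(yᵢ - ȳ)] / Σ(xᵢ - x̄)²
b = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)

# Intercept: a = ȳ - b × x̄
a = y_bar - b * x_bar

print(f"ŷ = {a:.2f} + {b:.2f}x")

In practice, np.polyfit(x, y, 1) or scipy.stats.linregress(x, y) compute the same estimates and are the usual choices.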
Understanding Slope & Intercept
Slope (b)
The slope tells you how much Y changes for each one-unit increase in X.
Example: If b = 2,500 when predicting salary (in dollars) from years of experience:
For each additional year of experience, predicted salary increases by $2,500 on average.
Intercept (a)
The intercept is the predicted value of Y when X equals zero.
Example: If a = $35,000:
The predicted starting salary (0 years of experience) is $35,000.
Making Predictions
Example Prediction
Using our salary model: ŷ = 35000 + 2500x
Question: Predict salary for 5 years of experience
ŷ = 35000 + 2500(5)
ŷ = 35000 + 12500
ŷ = $47,500
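The same prediction in Python, as a small sketch reusing the worked example's coefficients (35,000 and 2,500 come from the model above, not from a fitted dataset):

a, b = 35000, 2500  # salary model from the worked example: ŷ = 35000 + 2500x

def predict(x):
    # Predicted salary for x years of experience.
    return a + b * x

print(predict(5))  # 47500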
Extrapolation Warning
Avoid predicting Y for X values outside your data range. The linear relationship may not hold beyond the observed range.
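One practical safeguard is to refuse, or at least flag, predictions outside the range of X used to fit the model. A sketch, with illustrative bounds of 1 to 10 years:

def predict_safe(x, a, b, x_min, x_max):
    # Predict ŷ = a + bx, but reject x outside the observed range.
    if not (x_min <= x <= x_max):
        raise ValueError(f"x = {x} is outside the observed range [{x_min}, {x_max}]")
    return a + b * x

print(predict_safe(5, 35000, 2500, 1, 10))   # 47500
# predict_safe(25, 35000, 2500, 1, 10) would raise ValueError: extrapolation.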
Regression Assumptions
1. Linearity
The relationship between X and Y is linear.
2. Independence
Observations are independent of each other.
3. Homoscedasticity
Variance of residuals is constant across all X values.
4. Normality
Residuals are normally distributed (a quick screen for this and for homoscedasticity appears in the sketch below).
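Assumptions 3 and 4 are usually screened through the residuals. A minimal sketch using SciPy (the residuals array is hypothetical; in practice you would compute e = y - ŷ from your fitted model):

import numpy as np
from scipy import stats

# Hypothetical residuals from a fitted model.
residuals = np.array([-1200.0, 800.0, -300.0, 500.0, -900.0, 1100.0])

# Normality: Shapiro-Wilk test. A large p-value (e.g. > 0.05) gives no
# evidence against normality of the residuals.
stat, p = stats.shapiro(residuals)
print(f"Shapiro-Wilk p-value: {p:.3f}")

# Homoscedasticity is most often judged visually from a residual-vs-fitted
# plot: the spread should look roughly constant across the fitted values.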
Residual Analysis
Residuals are the differences between observed and predicted values: e = y - ŷ
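Computing them is straightforward once the model is fitted. A minimal sketch (same hypothetical data as in the fitting example above; note that for a least squares fit with an intercept, the residuals always sum to essentially zero):

import numpy as np

# Hypothetical data: years of experience (x) and salary in dollars (y).
x = np.array([1, 2, 3, 5, 7, 10], dtype=float)
y = np.array([38000, 41000, 42500, 47000, 52000, 60000], dtype=float)

# Least squares fit.
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()

# Residuals: e = y - ŷ
residuals = y - (a + b * x)
print(residuals)
print(residuals.sum())  # ~0 by construction for least squares with an intercept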
What Good Residuals Look Like
• Randomly scattered around zero
• No visible patterns or trends
• Roughly equal spread at all X values
• Approximately normally distributed
Summary
Key Takeaways
1. Linear regression models Y as a linear function of X: ŷ = a + bx.
2. Least squares minimizes the sum of squared residuals.
3. The slope (b) is the change in Y per one-unit change in X.
4. Check the assumptions before trusting your model.
5. Avoid extrapolating beyond your data range.