Advanced · 22 min read

Linear Regression Deep Dive

Build predictive models using least squares regression. Learn to create, interpret, and validate regression models.


Introduction

Linear regression is a statistical method for modeling the relationship between a dependent variable (Y) and one or more independent variables (X). It's used for prediction and understanding how variables are related.

Simple linear regression uses one predictor variable, while multiple regression uses several. This guide focuses on simple linear regression, the foundation for all regression analysis.

The Regression Equation

Linear Regression Formula

ŷ = a + bx

where:

  • ŷ (y-hat): the predicted value of Y
  • a: the Y-intercept (the value of ŷ when x = 0)
  • b: the slope (the change in ŷ per one-unit increase in X)

Least Squares Method

The least squares method finds the line that minimizes the sum of squared differences between observed and predicted values.

Slope and Intercept Formulas

Slope (b):

b = Σ[(xᵢ - x̄)(yᵢ - ȳ)] / Σ(xᵢ - x̄)²

Intercept (a):

a = ȳ - b × x̄
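These formulas translate directly into a few lines of Python. The sketch below fits a line to a small made-up dataset of years of experience versus salary; the numbers are purely illustrative.

```python
# Least squares fit for simple linear regression: y-hat = a + b*x
# Hypothetical data: years of experience (x) vs. salary in dollars (y)
x = [1, 2, 3, 5, 7, 10]
y = [38000, 41000, 44000, 47000, 52000, 60000]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# b = sum((xi - x_bar)(yi - y_bar)) / sum((xi - x_bar)^2)
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
    / sum((xi - x_bar) ** 2 for xi in x)

# a = y_bar - b * x_bar
a = y_bar - b * x_bar

print(f"slope b = {b:.2f}, intercept a = {a:.2f}")
# -> slope b = 2354.65, intercept a = 36011.63
```

In practice you would usually reach for a library routine such as numpy's polyfit or scipy.stats.linregress, which implement the same calculation.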

Understanding Slope & Intercept

Slope (b)

The slope tells you how much the predicted value of Y changes for each one-unit increase in X.

Example: If b = 2,500 when predicting salary (in dollars) from years of experience:

For each additional year of experience, salary increases by $2,500 on average.

Intercept (a)

The intercept is the predicted value of Y when X equals zero.

Example: If a = $35,000:

The predicted starting salary (0 years experience) is $35,000.
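Both interpretations are easy to check numerically. Here is a minimal sketch using the coefficients from these examples (a = 35,000, b = 2,500):

```python
# Salary model from the examples above: y-hat = a + b*x
a, b = 35000, 2500

def predict(x):
    """Predicted salary (in dollars) for x years of experience."""
    return a + b * x

# Slope: the change in the prediction per one-unit increase in x
print(predict(6) - predict(5))  # 2500 -> each extra year adds $2,500
# Intercept: the prediction at x = 0
print(predict(0))               # 35000 -> predicted starting salary
```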

Making Predictions

Example Prediction

Using our salary model: ŷ = 35000 + 2500x

Question: Predict salary for 5 years of experience

ŷ = 35000 + 2500(5)

ŷ = 35000 + 12500

ŷ = $47,500
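The same arithmetic as a quick Python sketch, using the model coefficients above:

```python
a, b = 35000, 2500  # salary model: y-hat = 35000 + 2500*x

years = 5
y_hat = a + b * years
print(f"Predicted salary for {years} years: ${y_hat:,}")  # $47,500
```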

Extrapolation Warning

Avoid predicting Y for X values outside your data range. The linear relationship may not hold beyond the observed range.
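One lightweight safeguard is to have the prediction function flag any input outside the range of x values the model was fitted on. A sketch, where the range bounds are assumed for illustration rather than taken from real data:

```python
import warnings

a, b = 35000, 2500    # salary model coefficients
X_MIN, X_MAX = 0, 10  # observed range of x in the (hypothetical) training data

def predict(x):
    """Predict salary, flagging x values outside the fitted range."""
    if not X_MIN <= x <= X_MAX:
        warnings.warn(
            f"x = {x} lies outside the observed range [{X_MIN}, {X_MAX}]; "
            "the linear relationship may not hold there."
        )
    return a + b * x

predict(5)   # inside the range: no warning
predict(40)  # outside the range: warns about extrapolation
```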

Regression Assumptions

1. Linearity

The relationship between X and Y is linear.

2. Independence

Observations are independent of each other.

3. Homoscedasticity

Variance of residuals is constant across all X values.

4. Normality

Residuals are normally distributed.

Residual Analysis

Residuals are the differences between observed and predicted values: e = y - ŷ

What Good Residuals Look Like

  • Randomly scattered around zero
  • No patterns or trends
  • Roughly equal spread at all X values
  • Approximately normally distributed
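A quick numeric check of the first property, reusing the hypothetical salary data from the fitting sketch earlier (a residual-versus-x plot is the usual visual check for the others):

```python
# Refit the hypothetical salary data from the earlier sketch
x = [1, 2, 3, 5, 7, 10]
y = [38000, 41000, 44000, 47000, 52000, 60000]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
    / sum((xi - x_bar) ** 2 for xi in x)
a = y_bar - b * x_bar

# Residuals: e = y - y_hat
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

print([round(e, 1) for e in residuals])
print(f"mean residual: {sum(residuals) / n:.6f}")  # ~0 by construction in least squares
```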

Summary

Key Takeaways

  1. Linear regression models Y as a linear function of X: ŷ = a + bx.
  2. Least squares minimizes the sum of squared residuals.
  3. Slope (b) represents the change in Y per unit change in X.
  4. Check assumptions before trusting your model.
  5. Avoid extrapolating beyond your data range.