Published: Jan 2025
Linear regression is usually introduced through formulas, normal equations, and matrix algebra. But at its core, linear regression is a geometric problem. Once you see it geometrically, concepts like least squares, residuals, multicollinearity, and overfitting become almost intuitive.
This post explains linear regression not as a statistical recipe, but as a problem of distance, angles, and projections in high-dimensional space.
Suppose we have n observations and p predictors. Each predictor is a vector of length n. Together, these predictors form the design matrix:
$$ X \in \mathbb{R}^{n \times p} $$
Each column of X represents a direction in an n-dimensional space.
These directions span a flat geometric object known as the column space of X.
This column space is not curved, spherical, or nonlinear. It is a flat subspace (a plane, or hyperplane) embedded inside \( \mathbb{R}^n \).
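To make this concrete, here is a tiny made-up design matrix in NumPy (the numbers are arbitrary): each column is a vector in \( \mathbb{R}^5 \), and together the two columns span a flat two-dimensional subspace of \( \mathbb{R}^5 \).

```python
import numpy as np

# A toy design matrix: n = 5 observations, p = 2 predictors.
# Each column is a direction (a vector) in R^5.
X = np.array([
    [1.0, 2.0],
    [2.0, 1.5],
    [3.0, 3.5],
    [4.0, 4.0],
    [5.0, 6.0],
])

# The dimension of the column space is the rank of X.
print(np.linalg.matrix_rank(X))  # 2: the columns span a flat plane inside R^5
```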
The response variable \( y \) is also a vector in \( \mathbb{R}^n \). But in general, y does not lie inside the column space of X.
This mismatch is the entire reason regression exists. If \( y \) already lay in the column space, we could explain it perfectly with a linear combination of predictors.
Instead, linear regression asks a very precise geometric question:
Among all points in the column space of X, which one is closest to y?
The answer comes from the familiar least-squares formula:

$$ \hat{\beta} = (X^T X)^{-1} X^T y $$

This equation does not "solve for coefficients" in an abstract sense. It computes the coordinates of a specific point inside the column space of X.
That point is:
$$ \hat{y} = X\hat{\beta} $$

Geometrically, \( \hat{y} \) is the orthogonal projection of \( y \) onto the column space of \( X \).
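As a numerical sketch (using randomly generated data, so the specific numbers mean nothing), the coefficients and the projected point can be computed with NumPy's least-squares routine:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))    # 50 observations, 3 predictor directions in R^50
y = rng.normal(size=50)         # a response vector that (almost surely) is not in col(X)

# Coordinates of the projection point, expressed in the basis
# formed by the columns of X.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# The orthogonal projection of y onto the column space of X.
y_hat = X @ beta_hat
```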
The residual vector is:
$$ r = y - \hat{y} $$

This residual is not arbitrary. It satisfies a strong geometric property:
$$ X^T r = 0 $$

which means the residual is perpendicular to every column of X.
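A quick check of this orthogonality, again with made-up random data: the entries of \( X^T r \) come out at machine-precision level rather than exactly zero, simply because of floating-point rounding.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = rng.normal(size=50)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
r = y - X @ beta_hat            # residual vector

# The residual is perpendicular to every column of X,
# so X^T r is (numerically) the zero vector.
print(X.T @ r)                  # entries near machine precision, e.g. ~1e-14
```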
So when we say "least squares," what we really mean is: find the point in the column space of X whose squared distance \( \lVert y - X\beta \rVert^2 \) to \( y \) is smallest.
This is why ordinary least squares is a projection problem, not an optimization trick.
Each regression coefficient answers a subtle geometric question:
How much do we move along this predictor's direction to reach the projection point?
The coefficients are not properties of \( y \) alone. They depend on the directions of all the predictors together, and on the angles between those directions.
This is why multicollinearity makes coefficients unstable: when predictor directions are nearly parallel, the geometry becomes ill-conditioned.
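Here is a small simulated illustration of that instability; `x2` is deliberately constructed to be almost parallel to `x1`, so the exact estimates will vary from run to run.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)   # a predictor almost parallel to x1
X = np.column_stack([x1, x2])
y = x1 + rng.normal(size=n)           # the "true" relationship uses only x1

# Nearly parallel columns make X^T X close to singular,
# so the coefficient estimates become very sensitive to noise.
print(np.linalg.cond(X))              # a very large condition number
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)                       # typically large, offsetting coefficients
```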
Even in 100-dimensional space, the story is unchanged.
No matter how many dimensions we add, linear regression never bends the space. It only projects.
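One way to see the "it only projects" claim numerically is through the projection (hat) matrix \( H = X(X^T X)^{-1} X^T \), a standard construction not spelled out above: it is symmetric and idempotent, so projecting a second time changes nothing, no matter how many predictor directions there are.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 100))   # 100 predictors: still just a flat subspace of R^200

# Projection (hat) matrix onto the column space of X.
H = X @ np.linalg.solve(X.T @ X, X.T)

print(np.allclose(H, H.T))        # True: the projection is symmetric
print(np.allclose(H @ H, H))      # True: projecting twice = projecting once
```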
If you understand this geometry, you are no longer memorizing regression. You are seeing it.