What is the Line Of Best Fit

Line of best fit refers to a line through a scatter plot of data points that best expresses the relationship between those points. Statisticians typically use the least squares method to arrive at the geometric equation for the line, either though manual calculations or regression analysis software. A straight line will result from a simple linear regression analysis of two or more independent variables. A regression involving multiple related variables can produce a curved line in some cases.

1:00

Line Of Best Fit

BREAKING DOWN Line Of Best Fit

Line of best fit is one of the most important outputs of regression analysis. Regression refers to a quantitative measure of the relationship between one or more independent variables and a resulting dependent variable. Regression is of use to professionals in a wide range of fields from science and public service to financial analysis.

To perform a regression analysis, a statistician collects a set of data points, each including a complete set of dependent and independent variables. For example, the dependent variable could be a firm’s stock price and the independent variables could be the Standard and Poor’s 500 index and the national unemployment rate, assuming that the stock is not listed in the S&P 500. The sample set could be each of these three data sets for the past 20 years. On a chart, these data points would appear as scatter plot, a set of points that may or may not appear to be organized along any line. If a linear pattern is apparent, it may be possible to sketch a line of best fit that minimizes the distance of those points from that line. If no organizing axis is visually apparent, regression analysis can generate a line based on the least squares method. This method builds the line which minimizes the squared distance of each point from the line of best fit.

To determine the formula for this line, the statistician enters these three results for the past 20 years into a regression software application. The software produces a linear formula that expresses the causal relationship between the S&P 500, the unemployment rate, and the stock price of the company in question. This equation is the formula for the line of best fit. It is a predictive tool, providing analysts and traders with a mechanism to project the firm’s future stock price based on those two independent variables.

The Line of Best Fit Equation and Its Components

A regression with two independent variables such as the example discussed above will produce a formula with this basic structure:

y= c + b1(x1) + b2(x2)

In this equation, y is the dependent variable, c is a constant, b1 is the first regression coefficient and x1 is the first independent variable. The second coefficient and second independent variable are b2 and x2. Drawing from the above example, the stock price would be y, the S&P 500 would be x1 and the unemployment rate would be x2. The coefficient of each independent variable represents the degree of change in y for each additional unit in that variable. If the S&P 500 increases by one, the resulting y or share price will go up by the amount of the coefficient. The same is true for the second independent variable, the unemployment rate. In a simple regression with one independent variable, that coefficient is the slope of the line of best fit. In this example or any regression with two independent variables the slope is a mix of the two coefficients. The constant c is the y-intercept of the line of best fit.