What is Linear Regression?
Linear regression is a data plot that graphs the linear relationship between an independent and a dependent variable. It is typically used to visually show the strength of the relationship and the dispersion of results – all for the purpose of explaining the behavior of the dependent variable.
Say we wanted to test the strength of the relationship between the amount of ice cream eaten and obesity. We would take the independent variable, the amount of ice cream, and relate it to the dependent variable, obesity, to see if there was a relationship. Given a regression is a graphical display of this relationship, the lower the variability in the data, the stronger the relationship and the tighter the fit to the regression line.
- Linear regression models the relationship between a dependent and independent variable(s).
- Regression analysis can be achieved if the variables are independent, there is no heteroscedasticity, and the error terms of variables are not be correlated.
- Modeling linear regression in Excel is easier with the Data Analysis ToolPak.
There are a few critical assumptions about your data set that must be true to proceed with a regression analysis:
- The variables must be truly independent (using a Chi-square test).
- The data must not have different error variances (this is called heteroskedasticity (also spelled heteroscedasticity)).
- The error terms of each variable must be uncorrelated. If not, it means the variables are serially correlated.
If those three things sound complicated, they are. But the effect of one of those considerations not being true is a biased estimate. Essentially, you would misstate the relationship you are measuring.
Outputting a Regression in Excel
The first step in running regression analysis in Excel is to double-check that the free Excel plugin Data Analysis ToolPak is installed. This plugin makes calculating a range of statistics very easy. It is not required to chart a linear regression line, but it makes creating statistics tables simpler. To verify if installed, select "Data" from the toolbar. If "Data Analysis" is an option, the feature is installed and ready to use. If not installed, you can request this option by clicking on the Office button and selecting "Excel options".
Using the Data Analysis ToolPak, creating a regression output is just a few clicks.
The independent variable goes in the X range.
Given the S&P 500 returns, say we want to know if we can estimate the strength and relationship of Visa (V) stock returns. The Visa (V) stock returns data populates column 1 as the dependent variable. S&P 500 returns data populates column 2 as the independent variable.
- Select "Data" from the toolbar. The "Data" menu displays.
- Select "Data Analysis". The Data Analysis - Analysis Tools dialog box displays.
- From the menu, select "Regression" and click "OK".
- In the Regression dialog box, click the "Input Y Range" box and select the dependent variable data (Visa (V) stock returns).
- Click the "Input X Range" box and select the independent variable data (S&P 500 returns).
- Click "OK" to run the results.
[Note: If the table seems small, right-click the image and open in new tab for higher resolution.]
Interpret the Results
Using that data (the same from our R-squared article), we get the following table:
The R2 value, also known as the coefficient of determination, measures the proportion of variation in the dependent variable explained by the independent variable or how well the regression model fits the data. The R2 value ranges from 0 to 1, and a higher value indicates a better fit. The p-value, or probability value, also ranges from 0 to 1 and indicates if the test is significant. In contrast to the R2 value, a smaller p-value is favorable as it indicates a correlation between the dependent and independent variables.
Charting a Regression in Excel
We can chart a regression in Excel by highlighting the data and charting it as a scatter plot. To add a regression line, choose "Layout" from the "Chart Tools" menu. In the dialog box, select "Trendline" and then "Linear Trendline". To add the R2 value, select "More Trendline Options" from the "Trendline menu. Lastly, select "Display R-squared value on chart". The visual result sums up the strength of the relationship, albeit at the expense of not providing as much detail as the table above.