## What Is Stepwise Regression?

Regression analysis is a widely used statistical approach that seeks to identify relationships between variables. The idea is to pool relevant data to make better-informed decisions and is a common practice in the world of investing. Stepwise regression is the step-by-step iterative construction of a regression model that involves automatic selection of independent variables. The availability of statistical software packages makes stepwise regression possible, even in models with hundreds of variables.

## Types of Stepwise Regression

The underlying goal of stepwise regression is, through a series of tests (F-tests, t-tests) to find a set of independent variables which significantly influence the dependent variable. This is done with computers through iteration, which is the process of arriving to results or decisions by going through repeated rounds or cycles of analysis. Conducting tests automatically with the help from statistical software packages has the advantage of saving time for the individual.

### Key Takeaways

- Regression analysis is a statistical approach that seeks to understand and measure relationships between independent and dependent variables.
- Stepwise regression is a method that examines the statistical significance of each independent variable within the model.
- The forward selection approach adds a variable and then tests for statistical significance.
- The backward elimination method begins with a model loaded with many variables and then removes one variable to test its importance relative to overall results.
- Stepwise regression has many critics, as it is approach that fits data into a model to achieve a desired result.

Stepwise regression can be achieved either by trying out one independent variable at a time and including it in the regression model if it is statistically significant or by including all potential independent variables in the model and eliminating those that are not statistically significant. Some use a combination of both methods and therefore there are three approaches to stepwise regression:

- Forward selection begins with no variables in the model, tests each variable as it is added to the model, then keeps those that are deemed most statistically significant—repeating the process until the results are optimal.
- Backward elimination starts with a set of independent variables, deleting one at a time, then testing to see if removed variable is statistically significant.
- Bidirectional elimination is a combination of the first two methods that tests which variables should be included or excluded.

An example of a stepwise regression using the backward elimination method would be an attempt to understand energy usage at a factory using variables such as equipment run time, equipment age, staff size, temperatures outside, and time of year. The model includes all of the variables—then each is removed, one at a time, to determine which is least statistically significant. In the end, the model might show that time of year and temperatures are most significant, possibly suggesting the peak energy consumption at the factory is when air conditioner usage is at its highest.

## Limitations of Stepwise Regression

Regression analysis, both linear and multivariate, are widely-used in the investment world today. The idea is often to find patterns that existed in the past that might also recur in the future. A simple linear regression, for example, might look at the price-to-earnings ratios and stock returns over many years to determine if stocks with low P/E ratios (independent variable) offer higher returns (dependent variable). The problem with this approach is that market conditions often change and relationships that held in the past do not necessarily hold true in the present or future.

Meanwhile, the stepwise regression process has many critics and there are even calls to stop using the method altogether. Statisticians note several drawbacks to the approach, including incorrect results, an inherent bias in the process itself, and the necessity for significant computing power to develop complex regression models through iteration.