What Is the Residual Standard Deviation?
The residual standard deviation is a statistical term for the standard deviation of observed points around a fitted linear function. It is an estimate of how accurately the model predicts the dependent variable being measured.
Residual standard deviation is also referred to as the standard deviation of points around a fitted line and is sometimes called the standard error of estimate.
BREAKING DOWN Residual Standard Deviation
The residual standard deviation is a goodness-of-fit measure that describes how well the data points align with the fitted model. To calculate it, first find the difference between each actual value and the value the fitted line predicts for it. This difference is known as the residual value or, simply, the residual: the vertical distance between an observed point and the model's prediction for that point.
The residual is calculated as:
r_{i} = y_{i} - ŷ_{i}
where r_{i} = residual value for the i-th observation
y_{i} = observed value for a given x value
ŷ_{i} = predicted value for the same x value
For example, suppose we have four observations from an experiment. The table below shows the y value observed and recorded for each value of x:
x | y
1 | 1
2 | 4
3 | 6
4 | 7
If the model's fitted line is given as ŷ = 1x + 2, the residual for each observation can be found. For the first observation, the actual y value is 1, but the predicted y value given by the equation is ŷ = 1(1) + 2 = 3. The residual value is, therefore, 1 - 3 = -2, a negative residual, meaning the observed point lies below the line.
For the second observation, where x is 2 and y is observed to be 4, the predicted value is 1(2) + 2 = 4. In this case, the actual and predicted values are the same, and the residual is zero. The same calculation applies to the remaining observations: for x = 3, ŷ = 5 and the residual is 6 - 5 = 1; for x = 4, ŷ = 6 and the residual is 7 - 6 = 1.
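The residual calculation above can be sketched in a few lines of Python. This is only an illustration using the four observations and the line ŷ = 1x + 2 from the text; the names `xs`, `ys`, `predict`, and `residuals` are assumptions chosen for this example.

```python
# Observations from the table in the text.
xs = [1, 2, 3, 4]
ys = [1, 4, 6, 7]


def predict(x):
    """Predicted value from the line given in the text: y-hat = 1x + 2."""
    return 1 * x + 2


# Residual = observed value minus predicted value, one per observation.
residuals = [y - predict(x) for x, y in zip(xs, ys)]

print(residuals)  # [-2, 0, 1, 1]
```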
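Once the residuals for all points have been calculated, using the table or a graph, the residual standard deviation can be calculated as:
S_{res} = √( Σ (r_{i})^{2} / (n - 1) )
where S_{res} = residual standard deviation
Σ (r_{i})^{2} = sum of the squared residuals
n = number of residuals
Expanding the table above, let's calculate the residual standard deviation: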
x | y | ŷ | r_{i} | (r_{i})^{2}
1 | 1 | 3 | -2 | 4
2 | 4 | 4 | 0 | 0
3 | 6 | 5 | 1 | 1
4 | 7 | 6 | 1 | 1
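Sum of squared residuals: 4 + 0 + 1 + 1 = 6
Number of residuals less 1: 4 - 1 = 3
Residual standard deviation: √(6/3) = √2 ≈ 1.4142
This example divides by the number of residuals less 1. Many regression texts instead divide by the number of residuals less 2 for a simple linear regression, because both the slope and the intercept are estimated from the data; with that divisor the result here would be √(6/2) = √3 ≈ 1.7321.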
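As a rough check on the arithmetic, the calculation can be written as a small Python function. This is a minimal sketch, not a library routine: `residual_standard_deviation` and its `lost_degrees_of_freedom` parameter are hypothetical names, and the default of 1 is chosen only to match the divisor used in this example (some regression texts use 2 for a fitted straight line).

```python
import math


def residual_standard_deviation(residuals, lost_degrees_of_freedom=1):
    """Square root of the sum of squared residuals over (n - lost df).

    lost_degrees_of_freedom is a hypothetical parameter for this sketch;
    1 reproduces the "number of residuals less 1" divisor from the text.
    """
    n = len(residuals)
    if n <= lost_degrees_of_freedom:
        raise ValueError("not enough residuals for this divisor")
    sum_of_squares = sum(r ** 2 for r in residuals)
    return math.sqrt(sum_of_squares / (n - lost_degrees_of_freedom))


# Residuals from the expanded table in the text.
example_residuals = [-2, 0, 1, 1]

s_res = residual_standard_deviation(example_residuals)
print(round(s_res, 4))  # 1.4142
```

Passing `lost_degrees_of_freedom=2` changes only the divisor, which makes it easy to see how sensitive the result is to that choice when the sample is this small.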
The magnitude of a typical residual gives a sense of how close the model's estimates generally are to the observed values. The smaller the residual standard deviation, the closer the fit to the data. In effect, the smaller the residual standard deviation is compared to the sample standard deviation of the observed y values, the more predictive, or adequate, the model is.
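The comparison described above can be illustrated with the same four observations. The sketch below assumes the sample standard deviation in question is that of the observed y values, computed with Python's `statistics.stdev`; the variable names are chosen for this example only.

```python
import math
import statistics

# Observed y values and residuals from the worked example in the text.
observed_y = [1, 4, 6, 7]
residuals = [-2, 0, 1, 1]

# Sample standard deviation of y (divides by n - 1).
sample_sd = statistics.stdev(observed_y)

# Residual standard deviation with the same n - 1 divisor used in the text.
residual_sd = math.sqrt(sum(r ** 2 for r in residuals) / (len(residuals) - 1))

print(round(sample_sd, 4))    # 2.6458
print(round(residual_sd, 4))  # 1.4142

# A ratio well below 1 suggests the line accounts for much of the spread in y.
ratio = residual_sd / sample_sd
print(round(ratio, 4))        # 0.5345
```

Here the residual standard deviation is roughly half the sample standard deviation, so the line describes the data noticeably better than the mean of y alone would.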
The residual standard deviation can be calculated once a regression analysis or an analysis of variance (ANOVA) has been performed. When determining a limit of quantitation (LoQ), the residual standard deviation may be used in place of the standard deviation.