Degrees of Freedom in Statistics Explained: Formula and Example

What Are Degrees of Freedom?

Degrees of freedom refers to the maximum number of logically independent values, which are values that have the freedom to vary, in the data sample. Once that many values have been chosen, any remaining items in the sample must take specific values if the sample has to satisfy an outstanding requirement, such as a required mean.

Key Takeaways

  • Degrees of freedom refers to the maximum number of logically independent values, which are values that have the freedom to vary, in the data sample.
  • In the most common case, a sample with a single constraint such as a required mean, degrees of freedom is calculated by subtracting one from the number of items within the data sample.
  • Degrees of freedom are commonly discussed in relation to various forms of hypothesis testing in statistics, such as a chi-square test.
  • Calculating degrees of freedom is key when trying to understand the significance of a chi-square statistic and the validity of the null hypothesis.
  • Degrees of freedom can also describe business situations where management must make a decision that dictates the outcome of another variable.

Understanding Degrees of Freedom

Degrees of freedom are the number of independent values in a statistical analysis that are free to vary when a parameter is estimated. The values of these variables are unconstrained, although they do impose restrictions on other variables if the data set is to comply with the estimated parameters.

Within a data set, some initial numbers can be chosen at random. However, if the data set must add up to a specific sum or mean, for example, the last number in the data set is constrained: its value is determined by the values of all the other numbers, so that the set meets the requirement.

Examples of Degrees of Freedom

The easiest way to understand degrees of freedom conceptually is through several examples.

Example 1: Consider a data sample consisting of five positive integers. The values of the five integers must have an average of six, so they must total 30. If four of the items within the data set are {3, 8, 5, 4}, which total 20, the fifth number must be 10. Because the first four numbers can be chosen at random, the degrees of freedom is four.
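The arithmetic in Example 1 can be sketched in Python. The helper `forced_last_value` is a hypothetical name used only for illustration, and the sketch assumes the required mean is the only constraint, so the mean times the sample size fixes the total.

```python
def forced_last_value(free_values, n, target_mean):
    """Return the value the final item must take so that n items average target_mean.

    Hypothetical helper: assumes the required mean is the only constraint.
    """
    if len(free_values) != n - 1:
        raise ValueError("expected exactly n - 1 freely chosen values")
    required_total = target_mean * n  # the mean constraint fixes the total
    return required_total - sum(free_values)


free = [3, 8, 5, 4]  # the four freely chosen values from Example 1
fifth = forced_last_value(free, n=5, target_mean=6)
degrees_of_freedom = len(free)  # only these n - 1 values were free to vary
print(fifth, degrees_of_freedom)  # 10 4
```

Only four inputs are ever chosen; the fifth is computed, which is exactly why the degrees of freedom is four rather than five.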

Example 2: Consider a data sample consisting of five positive integers. The values could be any number with no known relationship between them. Because all five numbers can be chosen at random with no limitations, the degrees of freedom is five.

Example 3: Consider a data sample consisting of one integer whose value must equal a required average. Because that constraint fully determines the single item within the data set, the degrees of freedom is zero (1 - 1 = 0).

Degrees of Freedom Formula

The formula to determine degrees of freedom is:

$$
\begin{aligned}
&\text{D}_\text{f} = N - 1 \\
&\textbf{where:} \\
&\text{D}_\text{f} = \text{degrees of freedom} \\
&N = \text{sample size}
\end{aligned}
$$

For example, imagine a task of selecting 10 baseball players whose batting averages must average .250. The total number of players that will make up our data set is the sample size, so N = 10. In this example, 9 (10 - 1) baseball players can theoretically be picked at random, while the 10th player must have the specific batting average that brings the group to the .250 constraint.

Some calculations of degrees of freedom involving multiple parameters or relationships use the formula Df = N - P, where P is the number of different parameters or relationships. For example, the pooled (equal-variance) two-sample t-test uses N - 2, where N is the combined size of both samples, because two parameters (one mean per sample) are estimated.
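A minimal sketch of the Df = N - P rule, assuming P counts the parameters estimated from the same N observations. The function name and the group sizes 12 and 15 are hypothetical, chosen only to illustrate the calculation; the two-sample case assumes the pooled (equal-variance) t-test.

```python
def degrees_of_freedom(n_observations, n_estimated_parameters):
    """Df = N - P: observations minus the parameters estimated from them."""
    df = n_observations - n_estimated_parameters
    if df < 0:
        raise ValueError("more parameters than observations")
    return df


# One mean estimated from 10 batting averages (the baseball example): 10 - 1
baseball_df = degrees_of_freedom(10, 1)
print(baseball_df)  # 9

# Pooled two-sample t-test with hypothetical group sizes 12 and 15:
# N is the combined sample size and two means are estimated.
two_sample_df = degrees_of_freedom(12 + 15, 2)
print(two_sample_df)  # 25
```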

History of Degrees of Freedom

The earliest and most basic concept of degrees of freedom was noted in the early 1800s, intertwined in the works of mathematician and astronomer Carl Friedrich Gauss. The modern usage and understanding of the concept were first set out by William Sealy Gosset, an English statistician, in his article "The Probable Error of a Mean," published in Biometrika in 1908 under the pen name "Student" to preserve his anonymity.

In his writings, Gosset did not specifically use the term "degrees of freedom." He did, however, explain the concept while developing what would eventually be known as Student's t-distribution. The actual term was not made popular until 1922, when English biologist and statistician Ronald Fisher began using "degrees of freedom" in the reports and data he published on his work with the chi-square statistic.

Chi-Square Tests

Degrees of freedom are commonly discussed in relation to various forms of hypothesis testing in statistics, such as a chi-square test. It is essential to calculate degrees of freedom when trying to understand the significance of a chi-square statistic and the validity of the null hypothesis.

There are two different kinds of chi-square tests: the test of independence, which asks whether two variables are related, such as "Is there a relationship between gender and SAT scores?"; and the goodness-of-fit test, which asks whether observed counts match an expected distribution, such as "If a coin is tossed 100 times, will it come up heads 50 times and tails 50 times?"

For these tests, degrees of freedom, which depend on the number of categories being compared, determine which chi-square distribution the test statistic is judged against when deciding whether the null hypothesis can be rejected. Sample size matters as well. When considering students and course choice, for example, a sample of 30 or 40 students is likely not large enough to produce statistically significant results, while getting the same or similar results from a sample of 400 or 500 students is more convincing.
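The coin example can be sketched as a goodness-of-fit test. The counts of 58 heads and 42 tails are hypothetical, made up for illustration. The sketch uses the standard degrees-of-freedom rules (categories minus one for goodness of fit; (rows - 1) x (columns - 1) for a test of independence) and relies on the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable, so its p-value can be computed with `math.erfc` from the standard library. The 3.841 cutoff is the usual 5% critical value for 1 degree of freedom from a chi-square table.

```python
import math


def goodness_of_fit_df(n_categories):
    """Goodness of fit: the counts must sum to the total, so one category is fixed."""
    return n_categories - 1


def independence_df(n_rows, n_cols):
    """Test of independence on an r x c table: (r - 1) * (c - 1)."""
    return (n_rows - 1) * (n_cols - 1)


def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_df1(chi_sq):
    """Upper-tail p-value for a chi-square statistic with exactly 1 degree of freedom."""
    return math.erfc(math.sqrt(chi_sq / 2))


observed = [58, 42]  # hypothetical heads/tails counts from 100 tosses
expected = [50, 50]  # fair-coin expectation
df = goodness_of_fit_df(len(observed))
chi_sq = chi_square_statistic(observed, expected)
p = p_value_df1(chi_sq)
reject = chi_sq > 3.841  # 5% critical value for df = 1
print(df, round(chi_sq, 2), round(p, 4), reject)  # 1 2.56 0.1096 False
```

With 1 degree of freedom, a statistic of 2.56 falls short of the critical value, so under these made-up counts the fair-coin null hypothesis would not be rejected at the 5% level.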

T-Test

To perform a t-test, you calculate the value of t for the sample and compare it to a critical value. The critical value varies, and you can determine the correct one by using the t-distribution with the appropriate degrees of freedom for the data set.

t-distributions with fewer degrees of freedom assign a higher probability to extreme values, while those with more degrees of freedom (e.g., from a sample size of at least 30) are much closer to the normal distribution curve. This is because smaller sample sizes correspond to fewer degrees of freedom, which produce fatter t-distribution tails.
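The fatter-tails claim can be checked numerically. This sketch evaluates the standard Student's t density at a point far in the tail and compares it with the normal density; the helper names `t_pdf` and `normal_pdf`, the tail point 3.0, and the degrees of freedom 2, 5, and 30 are arbitrary illustrative choices.

```python
import math


def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    # log of Gamma((df+1)/2) / (sqrt(df*pi) * Gamma(df/2)), computed stably
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef) * (1 + t * t / df) ** (-(df + 1) / 2)


def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


x = 3.0  # a point far out in the tail
tail_density = {df: t_pdf(x, df) for df in (2, 5, 30)}
for df, density in tail_density.items():
    print(df, round(density, 5))
print("normal", round(normal_pdf(x), 5))
```

The density at the tail point shrinks as the degrees of freedom grow and stays above the normal density, which is the heavier-tail behavior described above.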

Several of the examples above resemble the setting of a one-sample t-test. In Example 1, for instance, five values are selected but must produce a specific average. A one-sample t-test likewise estimates a single mean from the data, so only one constraint is placed on the values and the degrees of freedom is N - 1 = 4.
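A one-sample t-test on Example 1's five values can be sketched with Python's standard `statistics` module. The hypothesized mean of 5 is an assumption made only for illustration, and the critical value 2.776 is the two-sided 5% value for 4 degrees of freedom taken from a standard t table.

```python
import math
import statistics

sample = [3, 8, 5, 4, 10]  # Example 1's five values (mean 6)
mu0 = 5  # hypothetical null-hypothesis mean, chosen only for illustration

n = len(sample)
df = n - 1  # one parameter (the mean) is estimated from the sample
mean = statistics.mean(sample)
sd = statistics.stdev(sample)  # sample standard deviation, divides by n - 1
t_stat = (mean - mu0) / (sd / math.sqrt(n))

critical = 2.776  # two-sided 5% critical value for df = 4
print(df, round(t_stat, 3), abs(t_stat) > critical)  # 4 0.767 False
```

Note that `statistics.stdev` itself divides by n - 1, the same degrees-of-freedom correction that sets which t-distribution supplies the critical value.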

Application of Degrees of Freedom

In statistics, degrees of freedom define the shape of the t-distribution used in t-tests when calculating the p-value. Depending on the sample size, different degrees of freedom produce different t-distributions. Calculating degrees of freedom is also critical when trying to understand the significance of a chi-square statistic and the validity of the null hypothesis.

Degrees of freedom also has conceptual applications outside of statistics. When a business makes decisions, one choice may fix the outcome of another variable. Consider a company deciding how much raw material to purchase as part of its manufacturing process. The company has two items within this data set: the amount of raw material to acquire and the total cost of that raw material.

The company can freely decide one of the two items, but its choice will dictate the outcome of the other. By setting the amount of raw material to acquire, the company has no say in the total amount spent. By setting the total amount to spend, the company may be limited in the amount of raw material it can acquire. Because it can freely choose only one of the two, it has one degree of freedom in this situation.
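A minimal sketch of this one-degree-of-freedom trade-off, assuming a hypothetical fixed unit price of 12.50 (not given in the example) and the hypothetical helper names `total_cost` and `affordable_quantity`. Whichever quantity the company chooses, the other follows from it.

```python
UNIT_PRICE = 12.50  # hypothetical cost per unit of raw material


def total_cost(quantity):
    """Choosing the quantity fixes the total cost."""
    return quantity * UNIT_PRICE


def affordable_quantity(budget):
    """Choosing the budget fixes the maximum quantity (whole units only)."""
    return int(budget // UNIT_PRICE)


cost = total_cost(400)
units = affordable_quantity(5000)
print(cost)   # 5000.0
print(units)  # 400
```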

How Do You Determine Degrees of Freedom?

When determining the mean of a set of data, degrees of freedom is calculated as the number of items within a set minus one. This is because all items within that set can be randomly selected until there is one item remaining; that one item must conform to a given average.

What Does Degrees of Freedom Tell You?

Degrees of freedom tells you how many units within a set can be selected without constraints while the set still abides by a given rule. For example, consider a set of five items whose average must be 20. Degrees of freedom tells you how many of the items (4) can be selected freely before constraints take over. Once the first four items are picked, you no longer have the liberty to select the last data point, because it must "force balance" the set to the given average.

Is Degrees of Freedom Always N - 1?

Not always. Degrees of freedom is the number of units within a given set minus 1 when a single constraint, such as a required mean, is placed on the data set: the last data item must then take a very specific value so that the set conforms to that outcome. When more parameters are estimated from the same data, as in a pooled two-sample t-test, more than one is subtracted (Df = N - P).

The Bottom Line

Some statistical analyses require knowing the number of independent values that can vary while still meeting the analysis's constraint requirements. This number is the degrees of freedom: the number of units in a sample that can be chosen at random before a specific value must be picked.

Article Sources
  1. Biometrika. "The Probable Error of a Mean."