What Is a Quartile?
A quartile is a statistical term that describes a division of observations into four defined intervals based on the values of the data and how they compare to the entire set of observations.
- Quartiles split the data at three points (a lower quartile, the median, and an upper quartile) to form four groups of the dataset.
- Along with the minimum and maximum values of the data set, the quartiles divide a set of observations into four sections, each representing 25% of the observations.
- Quartiles are used to calculate the interquartile range, which is a measure of variability around the median.
To understand the quartile, it is important to understand the median as a measure of central tendency. The median in statistics is the middle value of a sorted set of numbers: half of the data lies below it and half lies above it.
So, given a set of 13 numbers that are sorted (ascending or descending), the median would be the seventh number. The six numbers preceding this value are the lowest numbers in the data, and the six numbers after the median are the highest numbers in the dataset given. Because the median is not affected by extreme values or outliers in the distribution, it is sometimes preferred to the mean.
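As a rough illustration of the 13-number case, the sketch below uses a made-up sample (the values are assumptions chosen for illustration, not data from this article) that contains one extreme value. It picks out the seventh number and compares the median with the mean.

```python
from statistics import mean, median

# Hypothetical sample of 13 values; 200 is a deliberate outlier.
values = [3, 5, 8, 9, 12, 14, 15, 18, 21, 22, 25, 30, 200]

ordered = sorted(values)
seventh = ordered[len(ordered) // 2]  # index 6 is the seventh number

sample_median = median(values)  # same value as `seventh`
sample_mean = mean(values)      # pulled upward by the outlier

print(seventh, sample_median, round(sample_mean, 2))
```

In this sample the median is 15, while the single outlier drags the mean up to roughly 29.4, which is larger than all but one of the values.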
The median is a robust estimator of location but says nothing about how the data on either side of its value is spread or dispersed. That's where the quartile steps in. Quartiles measure the spread of values above and below the median by dividing the distribution into four groups.
How Quartiles Work
Just as the median divides the data in half so that 50% of the measurements lie below the median and 50% lie above it, the quartiles break the data into quarters so that 25% of the measurements are less than the lower quartile, 50% are less than the median, and 75% are less than the upper quartile.
There are three quartile values—a lower quartile, median, and upper quartile—to divide the data set into four ranges, each containing 25% of the data points. The lower quartile, or first quartile, is denoted as Q1 and is the middle number that falls between the smallest value of the dataset and the median. The second quartile, Q2, is also the median. The upper or third quartile, denoted as Q3, is the central point that lies between the median and the highest number of the distribution.
Now, we can map out the four groups formed from the quartiles. The first group of values contains the smallest number up to Q1; the second group includes Q1 to the median; the third set is the median to Q3; the fourth category comprises Q3 to the highest data point of the entire set.
Each interval contains 25% of the total observations. Generally, the data is arranged from smallest to largest:
- First interval: The set of data points between the minimum value and the first quartile.
- Second interval: The set of data points between the lower quartile and the median.
- Third interval: The set of data points between the median and the upper quartile.
- Fourth interval: The set of data points between the upper quartile and the maximum value of the data set.
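The four intervals can be sketched in a few lines of code. The data set below is hypothetical: 12 distinct values chosen so that the groups come out exactly even. The rule for boundary values (a value equal to a quartile goes into the lower interval) is an assumption of this sketch, not a universal convention.

```python
from statistics import median

# Hypothetical data set of 12 distinct values (for illustration only).
data = [2, 4, 7, 9, 11, 14, 18, 21, 25, 30, 34, 40]

ordered = sorted(data)
half = len(ordered) // 2

q1 = median(ordered[:half])   # median of the lower half
q2 = median(ordered)          # median of the whole set
q3 = median(ordered[half:])   # median of the upper half

# Assign each value to one of the four intervals.
intervals = {
    "first": [x for x in ordered if x <= q1],
    "second": [x for x in ordered if q1 < x <= q2],
    "third": [x for x in ordered if q2 < x <= q3],
    "fourth": [x for x in ordered if x > q3],
}

for name, members in intervals.items():
    print(name, members, f"{len(members) / len(ordered):.0%}")
```

With real data, ties and set sizes that are not a multiple of four usually make the groups only approximately equal.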
Example of Quartile
Suppose the distribution of math scores in a class of 19 students in ascending order is:
- 59, 60, 65, 65, 68, 69, 70, 72, 75, 75, 76, 77, 81, 82, 84, 87, 90, 95, 98
First, mark down the median, Q2, which in this case is the 10th value: 75.
Q1 is the central point between the smallest score and the median. Excluding the median, the lower half consists of the first through ninth scores, so Q1 is the fifth score: 68. (Note that the median can also be included when calculating Q1 or Q3 for an odd set of values. If we include the median in both halves, then Q1 is the middle value between the first and 10th scores, which is the average of the fifth and sixth scores: (68 + 69)/2 = 68.5.)
Q3 is the middle value between Q2 and the highest score: 84. (Or if you include the median, Q3 = (82 + 84)/2 = 83).
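The calculation above can be reproduced with a short "median of halves" sketch. The function name `quartiles` and its `include_median` flag are illustrative choices for this sketch, not a standard API; the flag switches between the two conventions described in the parentheses.

```python
from statistics import median

scores = [59, 60, 65, 65, 68, 69, 70, 72, 75, 75,
          76, 77, 81, 82, 84, 87, 90, 95, 98]

def quartiles(data, include_median=False):
    """Return (Q1, Q2, Q3) using the median-of-halves approach."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    if n % 2 == 1 and include_median:
        # Odd count: the median belongs to both halves.
        lower, upper = ordered[:half + 1], ordered[half:]
    else:
        # Odd count: the median is left out of both halves.
        # Even count: the data splits cleanly in two.
        lower, upper = ordered[:half], ordered[n - half:]
    return median(lower), median(ordered), median(upper)

print(quartiles(scores))                       # median excluded
print(quartiles(scores, include_median=True))  # median included
```

Excluding the median gives 68, 75, and 84; including it gives 68.5, 75, and 83, matching the hand calculation.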
Now that we have our quartiles, let’s interpret their numbers. A score of 68 (Q1) represents the first quartile and is the 25th percentile. 68 is the median of the lower half of the score set in the available data—that is, the median of the scores from 59 to 75.
Q1 tells us that about 25% of the scores are less than 68 and about 75% of the class scores are greater. Q2 (the median) is the 50th percentile and shows that about half of the scores are less than 75 and about half are above it. Finally, Q3, the 75th percentile, reveals that about 25% of the scores are greater than 84 and about 75% are less.
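If the data point for Q1 is farther away from the median than Q3 is from the median, then we can say that there is a greater dispersion among the smaller values of the dataset than among the larger values. The same logic applies in reverse if Q3 is farther away from Q2 than Q1 is: the larger values are more dispersed. In the example, Q3 is 9 points above the median (84 − 75) and Q1 is 7 points below it (75 − 68), so the higher scores are slightly more spread out.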
If there is an even number of data points, the median is the average of the middle two numbers. In our example above, if we had 20 students instead of 19, the median of their scores would be the arithmetic average of the 10th and 11th numbers.
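To make the even-count case concrete, the sketch below adds a hypothetical 20th student with a score of 100 (an assumption for illustration; this score is not part of the example) and averages the 10th and 11th values.

```python
from statistics import median

scores = [59, 60, 65, 65, 68, 69, 70, 72, 75, 75,
          76, 77, 81, 82, 84, 87, 90, 95, 98]

# Hypothetical 20th score, added only to get an even count.
scores_20 = sorted(scores + [100])

tenth, eleventh = scores_20[9], scores_20[10]  # zero-based indexes 9 and 10
even_median = (tenth + eleventh) / 2

print(tenth, eleventh, even_median)
```

Here the 10th and 11th scores are 75 and 76, so the median is 75.5, which is also what `statistics.median` returns for the same list.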
Quartiles are used to calculate the interquartile range, which is a measure of variability around the median. The interquartile range is simply calculated as the difference between the first and third quartile: Q3–Q1. In effect, it is the range of the middle half of the data that shows how spread out the data is.
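Applied to the math scores, the calculation looks like this. The sketch takes Q1 and Q3 directly from their positions in the sorted list (the fifth and 15th scores, as found in the example) and compares the interquartile range with the full range.

```python
scores = [59, 60, 65, 65, 68, 69, 70, 72, 75, 75,
          76, 77, 81, 82, 84, 87, 90, 95, 98]

q1 = scores[4]    # fifth score: 68
q3 = scores[14]   # 15th score: 84

iqr = q3 - q1                         # spread of the middle half
full_range = scores[-1] - scores[0]   # maximum minus minimum

print(iqr, full_range)
```

The middle half of the class spans 16 points, while the full set of scores spans 39.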
For large datasets, Microsoft Excel has a QUARTILE function to calculate quartiles.
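Most statistical tools offer a comparable function. As one example outside of spreadsheets, Python's standard library provides `statistics.quantiles`. The sketch below applies it to the math scores from the example. Its two `method` options interpolate differently, and for this data set they happen to line up with the two hand calculations shown earlier. No claim is made here about which option matches a particular spreadsheet function.

```python
from statistics import quantiles

scores = [59, 60, 65, 65, 68, 69, 70, 72, 75, 75,
          76, 77, 81, 82, 84, 87, 90, 95, 98]

# n=4 asks for the three cut points that split the data into quarters.
exclusive = quantiles(scores, n=4, method="exclusive")  # the default method
inclusive = quantiles(scores, n=4, method="inclusive")

print("exclusive:", exclusive)
print("inclusive:", inclusive)
```

Because software packages use different interpolation rules, it is worth checking which convention a tool follows before comparing quartiles across sources.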
How Do You Find the Lower Quartile of a Data Set?
The lower quartile of a data set is a point where about 25% of observations are below that point, and 75% of data points are above that point. In other words, it is the middle value between the lowest data point and the median of the data set.
How Do You Find the Upper Quartile of a Data Set?
The upper quartile is the point where about 75% of observations are below that point and 25% of observations are higher than that point. In other words, it is the middle value between the median of the data set and the maximum value.
What Is the Interquartile Range of a Data Set?
The interquartile range covers the middle 50% of measurements in a data set. In other words, it is the range of data between the lower quartile and the upper quartile. It is often more informative than the full range of the data because it is not affected by extreme values at either end.
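A small experiment shows the effect of an extreme value. The sketch reuses the math scores from the example and introduces a hypothetical data-entry error (the top score recorded as 198 instead of 98); the helper names `iqr` and `full_range` are illustrative.

```python
from statistics import quantiles

def iqr(data):
    """Interquartile range using statistics.quantiles (default method)."""
    q1, _, q3 = quantiles(data, n=4)
    return q3 - q1

def full_range(data):
    """Maximum minus minimum."""
    return max(data) - min(data)

scores = [59, 60, 65, 65, 68, 69, 70, 72, 75, 75,
          76, 77, 81, 82, 84, 87, 90, 95, 98]

# Hypothetical error: the highest score is mistyped as 198.
with_outlier = scores[:-1] + [198]

print(full_range(scores), full_range(with_outlier))
print(iqr(scores), iqr(with_outlier))
```

The full range jumps from 39 to 139, while the interquartile range stays at 16 because the outlier lies outside the middle half of the data.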