What Is a Sample?

A sample refers to a smaller, manageable version of a larger group. It is a subset containing the characteristics of a larger population. Samples are used in statistical testing when population sizes are too large for the test to include all possible members or observations. A sample should represent the population as a whole and not reflect any bias toward a specific attribute.

Key Takeaways

  • A sample is a smaller, manageable version of a larger group: a subset of a larger population.
  • Using samples allows researchers to conduct their studies easily and in a timely fashion.
  • To achieve an unbiased sample, the selection has to be random, so that everyone in the population has an equal chance of being selected for the sample group.
  • Simple random sampling works best when every entity in the population can be treated as identical, while stratified random sampling divides the overall population into smaller groups.

Understanding Samples

A sample is an unbiased set of observations taken from a population. In basic terms, a population is the total number of individuals, animals, items, observations, data points, etc. of any given subject. The sample is a portion of the whole group and acts as a subset of the population. Samples are used in a variety of settings where research is conducted. Scientists, marketers, government agencies, economists, and research groups are among those who use samples for their studies and measurements.

Using whole populations for research comes with challenges. Researchers may have trouble gaining access to an entire population, and the nature of some studies makes it difficult to get results in a timely fashion. Samples solve both problems: a smaller group of people who represent the entire population can still produce valid results while saving time and resources.

Samples used by researchers should closely resemble the population. All participants in the sample should share the characteristics that define the population being studied. So, if the study is about male college freshmen, the sample should be a small percentage of males who fit this description. Similarly, if a research group conducts a study on the sleep patterns of single women over 50, the sample should include only women within this demographic.

Consider a team of academic researchers who want to know how many candidates studied for fewer than 40 hours for the CFA exam and still passed. Since more than 200,000 people take the exam globally each year, reaching out to every exam participant would be extremely tedious and time-consuming. In fact, by the time data from the whole population had been collected and analyzed, a couple of years would have passed, making the analysis outdated because new cohorts of test-takers would have sat the exam in the meantime. Instead, the researchers can take a sample of the population and collect data from that sample.

To achieve an unbiased sample, the selection has to be random, so that everyone in the population has an equal chance of being selected for the sample group. This is similar to a lottery draw and is the basis for simple random sampling.
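A lottery-style draw can be sketched in Python with the standard library's random.Random.sample, which picks items without replacement so that each member has the same chance of selection. This is a minimal sketch: the test-taker IDs (1 to 200,000) and the helper name simple_random_sample are hypothetical, chosen only to mirror the CFA example.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n members without replacement; every member has the same chance of selection."""
    rng = random.Random(seed)  # seeded generator so the draw can be reproduced
    return rng.sample(population, n)

# Hypothetical population: 200,000 made-up test-taker IDs.
population = list(range(1, 200_001))
sample = simple_random_sample(population, 1_000, seed=42)
print(len(sample), len(set(sample)))  # 1000 1000 (no one is drawn twice)
```

Each ID here has a 1,000 / 200,000 = 0.5% chance of ending up in the sample, which is what "equal chance" means in practice.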

Types of Sampling

Simple Random Sampling

Simple random sampling is ideal if every entity in the population can be treated as identical. If the researchers don't care whether their sample subjects are all male, all female, or some combination of both, simple random sampling may be a good selection technique.

Let's say there were 200,000 test-takers who sat for the CFA exam in 2016, of whom 40% were women and 60% were men. A simple random sample of 1,000 test-takers drawn from this population would therefore be expected to contain roughly 400 women and 600 men, although the exact split will vary from one sample to the next.
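The following sketch simulates that example, assuming a population of 80,000 women and 120,000 men (40% and 60% of 200,000). It draws a few simple random samples of 1,000 and counts the women and men in each. The function name sample_gender_split is hypothetical.

```python
import random

def sample_gender_split(n_women, n_men, sample_size, seed=None):
    """Draw a simple random sample and count how many women and men it contains."""
    rng = random.Random(seed)
    population = ["F"] * n_women + ["M"] * n_men  # one entry per test-taker
    sample = rng.sample(population, sample_size)
    return sample.count("F"), sample.count("M")

# 200,000 test-takers: 40% women (80,000) and 60% men (120,000).
for seed in range(3):
    women, men = sample_gender_split(80_000, 120_000, 1_000, seed=seed)
    print(f"sample {seed}: {women} women, {men} men")
```

Each run lands near 400 women and 600 men but rarely hits those numbers exactly. When the exact ratio matters, a stratified design that fixes each group's share in advance is the usual remedy.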

But what about cases where it is important to know the ratio of men to women who passed the exam after studying for fewer than 40 hours? Here, a stratified random sample would be preferable to a simple random sample.

Stratified Random Sampling

This type of sampling, also referred to as proportional random sampling or quota random sampling, divides the overall population into smaller groups known as strata. Members of each stratum share similar characteristics.

What if age were an important factor that researchers wanted to include in their data? Using the stratified random sampling technique, they could create a layer, or stratum, for each age group. The selection from each stratum would have to be random so that everyone in the bracket has an equal chance of being included in the sample. For example, two participants, Alex and David, are 22 and 24 years old, respectively. The sample selection cannot favor one over the other through some preferential mechanism; both should have an equal chance of being selected from their age group. The strata could look something like this:

Example table: strata of a sample drawn from a larger population, by age group

From the table, the population has been divided into age groups. For example, 30,000 people aged 20 to 24 took the CFA exam in 2016. Using the same proportion, the sample will include (30,000 ÷ 200,000) × 1,000 = 150 test-takers from this group. Alex or David (or both, or neither) may be among the 150 randomly selected exam participants in the sample.
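Proportional allocation followed by a random draw inside each stratum can be sketched as below. Only the 20 to 24 figure (30,000 of 200,000) comes from the example. Because the rest of the table isn't reproduced here, the remaining 170,000 test-takers are lumped into a single placeholder stratum, and the member IDs are made up. The helper names proportional_allocation and stratified_sample are hypothetical.

```python
import random

def proportional_allocation(stratum_sizes, sample_size):
    """Give each stratum a share of the sample equal to its share of the population.

    round() can leave the total off by one when shares don't divide evenly.
    """
    total = sum(stratum_sizes.values())
    return {name: round(size * sample_size / total) for name, size in stratum_sizes.items()}

def stratified_sample(strata, sample_size, seed=None):
    """Randomly select members within each stratum, using proportional allocation."""
    rng = random.Random(seed)
    sizes = {name: len(members) for name, members in strata.items()}
    allocation = proportional_allocation(sizes, sample_size)
    # Within a stratum every member has the same chance of selection (e.g. Alex and David).
    return {name: rng.sample(strata[name], k) for name, k in allocation.items()}

strata = {
    "20-24": [f"A{i}" for i in range(30_000)],           # hypothetical member IDs
    "all other ages": [f"B{i}" for i in range(170_000)],  # placeholder for the other strata
}
picked = stratified_sample(strata, 1_000, seed=7)
print({name: len(members) for name, members in picked.items()})
# {'20-24': 150, 'all other ages': 850}
```

Splitting "all other ages" into the table's actual age brackets would only change the input dictionary. The allocation rule stays the same.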

Many other strata could be used when designing a sample. Some researchers might stratify test-takers by job function, country, marital status, and so on when deciding how to build the sample.

Examples of Samples

As of 2017, the population of the world was 7.5 billion, of which 49.6% were female and 50.4% were male. All the people in a given country can also form a population, as can all the students in a city or all the dogs in a city. Samples can be taken from these populations for research purposes.

Following our CFA exam example, the researchers could take a sample of 1,000 CFA participants from the population of 200,000 test-takers and run their analysis on this sample. The sample mean (for example, the share of sampled candidates who passed after studying for fewer than 40 hours) would then be used to estimate the corresponding figure for all exam takers.

The sample group taken should not be biased. This means that if the sample mean of the 1,000 CFA exam participants is 50, the population mean of the 200,000 test-takers should also be approximately 50.
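The sketch below illustrates the idea that the mean of an unbiased random sample should land close to the population mean. The population is synthetic: 200,000 made-up scores drawn uniformly between 0 and 100 (so its mean is near 50). It stands in for whatever quantity the researchers actually measure and is not real CFA data.

```python
import random
import statistics

rng = random.Random(2016)

# Synthetic population: 200,000 made-up scores, uniform between 0 and 100.
population = [rng.uniform(0, 100) for _ in range(200_000)]

# Simple random sample of 1,000, drawn without replacement.
sample = rng.sample(population, 1_000)

pop_mean = statistics.fmean(population)
sample_mean = statistics.fmean(sample)
print(f"population mean: {pop_mean:.2f}, sample mean: {sample_mean:.2f}")
```

With 1,000 observations the two means typically differ by only about a point. A biased selection rule, such as drawing only from the top half of the scores, would shift the sample mean well away from the population mean no matter how large the sample was.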