What Is a Simple Random Sample?
A simple random sample is a subset of a statistical population in which each member of the population has an equal probability of being chosen. A simple random sample is meant to be an unbiased representation of a group.
- A simple random sample takes a small, random portion of the entire population to represent the entire data set, where each member has an equal probability of being chosen.
- Researchers can create a simple random sample using methods like lotteries or random draws.
- A sampling error can occur with a simple random sample if the sample does not end up accurately reflecting the population it is supposed to represent.
- Simple random samples are determined by assigning sequential values to each item within a population, then randomly selecting those values.
- Simple random sampling provides a different sampling approach compared to systematic sampling, stratified sampling, or cluster sampling.
Understanding a Simple Random Sample
Researchers can create a simple random sample using a couple of methods. With a lottery method, each member of the population is assigned a number, after which numbers are selected at random.
An example of a simple random sample would be the names of 25 employees being chosen out of a hat from a company of 250 employees. In this case, the population is all 250 employees, and the sample is random because each employee has an equal chance of being chosen. Random sampling is used in science to conduct randomized controlled trials or blinded experiments.
The example in which the names of 25 employees out of 250 are chosen out of a hat is an example of the lottery method at work. Each of the 250 employees would be assigned a number between 1 and 250, after which 25 of those numbers would be chosen at random.
Because individuals who make up the subset of the larger group are chosen at random, each individual in the large population set has the same probability of being selected. This creates, in most cases, a balanced subset that carries the greatest potential for representing the larger group as a whole.
For larger populations, a manual lottery method can be quite onerous. Selecting a random sample from a large population usually requires a computer-generated process, by which the same methodology as the lottery method is used, only the number assignments and subsequent selections are performed by computers, not humans.
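A computer-generated draw for the 250-employee example might look like the following minimal Python sketch. The employee IDs 1 through 250 and the fixed seed are assumptions made for illustration; the standard library's random.sample draws without replacement, so no employee can be picked twice.

```python
import random

POPULATION_SIZE = 250  # employees numbered 1..250 (illustrative IDs)
SAMPLE_SIZE = 25

# A seeded generator makes the draw repeatable for auditing;
# omit the seed for a fresh draw on every run.
rng = random.Random(42)

# sample() selects without replacement, like pulling names from a hat.
employee_ids = range(1, POPULATION_SIZE + 1)
chosen = sorted(rng.sample(employee_ids, SAMPLE_SIZE))

print(chosen)
```

Every ID has the same chance of appearing in the result, which is the property the lottery method relies on.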
Room for Error
With a simple random sample, there has to be room for error, expressed as a plus-or-minus margin (sampling error). For example, if a survey were taken in a high school of 1,000 students to determine how many are left-handed, a random sample might find that eight out of 100 sampled students are left-handed. The conclusion would be that 8% of the school's students are left-handed, when in fact the global average would be closer to 10%.
The same is true regardless of the subject matter. A survey on the percentage of the student population that has green eyes or a physical disability would produce an estimate based on a simple random sample, but always with a plus-or-minus margin. The only way to have a 100% accuracy rate would be to survey all 1,000 students which, while possible, would be impractical.
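The plus-or-minus margin for the left-handedness example can be estimated roughly. The sketch below is one common approach, not the only one: it assumes a normal approximation for a sample proportion, a 95% confidence level (z = 1.96), and a finite population correction. The function name margin_of_error is hypothetical.

```python
import math

def margin_of_error(successes, sample_size, population_size, z=1.96):
    """Approximate margin of error for a sample proportion.

    Uses the normal approximation with a finite population correction;
    z=1.96 corresponds to roughly 95% confidence.
    """
    p_hat = successes / sample_size
    standard_error = math.sqrt(p_hat * (1 - p_hat) / sample_size)
    # The correction shrinks the error when the sample is a sizable
    # share of the population (here 100 of 1,000 students).
    fpc = math.sqrt((population_size - sample_size) / (population_size - 1))
    return z * standard_error * fpc

p_hat = 8 / 100
moe = margin_of_error(successes=8, sample_size=100, population_size=1000)
print(f"estimate: {p_hat:.1%} +/- {moe:.1%}")
```

Under these assumptions the margin comes out to about five percentage points, so a sample result of 8% is compatible with a true rate near 10%.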
Although simple random sampling is intended to be an unbiased approach to surveying, sample selection bias can occur. When a sample set of the larger population is not inclusive enough, representation of the full population is skewed and requires additional sampling techniques.
How to Conduct a Simple Random Sample
The simple random sampling process entails six steps. Each step must be performed in sequential order.
Step 1: Define the Population
The first step of statistical analysis is to determine the population base. This is the group about which you wish to learn more, confirm a hypothesis, or determine a statistical outcome. This step is simply to identify that population base and to ensure the group adequately covers the question you are trying to answer.
Example: I wish to learn how the stocks of the largest companies in the United States have performed over the past 20 years. My population is the largest companies in the United States as determined by the S&P 500.
Step 2: Choose Sample Size
Before picking the units within a population, we need to determine how many units to select. This sample size may be constrained by the amount of time, capital, or other resources available to analyze the sample. However, be mindful to pick a sample size large enough to be truly representative of the population. In the example above, there are constraints on analyzing the performance of every stock in the S&P 500, so we only want to analyze a subset of this population.
Example: My sample size will be 20 companies from the S&P 500.
Step 3: Determine Population Units
In our example, the items within the population are easy to determine as they've already been identified for us (i.e. the companies listed within the S&P 500). However, imagine analyzing the students currently enrolled at a university or food products being sold at a grocery store. This step entails compiling the entire list of all items within your population.
Example: Using exchange information, I copy the companies comprising the S&P 500 into an Excel spreadsheet.
Step 4: Assign Numerical Values
The simple random sample process calls for every unit within the population to receive an unrelated numerical value. The order in which values are assigned often follows how the data may be sorted. For example, I could assign the numbers 1 to 500 to the companies based on market cap, alphabetical order, or company formation date. How the values are assigned doesn't entirely matter; all that matters is that each value is sequential and each value has an equal chance of being selected.
Example: I assign the numbers 1 through 500 to the companies in the S&P 500 based on alphabetical order of the current CEO's last name, with the first company receiving the value '1' and the last company receiving the value '500'.
Step 5: Select Random Values
In step 2, we selected the number of items we wanted to analyze within our population. For the running example, we chose to analyze 20 items. In the fifth step, we randomly select 20 of the values assigned to our population units. In the running example, these are the numbers 1 through 500. There are multiple ways to randomly select these 20 numbers, discussed later in this article.
Example: Using the random number table, I select the numbers 2, 7, 17, 67, 68, 75, 77, 87, 92, 101, 145, 201, 222, 232, 311, 333, 376, 401, 478, and 489.
Step 6: Identify Sample
The last step of a simple random sample is to bridge step 4 and step 5. Each of the random values selected in the prior step corresponds to an item within our population. The sample is selected by identifying which random values were chosen and which population items those values match.
Example: My sample includes the 2nd company in the list ordered alphabetically by CEO's last name, along with companies number 7, 17, 67, and so on.
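The six steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a definitive implementation: the placeholder names ("Company 001" and so on) stand in for the real company list, and the seed is arbitrary.

```python
import random

# Steps 1 and 3: define the population and list every unit.
# Placeholder names stand in for the real company list.
population = [f"Company {i:03d}" for i in range(1, 501)]

# Step 2: choose the sample size.
sample_size = 20

# Step 4: assign sequential values 1..500 to the units.
numbered = {number: name for number, name in enumerate(population, start=1)}

# Step 5: randomly select 20 of the assigned values, without repeats.
rng = random.Random(2024)  # arbitrary seed, only for repeatability
selected_numbers = sorted(rng.sample(list(numbered), sample_size))

# Step 6: map the selected values back to population items.
sample = [numbered[n] for n in selected_numbers]

for number, name in zip(selected_numbers, sample):
    print(number, name)
```

Keeping the number-to-item mapping from step 4 separate from the draw in step 5 mirrors the process described above and makes the selection easy to audit.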
Random Sampling Techniques
There is no single method for determining the random values to be selected (i.e. Step 5 above). The analyst cannot simply pick numbers off the top of their head, because numbers chosen by a person are rarely truly random. For example, the analyst's wedding anniversary may be the 24th, so they may consciously (or subconsciously) pick the value 24. Instead, the analyst may choose one of the following methods:
- Random lottery. Whether by ping-pong ball or slips of paper, each population number receives an equivalent item that is stored in a box or other indistinguishable container. Then, random numbers are selected by pulling or selecting items without view from the container.
- Physical methods. Simple, early methods of random selection may use dice, flipping coins, or spinning wheels. Each outcome is assigned a value relating to the population.
- Random number table. Many statistics and research books contain sample tables with randomized numbers.
- Online random number generator. Many online tools exist where the analyst inputs the population size and sample size to be selected.
- Random numbers from Excel. Numbers can be selected in Excel using the =RANDBETWEEN formula. A cell containing =RANDBETWEEN(1,5) will select a single random number between 1 and 5.
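One caveat with drawing numbers independently, as a cell-by-cell formula does, is that the same number can come up more than once. The Python sketch below is an illustration under assumptions, not a spreadsheet recipe: random.randint plays the role of one independent draw, the helper name draw_distinct is hypothetical, and repeats are discarded until enough distinct values have been collected.

```python
import random

def draw_distinct(low, high, count, rng):
    """Draw independent random integers in [low, high], discarding
    repeats until `count` distinct values have been collected."""
    if count > high - low + 1:
        raise ValueError("cannot draw more distinct values than exist")
    seen = set()
    while len(seen) < count:
        seen.add(rng.randint(low, high))  # a repeat leaves the set unchanged
    return sorted(seen)

rng = random.Random(7)  # arbitrary seed for repeatability
values = draw_distinct(1, 500, 20, rng)
print(values)
```

Discarding repeats keeps the draw equivalent to pulling slips from a container without putting them back.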
When pulling together a sample, consider getting assistance from a colleague or independent person. They may be able to identify biases or discrepancies you may not be aware of.
Simple Random vs. Other Sampling Methods
Simple Random vs. Stratified Random Sample
A simple random sample is used to represent the entire data population. A stratified random sample divides the population into smaller groups, or strata, based on shared characteristics.
Unlike simple random samples, stratified random samples are used with populations that can be easily broken into different subgroups or subsets. These groups are based on certain criteria, then elements from each are randomly chosen in proportion to the group's size versus the population. In our example above, S&P 500 companies could have been broken down by headquarters region or industry.
This method of sampling means there will be selections from each different group, with the number of selections based on the group's proportion of the entire population. Researchers must ensure the strata do not overlap. Each point in the population must belong to only one stratum so that the strata are mutually exclusive. Overlapping strata would increase the likelihood that some data points are selected, thus skewing the sample.
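Proportional allocation across non-overlapping strata might be sketched as follows. The sector labels, their sizes, and the helper name stratified_sample are all hypothetical; they are not taken from actual index data.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, total_size, rng):
    """Allocate `total_size` draws across strata in proportion to their
    size, then take a simple random sample inside each stratum."""
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)  # each unit lands in exactly one stratum

    sample = {}
    for label, members in strata.items():
        # Rounding can make the overall total differ slightly from total_size.
        share = round(total_size * len(members) / len(units))
        sample[label] = rng.sample(members, min(share, len(members)))
    return sample

# Hypothetical population: 500 companies spread over made-up sectors.
sectors = ["Tech"] * 200 + ["Finance"] * 150 + ["Energy"] * 100 + ["Health"] * 50
companies = [(f"Company {i}", sector) for i, sector in enumerate(sectors, start=1)]

picked = stratified_sample(companies, lambda c: c[1], 20, random.Random(1))
for sector, members in picked.items():
    print(sector, len(members))
```

With these made-up sizes, a sample of 20 splits into 8, 6, 4, and 2 companies, matching each sector's share of the 500.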
Simple Random vs. Systematic Sampling
Systematic sampling entails selecting a single random value, and that value determines the interval at which the population items are selected. For example, if the number 37 was chosen, the 37th company on the list sorted by CEO last name would be selected for the sample. Then, the 74th (i.e. the next 37th) and the 111th (i.e. the next 37th after that) would be added as well.
Simple random sampling does not have a starting point; therefore, there is the risk that the population items selected at random may cluster. In our example, there may be an abundance of CEOs with last names that start with the letter 'F'. Systematic sampling strives to reduce bias even further by ensuring these clusters do not happen.
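The interval-based selection described above can be sketched in Python. The function name systematic_sample is hypothetical, and the interval of 37 is simply the value used in the running example rather than a recommended choice.

```python
def systematic_sample(population_size, interval):
    """Return the 1-based list positions interval, 2*interval, ...
    that fall within the population."""
    return list(range(interval, population_size + 1, interval))

positions = systematic_sample(population_size=500, interval=37)
print(positions[:3])   # [37, 74, 111]
print(len(positions))  # 13
```

Because the positions are evenly spaced, the selected companies are spread across the whole alphabetical list instead of bunching up in one part of it.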
Simple Random vs. Cluster Sampling
Cluster sampling can occur as a one-stage cluster or two-stage cluster. In a one-stage cluster, items within a population are put into comparable groupings; using our example, companies are grouped by year formed. Then, sampling occurs within these clusters.
Two-stage cluster sampling occurs when clusters are formed through random selection. The population is not clustered with other similar items. Then, sample items are randomly selected within each cluster.
Simple random sampling does not cluster any population sets. Though simple random sampling may be simpler, clustering (especially two-stage clustering) may enhance the randomness of sample items. In addition, cluster sampling may provide a deeper analysis of a specific snapshot of a population, which may or may not enhance the analysis.
Advantages and Disadvantages of Simple Random Samples
While simple random samples are easy to use, they do come with key disadvantages that can render the data useless.
Advantages of Simple Random Sample
Ease of use represents the biggest advantage of simple random sampling. Unlike more complicated sampling methods, such as stratified random sampling and cluster sampling, no need exists to divide the population into sub-populations or take any other additional steps before selecting members of the population at random.
A simple random sample is meant to be an unbiased representation of a group. It is considered a fair way to select a sample from a larger population since every member of the population has an equal chance of getting selected. Therefore, simple random sampling is known for its randomness and its lower chance of sampling bias.
Disadvantages of Simple Random Sample
A sampling error can occur with a simple random sample if the sample does not end up accurately reflecting the population it is supposed to represent. For example, in our simple random sample of 25 employees, it would be possible to draw 25 men even if the 250 employees were split roughly evenly among women, men, and nonbinary people.
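How likely such a lopsided draw is can be computed from counting combinations. The sketch below assumes sampling without replacement and, purely for illustration, that 125 of the 250 employees are men; the function name prob_all_from_group is hypothetical.

```python
from math import comb

def prob_all_from_group(group_size, population_size, sample_size):
    """Probability that a simple random sample drawn without replacement
    consists entirely of members of one group."""
    return comb(group_size, sample_size) / comb(population_size, sample_size)

# Illustrative assumption: 125 of the 250 employees are men.
p = prob_all_from_group(group_size=125, population_size=250, sample_size=25)
print(f"{p:.3e}")
```

An all-male draw is possible but extremely unlikely under these assumptions. Milder imbalances, such as 16 men out of 25, are far more common and are the practical source of sampling error.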
For this reason, simple random sampling is more commonly used when the researcher knows little about the population. If the researcher knew more, it would be better to use a different sampling technique, such as stratified random sampling, which helps to account for the differences within the population, such as age, race, or gender.
Other disadvantages include the fact that, for sampling from large populations, the process can be time-consuming and costly compared to other methods. Researchers may find a certain project not worth the endeavor if its cost-benefit analysis does not generate positive results. As every unit has to be assigned an identifying or sequential number prior to the selection process, this task may be difficult depending on the method of data collection or the size of the data set.
Simple Random Sampling
Pros
- Each item within a population has an equal chance of being selected
- There is less of a chance of sampling bias as every item is randomly selected
- This sampling method is easy and convenient for data sets already listed or digitally stored
Cons
- Incomplete population demographics may exclude certain groups from being sampled
- Random selection means the sample may not be truly representative of the population
- Depending on the data set size and format, random sampling may be a time-intensive process
Why Is a Simple Random Sample Simple?
No easier method exists to extract a research sample from a larger population than simple random sampling. Selecting enough subjects completely at random from the larger population also yields a sample that can be representative of the group being studied.
What Are Some Drawbacks of a Simple Random Sample?
Among the disadvantages of this technique are difficulty gaining access to respondents that can be drawn from the larger population, greater time, greater costs, and the fact that bias can still occur under certain circumstances.
What Is a Stratified Random Sample?
A stratified random sample, in contrast to a simple draw, first divides the population into smaller groups, or strata, based on shared characteristics. Therefore, a stratified sampling strategy will ensure that members from each subgroup are included in the data analysis. Stratified sampling is used to highlight differences between groups in a population, as opposed to simple random sampling, which treats all members of a population as equal, with an equal likelihood of being sampled.
How Are Random Samples Used?
Using simple random sampling allows researchers to make generalizations about a specific population while minimizing selection bias. Using statistical techniques, inferences and predictions can be made about the population without having to survey or collect data from every individual in that population.
The Bottom Line
When analyzing a population, simple random sampling is a technique that gives every item within the population the same probability of being selected for the sample. This basic form of sampling can be expanded upon to derive more complicated sampling methods. Its core process, however, stays straightforward: make a list of all items in a population, assign each a sequential number, choose the sample size, and randomly select items.