Quantitative Methods - Sampling Considerations

Sample Size
Increasing sample size benefits a research study by narrowing the confidence interval and making it more reliable, and as a result, by increasing the precision with which the population parameter can be estimated. Other choices also affect how wide or narrow a confidence interval will be: the choice of statistic, with t producing wider (more conservative) intervals than z, and the degree of confidence, with higher degrees such as 99% producing wider (more conservative) intervals than lower degrees such as 90%. An increase in sample size tends to have an even more meaningful effect because of the formula for standard error: the sample standard deviation divided by the square root of the sample size (s / √n). Standard error therefore varies inversely with the square root of the sample size, so quadrupling the number of observations cuts the standard error in half. As a result, more observations in the sample (all other factors equal) improve the quality of a research study.
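To make the standard-error relationship concrete, here is a minimal Python sketch using only the standard library. It assumes a z-based interval (a t-based interval, as noted above, would be somewhat wider for small samples), and the inputs (a 2% mean quarterly return with an 8% sample standard deviation) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def standard_error(sample_sd: float, n: int) -> float:
    """Standard error of the sample mean: s / sqrt(n)."""
    return sample_sd / sqrt(n)


def z_confidence_interval(sample_mean: float, sample_sd: float, n: int,
                          confidence: float = 0.95) -> tuple[float, float]:
    """Two-sided, z-based confidence interval for the population mean."""
    # Critical value that leaves (1 - confidence) / 2 in each tail.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * standard_error(sample_sd, n)
    return sample_mean - half_width, sample_mean + half_width


if __name__ == "__main__":
    # Hypothetical quarterly fund returns: mean 2%, sample std. dev. 8%.
    for n in (20, 80):
        for conf in (0.90, 0.99):
            low, high = z_confidence_interval(2.0, 8.0, n, conf)
            print(f"n={n:2d}  {conf:.0%} CI: [{low:6.2f}%, {high:6.2f}%]  width {high - low:.2f}")
```

Moving from 20 to 80 observations halves the standard error (√80 = 2 × √20), so each interval narrows by half, while moving from 90% to 99% confidence widens the interval at either sample size.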

At the same time, two other factors tend to make larger sample sizes less desirable. The first consideration, which primarily affects time-series data, is that population parameters have a tendency to change over time. For example, suppose we are studying a mutual fund using five years of quarterly returns (a sample size of 20: 5 years x 4 quarters per year), and the resulting confidence interval appears too wide. In an effort to increase precision, we extend the sample to 20 years of data (80 observations). However, when we reach back into the 1980s to study this fund, it had a different fund manager, plus it was buying more small-cap value companies, whereas today it is a blend of growth and value, with mid to large market caps. In addition, the factors affecting today's stock market (and mutual fund returns) are much different from those of the 1980s. In short, the population parameters have changed over time, and data from 20 years ago shouldn't be mixed with data from the most recent five years.

The other consideration is that increasing sample size can involve additional expenses. Take the example of researching hiring plans at S&P 500 firms (cross-sectional research). Suppose a sample size of 25 was suggested, which would involve contacting the human resources departments of 25 firms. By increasing the sample size to 100, 200 or higher, we do achieve greater precision in our conclusions, but at what cost? In many cross-sectional studies, particularly in the real world, where each observation takes time and costs money, it is sufficient to leave the sample size at a lower level, as the additional precision isn't worth the additional cost.
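The cost side of this trade-off can be sketched with the standard sample-size formula for estimating a mean, n = (z × s / E)², where E is the desired margin of error. The Python sketch below assumes a z-based interval and a pre-estimated standard deviation, and it ignores the finite-population correction (which would lower the required n when sampling a large share of only 500 firms). The cost per firm and the standard deviation of responses are hypothetical figures.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(sample_sd: float, margin: float,
                         confidence: float = 0.95) -> int:
    """Smallest n whose z-based margin of error is at most `margin`."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sample_sd / margin) ** 2)


if __name__ == "__main__":
    COST_PER_FIRM = 500      # hypothetical cost of surveying one HR department ($)
    SD_OF_RESPONSES = 10.0   # hypothetical std. dev. of planned headcount change (%)
    for margin in (4.0, 2.0, 1.0):
        n = required_sample_size(SD_OF_RESPONSES, margin)
        print(f"margin ±{margin}: n = {n}, cost = ${n * COST_PER_FIRM:,}")
```

Because n grows with the square of 1/E, halving the margin of error roughly quadruples the number of firms to contact, and the cost rises with it.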

Data Mining Bias
Data mining is the practice of searching through historical data in an effort to find significant patterns, with which researchers can build a model and draw conclusions about how the population will behave in the future. For example, the so-called January effect, where stock market returns tend to be stronger in the month of January, is a product of data mining: monthly returns on indexes going back 50 to 70 years were sorted and compared against one another, and the patterns for the month of January were noted. Another well-known conclusion from data mining is the 'Dogs of the Dow' strategy: each January, among the 30 companies in the Dow industrials, buy the 10 with the highest dividend yields. Backtests indicated that such a strategy outperformed the market over the long run.
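The mechanics of this kind of search are simple, which is part of the danger. The Python sketch below groups a simulated series of monthly index returns by calendar month and reports the month with the highest average. The returns are random draws with no seasonal pattern built in (a hypothetical stand-in for real index data, with an assumed mean and volatility), yet the search always crowns some month as "best".

```python
import random
from collections import defaultdict
from statistics import mean

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def average_return_by_month(returns: list[float]) -> dict[str, float]:
    """Average return per calendar month; returns[0] is assumed to be a January."""
    by_month = defaultdict(list)
    for i, r in enumerate(returns):
        by_month[MONTHS[i % 12]].append(r)
    return {m: mean(by_month[m]) for m in MONTHS if by_month[m]}


if __name__ == "__main__":
    rng = random.Random(42)
    # 60 years of i.i.d. monthly returns (mean 0.8%, std. dev. 4.5%): no seasonality at all.
    simulated = [rng.gauss(0.8, 4.5) for _ in range(60 * 12)]
    averages = average_return_by_month(simulated)
    best = max(averages, key=averages.get)
    print(f"'Best' month: {best} ({averages[best]:.2f}% vs. overall {mean(simulated):.2f}%)")
```

A "best month" found this way is only a candidate pattern; whether it means anything depends on the checks described in the rest of this section.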

Bookshelves are filled with hundreds of such models that "guarantee" a winning investment strategy. Of course, to borrow a common industry phrase, "past performance does not guarantee future results." Data-mining bias refers to the errors that result from relying too heavily on data-mining practices. In other words, while some patterns discovered through data mining are potentially useful, many others might just be coincidental and are not likely to be repeated in the future, particularly in an "efficient" market. For example, we may not be able to continue to profit from the January effect going forward, given that this phenomenon is so widely recognized: market participants anticipating the effect bid stocks up in November and December, so that by the start of January the effect is priced into stocks and one can no longer take advantage of the model. Intergenerational data mining refers to the continued use of information already put forth in prior financial research as a guide for testing the same patterns, which can lead to overstating the same conclusions.

Distinguishing valid models and conclusions from ideas that are purely coincidental products of data mining presents a significant challenge, as data mining is often not easy to detect. A good first step is to conduct an out-of-sample test: in other words, to check whether the model actually works for periods that do not overlap the time frame of the study. A valid model should continue to be statistically significant in out-of-sample tests. For research that is the product of data mining, a test outside of the model's time frame can often reveal its true nature. Other warning signs involve the number of patterns or variables examined in the research: did this study simply search through enough variables until something (anything) was finally discovered? Most academic research won't disclose the number of variables or patterns tested in the study, but there are often verbal hints that can reveal the presence of excessive data mining.
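Both warning signs, the number of variables searched and the out-of-sample test, can be illustrated with a small simulation. The Python sketch below generates hypothetical "strategies" whose returns are pure noise, flags those that look significant in the first half of the data (|t| > 1.96, roughly the 5% level), and then re-tests only the flagged ones on the second half. The number of signals, the sample lengths and the threshold are illustrative assumptions.

```python
import random
from math import sqrt
from statistics import mean, stdev


def t_stat(returns: list[float]) -> float:
    """t-statistic for the null hypothesis that the mean return is zero."""
    return mean(returns) / (stdev(returns) / sqrt(len(returns)))


def mine_then_validate(n_signals: int = 100, n_in: int = 120, n_out: int = 120,
                       crit: float = 1.96, seed: int = 7) -> tuple[int, int]:
    """Search pure-noise 'strategies' in-sample, then re-test the winners out of sample.

    Returns (in-sample discoveries, discoveries confirmed out of sample).
    """
    rng = random.Random(seed)
    discoveries = confirmed = 0
    for _ in range(n_signals):
        # Every hypothetical strategy has a true mean return of zero.
        returns = [rng.gauss(0.0, 1.0) for _ in range(n_in + n_out)]
        t_in = t_stat(returns[:n_in])
        if abs(t_in) > crit:
            discoveries += 1
            # Confirmed only if significant again, with the same sign, on unseen data.
            t_out = t_stat(returns[n_in:])
            if abs(t_out) > crit and (t_out > 0) == (t_in > 0):
                confirmed += 1
    return discoveries, confirmed


if __name__ == "__main__":
    found, kept = mine_then_validate()
    print(f"{found} of 100 noise signals looked significant in-sample; {kept} held up out of sample")
```

With pure noise, roughly 5% of the signals should clear the in-sample hurdle by chance alone, and only a small fraction of those should survive the out-of-sample re-test; the more signals a study searches, the more such false discoveries it can report.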

Above all, it helps when there is an economic rationale to explain why a pattern exists, as opposed to simply pointing out that a pattern is there. For example, years ago a research study discovered that the market tended to have positive returns in years when the NFC won the Super Bowl, yet performed relatively poorly when the AFC representative triumphed. However, there's no economic rationale for this pattern: do people spend more, do companies build more, or do investors invest more based on the winner of a football game? Yet the story resurfaces every Super Bowl week. Patterns discovered through data mining may make for interesting reading, but when making decisions, care must be taken not to rely blindly on mined patterns.

Sample Selection Bias
Many additional biases can adversely affect the quality and the usefulness of financial research. Sample-selection bias refers to the tendency to exclude a certain part of a population simply because the data is not available. As a result, we cannot state that the sample we've drawn is completely random - it is random only within the subset on which historic data could be obtained.

Survivorship Bias
A common form of sample-selection bias in financial databases is survivorship bias: the tendency for financial and accounting databases to exclude information on companies, mutual funds, etc. that are no longer in existence. As a result, some conclusions may be overstated compared with what would be found if this bias were removed and all members of the population were included. For example, many studies have pointed out the tendency of companies with low price-to-book-value ratios (P/BV) to outperform firms with higher P/BVs. However, these studies most likely exclude firms that have failed, because their data is no longer available, which introduces sample-selection bias. In the case of low and high P/BV, it stands to reason that companies in the midst of declining and failing will probably be relatively low on the P/BV scale, yet based on the research we would be guided to buy these very same firms because of the historical pattern. It is likely that the gap between returns on low-priced (value) stocks and high-priced (growth) stocks has been systematically overestimated as a result of survivorship bias. The investment industry has developed a number of growth and value indexes, but the actual evidence on which strategy (growth or value) is superior remains mixed.
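The direction of the distortion can be shown with a small simulation of a fund database. In the Python sketch below, every hypothetical fund has the same true expected return; a fund whose average annual return over the period falls below a cutoff is assumed to close and drop out of the database. The closure rule and all parameters (1,000 funds, 10 years, 7% mean, 15% volatility) are illustrative assumptions, not a model of any real database.

```python
import random
from statistics import mean


def survivorship_demo(n_funds: int = 1000, years: int = 10, mu: float = 0.07,
                      sigma: float = 0.15, cutoff: float = 0.0,
                      seed: int = 2024) -> tuple[float, float, int]:
    """Average annual return of all funds vs. only the funds left in the database.

    Hypothetical rule: a fund whose average annual return over the period is
    below `cutoff` is closed and dropped from the database.
    Returns (mean of all funds, mean of survivors, number of funds dropped).
    """
    rng = random.Random(seed)
    # Every fund has the same true expected return `mu`; differences are pure luck.
    fund_averages = [mean([rng.gauss(mu, sigma) for _ in range(years)])
                     for _ in range(n_funds)]
    survivors = [r for r in fund_averages if r >= cutoff]
    return mean(fund_averages), mean(survivors), n_funds - len(survivors)


if __name__ == "__main__":
    all_funds, survivors_only, dropped = survivorship_demo()
    print(f"All funds: {all_funds:.2%}  Survivors only: {survivors_only:.2%}  Dropped: {dropped}")
```

A database that records only the survivors reports the higher average, even though no fund's true expected return differs from the one assumed for all of them.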

Sample selection bias extends to newer asset classes such as hedge funds, a heterogeneous group that is somewhat more removed from regulation, and where public disclosure of performance is much more discretionary compared to that of mutual funds or registered advisors of separately managed accounts. One suspects that hedge funds will disclose only the data that makes the fund look good (self-selection bias), compared to a more developed industry of mutual funds where the underperformers are still bound by certain disclosure requirements.

Look-Ahead Bias
Research is guilty of look-ahead bias if it makes use of information that was not actually available on a particular day, yet the researchers assume it was. Let's return to the example of buying low price-to-book-value companies: the research may assume that we buy our low P/BV portfolio on Jan 1 of a given year and then hold it throughout the year (compared with a high P/BV portfolio). Unfortunately, while a firm's current stock price is immediately available, its book value for the year just ended is generally not available until months after the start of the year, when the firm files its official 10-K. To overcome this bias, one could construct P/BV ratios using the current price divided by the previous year's book value, or (as is done by Russell's indexes) wait until midyear to rebalance, after the data has been reported.
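One way to guard against this in a backtest is to store each book value together with the date it became public and look up only what had been filed by the portfolio formation date. The Python sketch below assumes hypothetical filing dates and per-share book values for a single firm; the class and function names are illustrative.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class BookValueReport:
    fiscal_year: int
    book_value_per_share: float
    filed_on: date  # date the annual report (e.g. the 10-K) became public


def book_value_as_of(reports: list[BookValueReport], as_of: date) -> Optional[BookValueReport]:
    """Latest report that had actually been filed on or before `as_of`."""
    available = [r for r in reports if r.filed_on <= as_of]
    return max(available, key=lambda r: r.filed_on, default=None)


def price_to_book(price: float, reports: list[BookValueReport], as_of: date) -> Optional[float]:
    """P/BV using only information a researcher could have known on `as_of`."""
    report = book_value_as_of(reports, as_of)
    return None if report is None else price / report.book_value_per_share


if __name__ == "__main__":
    # Hypothetical firm: FY2019 results are not public until late February 2020.
    reports = [
        BookValueReport(2018, 40.0, date(2019, 3, 1)),
        BookValueReport(2019, 50.0, date(2020, 2, 28)),
    ]
    print(price_to_book(100.0, reports, date(2020, 1, 1)))  # uses FY2018 book value
    print(price_to_book(100.0, reports, date(2020, 7, 1)))  # uses FY2019 book value
```

On Jan 1, 2020, the sketch falls back to the FY2018 book value (the "previous year's book value" approach); by midyear the FY2019 figure has been filed and can be used, which mirrors the midyear-rebalancing alternative.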

Time-Period Bias
Time-period bias arises when an investment study's results hold over a specific time frame but may not persist in future periods. For example, any research done in 1999 or 2000 that covered a trailing five-year period may have touted the outperformance of high-risk growth strategies while pointing to the mediocre results of more conservative approaches. When these same studies are conducted today for a trailing 10-year period, the conclusions might be quite different. Certain anomalies can persist for several quarters or even years, but research should ideally be tested across a number of different business cycles and market environments to ensure that the conclusions aren't specific to one unique period or environment.
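A simple way to follow this advice is to split a backtest into consecutive, non-overlapping subperiods and check whether the result holds in each. The Python sketch below assumes equal-length blocks (for example, five-year blocks of annual returns) as a rough stand-in for different market environments; properly dating business cycles would require a separate source, and the function names are illustrative.

```python
from statistics import mean


def subperiod_excess_returns(strategy: list[float], benchmark: list[float],
                             block: int) -> list[float]:
    """Mean excess return (strategy minus benchmark) in consecutive, non-overlapping blocks.

    Any partial block left over at the end of the series is ignored.
    """
    if len(strategy) != len(benchmark):
        raise ValueError("strategy and benchmark must have the same length")
    excess = [s - b for s, b in zip(strategy, benchmark)]
    return [mean(excess[i:i + block])
            for i in range(0, len(excess) - block + 1, block)]


def holds_in_every_subperiod(block_means: list[float]) -> bool:
    """True only if the strategy beat the benchmark in every block."""
    return bool(block_means) and all(m > 0 for m in block_means)
```

A strategy that beat its benchmark in the late-1990s block but lagged in the following one would produce a negative entry and fail the check, flagging a result that may be specific to one period.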
