Sample Selection Bias

Definition of 'Sample Selection Bias'

A type of bias caused by using non-randomly selected data for statistical analysis. The bias arises from a flaw in the sample selection process, in which a subset of the data is systematically excluded because of a particular attribute. Excluding that subset can affect the statistical significance of a test or distort its results.

Investopedia explains 'Sample Selection Bias'

Survivorship bias is a common type of sample selection bias. For example, when back-testing an investment strategy on a large group of stocks, it may be convenient to look for securities that have data for the entire sample period. If we were going to test the strategy against 15 years' worth of stock data, we might be inclined to look for stocks that have complete information for the entire 15-year period. However, eliminating stocks that stopped trading or left the market during that period would introduce a bias into our data sample. Because we would be including only stocks that lasted the full 15 years, and those stocks performed well enough to survive, our final results would be flawed: they would likely overstate the returns the strategy would have earned.
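To make the effect concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than real market data: the tickers and returns are made up, the sample period is shortened to 5 years for brevity, and a delisted stock is treated as if its value were simply frozen at the point it stopped trading. The sketch compares the average cumulative return of the full universe with that of the survivors only.

```python
from statistics import mean

# Hypothetical annual returns (made-up numbers, not market data).
# A list shorter than N_YEARS means the stock stopped trading early.
N_YEARS = 5
returns = {
    "AAA": [0.10, 0.08, 0.12, 0.05, 0.09],
    "BBB": [0.04, 0.06, 0.03, 0.07, 0.05],
    "CCC": [-0.20, -0.35, -0.50],   # delisted after year 3
    "DDD": [0.02, -0.40],           # delisted after year 2
}

def cumulative_return(yearly):
    """Compound a list of yearly returns into one total return."""
    growth = 1.0
    for r in yearly:
        growth *= 1.0 + r
    return growth - 1.0

def average_cumulative_return(universe):
    """Equal-weighted average of each stock's cumulative return."""
    return mean(cumulative_return(r) for r in universe.values())

# The "convenient" filter: keep only stocks with complete data.
survivors = {t: r for t, r in returns.items() if len(r) == N_YEARS}

full_avg = average_cumulative_return(returns)
survivor_avg = average_cumulative_return(survivors)

print(f"All stocks:     {full_avg:.1%}")
print(f"Survivors only: {survivor_avg:.1%}")
```

In this toy data set, filtering for complete histories silently drops the two stocks that failed, so the survivors-only average comes out well above the average for the full universe. A real back-test would need a data source that retains delisted securities.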

Related Definitions

  • Survivorship Bias

    The tendency for mutual funds with poor performance to be dropped by mutual fund companies, generally because of poor results or low asset accumulation. This phenomenon, which is ...

  • Attribute Bias

    The tendency of stocks selected by a quantitative technique or model to have similar fundamental characteristics, such as high yields and low earnings valuations. Most investing models ...

  • Look-Ahead Bias

    Bias created by the use of information or data in a study or simulation that would not have been known or available during the period being analyzed. This will usually lead to inaccurate ...

  • Sampling Error

    A statistical error to which an analyst exposes a model simply because he or she is working with sample data rather than population or census data. Using sample data presents the risk ...

  • Non-Sampling Error

    A statistical error caused by human error to which a specific statistical analysis is exposed. These errors can include, but are not limited to, data entry errors, biased questions in a ...

  • James J. Heckman

    An American economist who won the 2000 Nobel Memorial Prize in Economics, along with Daniel McFadden, for his Heckman correction, a statistical method of correcting for self-selection ...

  • Population

    The entire pool from which a statistical sample is drawn. The information obtained from the sample allows statisticians to develop hypotheses about the larger population. Researchers ...

  • Stratified Random Sampling

    A method of sampling that involves the division of a population into smaller groups known as strata. In stratified random sampling, the strata are formed based on members' shared ...

  • Sampling Distribution

    A probability distribution of a statistic obtained through a large number of samples drawn from a specific population. The sampling distribution of a given population is the distribution ...

  • Reverse Survivorship Bias

    The tendency for low performers to remain in the game, while high performers are inadvertently dropped from the running. This bias can be applied to a variety of vehicles ranging from ...
