The term statistics is very broad. In some contexts it is used to refer to specific data. Statistics is also a branch of mathematics, a field of study - essentially the analysis tools and methods that are applied to data. Data, by itself, is nothing more than quantities and numbers. With statistics, data can be transformed into useful information and can be the basis for understanding and making intelligent comparisons and decisions.
- Descriptive Statistics - Descriptive statistics are tools used to summarize and consolidate large masses of numbers and data so that analysts can grasp, understand and use them. The learning outcomes in this section of the guide (i.e. the statistics section) are focused on descriptive statistics.
- Inferential Statistics - Inferential statistics are tools used to draw larger generalizations from observing a smaller portion of data. In basic terms, descriptive statistics describe the data at hand, while inferential statistics draw conclusions about a whole from a part. We will use inferential statistics in section D. Probability Concepts, later in this chapter.
Population Vs. Sample
A population refers to every member of a group, while a sample is a small subset of the population. Sampling is a method used when the task of observing the entire population is either impossible or impractical. Drawing a sample is intended to produce a smaller group with the same or similar characteristics as the population, which can then be used to learn more about the whole population.
Parameters and Sample Statistics
A parameter is a measure that describes a characteristic of a population. Mean, range and variance are all commonly used parameters that summarize and describe the population. Determining the precise value of any parameter requires observing every single member of the population. Since this exercise can be impossible or impractical, we use sampling techniques, which draw a sample that (the analyst hopes) represents the population. Quantities taken from a sample to describe its characteristics (e.g. mean, range and variance) are termed sample statistics.
Population → Parameter
Sample → Sample Statistic
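The distinction between a parameter and a sample statistic can be illustrated with a minimal Python sketch. The population here is a set of made-up fund returns generated only for illustration, and the names `population` and `sample`, the sample size of 50 and the random seed are all assumptions, not part of the guide.

```python
import random
import statistics

# Hypothetical population: annual returns (in %) for 1,000 made-up funds.
rng = random.Random(42)  # fixed seed so the sketch is repeatable
population = [rng.gauss(8.0, 15.0) for _ in range(1000)]

# Parameters: computed from every member of the population.
pop_mean = statistics.fmean(population)
pop_variance = statistics.pvariance(population)
pop_range = max(population) - min(population)

# Sample statistics: computed from a simple random sample of 50 members.
sample = rng.sample(population, 50)
sample_mean = statistics.fmean(sample)
sample_variance = statistics.variance(sample)  # uses the n - 1 denominator
sample_range = max(sample) - min(sample)

print(f"Population mean (parameter): {pop_mean:.2f}")
print(f"Sample mean (statistic):     {sample_mean:.2f}")
print(f"Population range: {pop_range:.2f}, sample range: {sample_range:.2f}")
```

The sample statistics will generally be close to, but not equal to, the parameters, which is why the analyst can only hope that the sample represents the population.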
Data is measured and assigned to specific points based on a chosen scale. A measurement scale can fall into one of four categories:
- Nominal - This is the weakest level as the only purpose is to categorize data but not rank it in any way. For example, in a database of mutual funds, we can use a nominal scale for assigning a number to identify fund style (e.g. 1 for large-cap value, 2 for large-cap growth, 3 for foreign blend, etc.). Nominal scales don't lend themselves to descriptive tools - in the mutual fund example, we would not report the average fund style as 5.6 with a standard deviation of 3.2. Such descriptions are meaningless for nominal scales.
- Ordinal - This category is considered stronger than nominal as the data is categorized according to some rank that helps describe rankings or differences between the data. Examples of ordinal scales include the mutual fund star rankings (Morningstar 1 through 5 stars), or assigning a fund a rating between 1 and 10 based on its five-year performance and its place within its category (e.g. 1 for the top 10%, 2 for funds between 10% and 20% and so forth). An ordinal scale doesn't always fully describe relative differences - in the example of ranking 1 to 10 by performance, there may be a wide performance gap between 1 and 2, but virtually nothing between 6, 7, and 8.
- Interval - This is a step stronger than the ordinal scale, as the intervals between data points are equal, so values can be meaningfully added and subtracted. Temperature is measured on interval scales (Celsius and Fahrenheit), as the difference in temperature between 25 and 30 is the same as the difference between 85 and 90. However, interval scales have no true zero point - zero degrees Celsius doesn't indicate an absence of temperature; it's simply the point at which water freezes. Without a true zero point, ratios are meaningless - for example, nine degrees is not three times as hot as three degrees.
- Ratio - This category represents the strongest level of measurement, with all the features of interval scales plus the zero point, giving meaning to ratios on the scale. Most measurement scales used by financial analysts are ratios, including time (e.g. days-to-maturity for bonds), money (e.g. earnings per share for a set of companies) and rates of return expressed as a percentage.
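The difference between interval and ratio scales can be checked numerically. The sketch below, with helper names chosen only for illustration, converts the temperatures from the interval example into Fahrenheit and Kelvin: equal differences stay equal on every scale, but the apparent ratio of nine degrees to three degrees depends on where the scale places its zero.

```python
import math

def celsius_to_fahrenheit(c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32

def celsius_to_kelvin(c):
    """Convert degrees Celsius to kelvins (a scale with a true zero)."""
    return c + 273.15

# Equal intervals: 25 -> 30 and 85 -> 90 are the same size on either scale.
diff_low_f = celsius_to_fahrenheit(30) - celsius_to_fahrenheit(25)
diff_high_f = celsius_to_fahrenheit(90) - celsius_to_fahrenheit(85)
print("Intervals equal in Fahrenheit:", math.isclose(diff_low_f, diff_high_f))

# Ratios: "nine degrees vs. three degrees" gives a different answer per scale.
ratio_celsius = 9 / 3
ratio_fahrenheit = celsius_to_fahrenheit(9) / celsius_to_fahrenheit(3)
ratio_kelvin = celsius_to_kelvin(9) / celsius_to_kelvin(3)
print(f"Ratio in Celsius:    {ratio_celsius:.3f}")
print(f"Ratio in Fahrenheit: {ratio_fahrenheit:.3f}")
print(f"Ratio in Kelvin:     {ratio_kelvin:.3f}")
```

Because the three ratios disagree, the statement "three times as hot" has no meaning on the Celsius or Fahrenheit scale. Only a scale with a true zero point, such as Kelvin, gives ratios a consistent interpretation.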
A frequency distribution seeks to describe large data sets by doing four things:
(1) establishing a series of intervals as categories,
(2) assigning every data point in the population to one of the categories,
(3) counting the number of observations within each category and
(4) presenting the data with each assigned category, and the frequency of observations in each category.
A frequency distribution is one of the simplest methods employed to describe populations of data and can be used for all four measurement scales - indeed, it is often the best, and sometimes the only, way to describe data measured on a nominal, ordinal or interval scale. Frequency distributions are sometimes used for equity index returns over a long history - e.g. the S&P 500 annual or quarterly returns grouped into a series of return intervals.
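The four steps can be sketched in a few lines of Python. The returns below are made-up values rather than actual index data, and the 10-percentage-point interval width and the helper name `interval_for` are assumptions chosen for illustration.

```python
import math
from collections import Counter

# Hypothetical annual returns in percent (made-up values, not real index data).
returns = [12.4, -3.1, 7.8, 21.5, -14.2, 5.0, 9.9, 15.3, -8.7, 3.2, 18.6, 0.4]

# Step 1: establish intervals 10 percentage points wide, [lower, lower + 10).
WIDTH = 10

def interval_for(value, width=WIDTH):
    """Return the (lower, upper) interval that contains value."""
    lower = math.floor(value / width) * width
    return (lower, lower + width)

# Steps 2 and 3: assign each observation to an interval and count them.
frequency = Counter(interval_for(r) for r in returns)

# Step 4: present each interval with its frequency, in ascending order.
for (lower, upper), count in sorted(frequency.items()):
    print(f"{lower:>4}% to <{upper:>3}%: {count}")
```

Each observation falls into exactly one interval, so the frequencies sum to the number of observations. Narrower intervals show more detail, at the cost of more categories to present.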