What Is the Correlation Coefficient?
The correlation coefficient is a statistical measure of the strength of a linear relationship between two variables. Its values can range from -1 to 1. A correlation coefficient of -1 describes a perfect negative, or inverse, correlation, with values in one series rising as those in the other decline, and vice versa. A coefficient of 1 shows a perfect positive correlation, or a direct relationship. A correlation coefficient of 0 means there is no linear relationship.
Correlation coefficients are used in science and in finance to assess the degree of association between two variables, factors, or data sets. For example, since high oil prices are favorable for crude producers, one might assume the correlation between oil prices and forward returns on oil stocks is strongly positive. Calculating the correlation coefficient for these variables based on market data reveals a moderate and inconsistent correlation over lengthy periods.
- Correlation coefficients are used to assess the strength of associations between data variables.
- The most common, called a Pearson correlation coefficient, measures the strength and the direction of a linear relationship between two variables.
- Values always range from -1 for a perfectly inverse, or negative, relationship to 1 for a perfectly positive correlation. Values at, or close to, zero indicate no linear relationship or a very weak correlation.
- The coefficient values required to signal a meaningful association depend on the application. The statistical significance of a correlation can be calculated from the correlation coefficient and the number of data points in the sample, assuming a normal population distribution.
Understanding the Correlation Coefficient
Different types of correlation coefficients are used to assess correlation based on the properties of the compared data. By far the most common is the Pearson coefficient, or Pearson's r, which measures the strength and direction of a linear relationship between two variables. The Pearson coefficient cannot assess nonlinear associations between variables and cannot differentiate between dependent and independent variables.
The Pearson coefficient measures how closely the data points pairing the two variables (with the values of one data series plotted on the x-axis and the corresponding values of the other series on the y-axis) approximate the line of best fit. The line of best fit can be determined through regression analysis.
The further the coefficient is from zero, whether it is positive or negative, the better the fit and the greater the correlation. The values of -1 (for a negative correlation) and 1 (for a positive one) describe perfect fits in which all data points align in a straight line, indicating that the variables are perfectly correlated. In other words, the relationship is so predictable that the value of one variable can be determined from the matched value of the other. The closer the correlation coefficient is to zero, the weaker the correlation, until at zero no linear relationship exists at all.
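The three landmark values can be checked on toy data. The sketch below is illustrative only: the helper name `pearson_r` and the sample series are assumptions, not part of the article. It computes the coefficient for an upward-sloping line, a downward-sloping line, and a symmetric curve in which y depends entirely on x, but not linearly.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    co_moment = sum(a * b for a, b in zip(dx, dy))
    return co_moment / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

xs = [-2, -1, 0, 1, 2]
rising = [2 * x + 1 for x in xs]    # points on a straight line sloping up
falling = [-3 * x + 4 for x in xs]  # points on a straight line sloping down
curved = [x * x for x in xs]        # y is fully determined by x, but not linearly

r_rising = pearson_r(xs, rising)
r_falling = pearson_r(xs, falling)
r_curved = pearson_r(xs, curved)

print(r_rising, r_falling, r_curved)
```

The curved series yields a coefficient of zero even though y is an exact function of x, which is why a zero coefficient should be read as "no linear relationship" rather than "no relationship".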
Assessments of correlation strength based on the correlation coefficient value vary by application. In physics and chemistry, a correlation coefficient is typically expected to be lower than -0.9 or higher than 0.9 for the correlation to be considered meaningful, while in the social sciences a coefficient below -0.5 or above 0.5 may be enough.
For correlation coefficients derived from sampling, the determination of statistical significance depends on the p-value, which is calculated from the data sample's size as well as the value of the coefficient.
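One common way to connect the coefficient and the sample size is the t-statistic for the null hypothesis of zero correlation, t = r × sqrt((n − 2) / (1 − r²)), which is compared against a Student's t distribution with n − 2 degrees of freedom. The sketch below computes only that statistic, not the p-value itself; the function name `t_statistic` is an assumption made for illustration, and the test presumes approximately normally distributed data.

```python
import math

def t_statistic(r, n):
    """t-statistic for testing whether a sample correlation r differs from zero."""
    if n < 3:
        raise ValueError("need at least 3 observations")
    if abs(r) >= 1:
        raise ValueError("r must be strictly between -1 and 1")
    return r * math.sqrt((n - 2) / (1 - r * r))

# The same coefficient carries more evidence in a larger sample.
t_small = t_statistic(0.5, 10)
t_large = t_statistic(0.5, 100)

print(round(t_small, 3), round(t_large, 3))
```

With 10 observations, the statistic falls below the two-tailed 5% critical value of roughly 2.31 for 8 degrees of freedom, while with 100 observations it lies far above the corresponding critical value of roughly 1.98. The same r = 0.5 would therefore be judged insignificant in the first sample and significant in the second.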
Correlation Coefficient Equation
To calculate the Pearson correlation, start by determining each variable's standard deviation as well as the covariance between them. The correlation coefficient is covariance divided by the product of the two variables' standard deviations.
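$$\rho_{xy} = \frac{\operatorname{Cov}(x, y)}{\sigma_x \sigma_y}$$
where:
$\rho_{xy}$ = Pearson product-moment correlation coefficient
$\operatorname{Cov}(x, y)$ = covariance of variables x and y
$\sigma_x$ = standard deviation of x
$\sigma_y$ = standard deviation of y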
Standard deviation is a measure of the dispersion of data from its average. Covariance shows whether the two variables tend to move in the same direction, while the correlation coefficient measures the strength of that relationship on a normalized scale, from -1 to 1.
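As a rough illustration of the definition, the sketch below computes the covariance and the two standard deviations for a small made-up data set and divides one by the other. The data values and variable names are assumptions chosen for easy arithmetic. Population formulas (dividing by n) are used throughout; dividing by n − 1 everywhere instead would give the same coefficient, because the factor cancels.

```python
import math

# Hypothetical paired observations
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Covariance: average product of paired deviations from the means
cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n

# Standard deviations: square roots of the average squared deviations
sigma_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
sigma_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)

# Correlation: covariance normalized by the product of standard deviations
rho = cov_xy / (sigma_x * sigma_y)

print(round(cov_xy, 4), round(sigma_x, 4), round(sigma_y, 4), round(rho, 4))
```

The covariance here is 1.2, a number whose size depends on the units of x and y; dividing by the standard deviations rescales it to about 0.77 on the unit-free scale from -1 to 1.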
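The formula above can be expanded in terms of sums over the observations:
$$r = \frac{n \sum XY - \sum X \sum Y}{\sqrt{\left[n \sum X^2 - \left(\sum X\right)^2\right] \left[n \sum Y^2 - \left(\sum Y\right)^2\right]}}$$
where:
$r$ = correlation coefficient
$n$ = number of observations
$\sum XY$ = sum of the products of the paired observations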
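A short sketch of the sums-based form, assuming the standard computational formula r = (nΣXY − ΣXΣY) / sqrt([nΣX² − (ΣX)²][nΣY² − (ΣY)²]). The data values are made up for illustration. Only five running totals are needed, which is why this form suits hand or spreadsheet calculation.

```python
import math

# Hypothetical paired observations
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(X)
sum_x = sum(X)                             # 15
sum_y = sum(Y)                             # 20
sum_xy = sum(a * b for a, b in zip(X, Y))  # 66
sum_x2 = sum(a * a for a in X)             # 55
sum_y2 = sum(b * b for b in Y)             # 86

numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(round(r, 4))
```

For these values the numerator is 30 and the denominator is the square root of 1,500, giving r of about 0.77, the same result the covariance-over-standard-deviations definition produces for this data.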
Correlation Statistics and Investing
The correlation coefficient is particularly helpful in assessing and managing investment risks. For example, modern portfolio theory suggests diversification can reduce the volatility of a portfolio's returns, curbing risk. The correlation coefficient between historical returns can indicate whether adding an investment to a portfolio will improve its diversification.
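The diversification effect can be sketched with the standard two-asset portfolio variance formula, σp² = w1²σ1² + w2²σ2² + 2·w1·w2·ρ·σ1·σ2. The weights, volatilities, and the function name `portfolio_volatility` below are illustrative assumptions, not market data.

```python
import math

def portfolio_volatility(w1, sigma1, sigma2, rho):
    """Volatility of a two-asset portfolio with weights w1 and 1 - w1."""
    w2 = 1.0 - w1
    variance = (
        (w1 * sigma1) ** 2
        + (w2 * sigma2) ** 2
        + 2.0 * w1 * w2 * rho * sigma1 * sigma2
    )
    # Guard against tiny negative values caused by floating-point rounding
    return math.sqrt(max(variance, 0.0))

# Equal weights in two hypothetical assets, each with 20% volatility
correlations = [1.0, 0.5, 0.0, -0.5]
vols = [portfolio_volatility(0.5, 0.20, 0.20, rho) for rho in correlations]

for rho, vol in zip(correlations, vols):
    print(f"rho={rho:+.1f}  portfolio volatility={vol:.4f}")
```

At a correlation of 1 the portfolio is exactly as volatile as its components (20%), so nothing is gained by combining them. As the correlation falls, the portfolio's volatility falls with it, even though neither asset has become less risky on its own.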
Correlation calculations are also a staple of factor investing, a strategy for constructing a portfolio based on factors associated with excess returns. Meanwhile, quantitative traders use historical correlations and correlation coefficients to anticipate near-term changes in securities prices.
Limitations of the Pearson Correlation Coefficient
Correlation does not imply causation, as the saying goes, and the Pearson coefficient cannot determine whether one of the correlated variables is dependent on the other.
Nor does the correlation coefficient show what proportion of the variation in the dependent variable is attributable to the independent variable. That's shown by the coefficient of determination, also known as R-squared, which in a simple linear regression with one independent variable is the correlation coefficient squared.
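The link between the two measures can be checked numerically. The sketch below fits a least-squares line to made-up data, computes R-squared as one minus the ratio of residual to total variation, and compares it with the square of the correlation coefficient. All names and values are illustrative assumptions, and the equality holds only for a regression on a single independent variable.

```python
import math

# Hypothetical paired observations
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
s_xx = sum((a - mean_x) ** 2 for a in x)
s_yy = sum((b - mean_y) ** 2 for b in y)

# Correlation coefficient
r = s_xy / math.sqrt(s_xx * s_yy)

# Least-squares line y = intercept + slope * x
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# R-squared: share of the variation in y explained by the fitted line
ss_residual = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
r_squared = 1.0 - ss_residual / s_yy

print(round(r, 4), round(r * r, 4), round(r_squared, 4))
```

Here r is about 0.77 while R-squared is 0.6, so a correlation that looks fairly strong still leaves 40% of the variation in y unexplained by x.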
The correlation coefficient does not describe the slope of the line of best fit; the slope can be determined with the least squares method in regression analysis.
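To illustrate the distinction, the sketch below builds two made-up series that both lie exactly on a straight line against x, one steep and one shallow. The helper `fit` is a hypothetical name; it returns the correlation coefficient together with the least-squares slope.

```python
import math

def fit(xs, ys):
    """Return (correlation coefficient, least-squares slope) for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    s_xx = sum((a - mean_x) ** 2 for a in xs)
    s_yy = sum((b - mean_y) ** 2 for b in ys)
    return s_xy / math.sqrt(s_xx * s_yy), s_xy / s_xx

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
steep = [10.0 * x for x in xs]   # y rises 10 units per unit of x
shallow = [0.1 * x for x in xs]  # y rises 0.1 units per unit of x

r_steep, slope_steep = fit(xs, steep)
r_shallow, slope_shallow = fit(xs, shallow)

print(round(r_steep, 6), round(slope_steep, 6))
print(round(r_shallow, 6), round(slope_shallow, 6))
```

Both series have a correlation of 1 with x, yet their slopes differ by a factor of 100. The coefficient describes how tightly the points cluster around a line, not how steep that line is.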
The Pearson correlation coefficient can't be used to assess nonlinear associations or those arising from sampled data not subject to a normal distribution. It can also be distorted by outliers, data points that lie far from the rest of the observations in a scatterplot. Such relationships can be analyzed using alternative methods, such as Spearman's correlation coefficient, the Kendall rank correlation coefficient, or a polychoric correlation coefficient.
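Spearman's coefficient can be sketched as the Pearson coefficient applied to the ranks of the data rather than to the raw values. The example below is a minimal illustration that assumes no tied values (a fuller version would assign tied values their average rank); the function names and the cubic sample data are assumptions.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    co_moment = sum(a * b for a, b in zip(dx, dy))
    return co_moment / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman_rho(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 3 for x in xs]  # always increasing, but not along a straight line

pearson = pearson_r(xs, ys)
spearman = spearman_rho(xs, ys)

print(round(pearson, 4), round(spearman, 4))
```

Because y always rises when x rises, the rank-based coefficient is exactly 1, while the Pearson coefficient comes out lower (about 0.94) because the points do not fall on a straight line.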
Finding Correlation Coefficients in Excel
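The simplest way to calculate correlation in Excel is to input two data series in adjacent columns and use the built-in CORREL function, which takes the two ranges as its arguments. With the series in the hypothetical ranges A2:A11 and B2:B11, for example, the formula would be =CORREL(A2:A11, B2:B11).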
If you want to create a correlation matrix across a range of data sets, Excel has a Data Analysis plugin on the Data tab, under Analyze.
Select the table of returns. In this case, our columns are titled, so we want to check the box "Labels in first row," so Excel knows to treat these as titles. Then you can choose to output on the same sheet or on a new sheet.
Hitting enter will produce the correlation matrix. You can add some text and conditional formatting to clean up the result.
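For readers working outside Excel, the same kind of matrix can be sketched in a few lines of Python. The three return series and their labels below are invented for illustration, and `pearson_r` is a hypothetical helper; the matrix simply holds the coefficient for every pair of series.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    co_moment = sum(a * b for a, b in zip(dx, dy))
    return co_moment / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

# Hypothetical periodic returns for three assets
returns = {
    "A": [0.01, 0.03, -0.02, 0.04, 0.00],
    "B": [0.02, 0.01, -0.01, 0.03, 0.01],
    "C": [-0.01, -0.02, 0.03, -0.03, 0.01],
}

names = list(returns)
matrix = {
    a: {b: pearson_r(returns[a], returns[b]) for b in names}
    for a in names
}

# Print one labeled row per asset
print("   " + "  ".join(f"{name:>6}" for name in names))
for a in names:
    print(f"{a:>3}" + "  ".join(f"{matrix[a][b]:6.2f}" for b in names))
```

The diagonal is always 1, since each series is perfectly correlated with itself, and the matrix is symmetric, so only the entries on one side of the diagonal carry distinct information.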
What Is a Correlation Coefficient?
The correlation coefficient describes how one variable moves in relation to another. A positive correlation indicates that the two move in the same direction, with a value of 1 denoting a perfect positive correlation. A value of -1 shows a perfect negative, or inverse, correlation, while zero means no linear correlation exists.
How Do You Calculate the Correlation Coefficient?
The correlation coefficient is calculated by determining the covariance of the variables and dividing that number by the product of those variables’ standard deviations.
How Is the Correlation Coefficient Used in Investing?
Correlation coefficients play a key role in portfolio risk assessments and quantitative trading strategies. For example, some portfolio managers will monitor the correlation coefficients of their holdings to limit a portfolio's volatility and risk.