What Is Correlation?
Correlation measures the linear relationship between two variables. By measuring and relating the variance of each variable, correlation gives an indication of the strength of the relationship. Or to put it another way, correlation answers the question: How much does variable A (the independent variable) explain variable B (the dependent variable)?
- Correlation is the statistical linear correspondence of variation between two variables.
- In finance, correlation is used in several facets of analysis including the calculation of portfolio standard deviation.
- Computing correlation can be time-consuming, but software like Excel makes it easy to calculate.
The Formula for Correlation
Correlation combines several important and related statistical concepts, namely, variance and standard deviation. Variance is the dispersion of a variable around the mean, and standard deviation is the square root of variance.
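The formula is:
$$\rho_{xy} = \frac{\operatorname{Cov}(x, y)}{\sigma_x \, \sigma_y}$$
where Cov(x, y) is the covariance of the two variables, and \sigma_x and \sigma_y are their standard deviations.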
Because correlation assesses the linear relationship between two variables, what's really required is to see how much the two variables vary together (their covariance), and how large that covariance is relative to the standard deviation of each variable individually.
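To make the relationship between covariance and standard deviation concrete, here is a minimal Python sketch that computes correlation as covariance divided by the product of the two standard deviations. The function name pearson_correlation and the sample values are illustrative assumptions, not part of the article.

```python
from statistics import mean, pstdev

def pearson_correlation(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x, mean_y = mean(xs), mean(ys)
    # Population covariance: the average product of deviations from each mean.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)
    # Population standard deviations are used so the 1/n factors cancel.
    return covariance / (pstdev(xs) * pstdev(ys))

# Illustrative values only (hypothetical data).
a = [1, 2, 3, 4, 5]
b = [2, 4, 5, 4, 5]
r = pearson_correlation(a, b)
print(round(r, 4))  # 0.7746
```

Note that a series with no variation has a standard deviation of zero, so this sketch would fail with a division error in that case; correlation is undefined there.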
Common Mistakes With Correlation
The single most common mistake is assuming that a correlation approaching +/- 1 is statistically significant. A reading approaching +/- 1 does increase the chances of statistical significance, but without further testing, it's impossible to know. Statistical testing of a correlation can get complicated for a number of reasons; it's not at all straightforward. A critical assumption is that the observations are independent of one another and that the relationship between the variables is linear. In theory, you would test these assumptions to determine if a correlation calculation is appropriate.
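As one hedged illustration of why a high reading alone is not enough, the sketch below computes the common t statistic for testing whether a correlation differs from zero, t = r * sqrt(n - 2) / sqrt(1 - r^2). This is only one possible test, it relies on assumptions of its own (such as roughly normal data), and the function name and sample sizes are hypothetical. The critical values are taken from a standard two-tailed 5% t table.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for the null hypothesis of zero correlation, given r and n pairs."""
    if n < 3:
        raise ValueError("need at least 3 observations")
    if abs(r) >= 1:
        raise ValueError("r must be strictly between -1 and 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# Two-tailed 5% critical values from a standard t table (degrees of freedom = n - 2).
T_CRIT_DF2 = 4.303   # n = 4
T_CRIT_DF28 = 2.048  # n = 30

t_small = correlation_t_statistic(0.9, 4)   # about 2.92: below 4.303, not significant
t_large = correlation_t_statistic(0.9, 30)  # about 10.93: above 2.048, significant

print(f"n=4:  t={t_small:.2f}, significant={abs(t_small) > T_CRIT_DF2}")
print(f"n=30: t={t_large:.2f}, significant={abs(t_large) > T_CRIT_DF28}")
```

The same correlation of 0.9 fails the test with four observations and passes it with thirty, which is why the number of observations matters as much as the reading itself.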
Remember, correlation between two variables does NOT imply that A caused B or vice versa.
The second most common mistake is forgetting to normalize the data into a common unit. If calculating a correlation on two betas, then the units are already normalized: beta itself is the unit. However, if you want to correlate stocks, it's critical you normalize them into percent return, and not share price changes. This happens all too frequently, even among investment professionals.
For stock price correlation, you are essentially asking two questions: What is the return over a certain number of periods, and how does that return correlate to another security's return over the same period? This is also why correlating stock prices is difficult: Two securities might have a high correlation if the return is daily percent changes over the past 52 weeks, but a low correlation if the return is monthly changes over the past 52 weeks. Which one is "better"? There really is no perfect answer, and it depends on the purpose of the test.
Finding Correlation in Excel
There are several methods to calculate correlation in Excel. The simplest is to place two data sets side by side and use the built-in correlation function, CORREL, which takes the two data ranges as its arguments:
=CORREL(array1, array2)
This is a convenient way to calculate a correlation between just two data sets. But what if you want to create a correlation matrix across a range of data sets? To do this, you need to use Excel's Data Analysis plugin. The plugin can be found in the Data tab, under Analyze.
Select the table of returns. In this case, our columns are titled, so we want to check the box "Labels in first row," so Excel knows to treat these as titles. Then you can choose to output on the same sheet or on a new sheet.
Once you hit enter, the correlation matrix is generated automatically. You can add some text and conditional formatting to clean up the result.