## What Is Correlation?

Correlation, in the finance and investment industries, is a statistic that measures the degree to which two securities move in relation to each other. Correlations are used in advanced portfolio management, computed as the correlation coefficient, which has a value that must fall between -1.0 and +1.0.

### Key Takeaways

- Correlation is a statistic that measures the degree to which two variables move in relation to each other.
- In finance, the correlation can measure the movement of a stock with that of a benchmark index, such as the S&P 500.
- Correlation is closely tied to diversification, the concept that certain types of risk can be mitigated by investing in assets that are not correlated.
- Correlation measures association, but doesn't show if x causes y or vice versa—or if the association is caused by a third factor.
- Correlation may be easiest to identify using a scatterplot, especially if the variables have a non-linear yet still strong relationship.

## What Correlation Can Tell You

Correlation shows the strength of a relationship between two variables and is expressed numerically by the correlation coefficient. The correlation coefficient's values range between -1.0 and 1.0.

A perfect positive correlation means that the correlation coefficient is exactly 1. This implies that as one security moves, either up or down, the other security moves in lockstep, in the same direction. A perfect negative correlation, with a coefficient of exactly -1, means that two assets move in lockstep in opposite directions, while a zero correlation implies no linear relationship at all.
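As a minimal sketch of the two extremes, the Python snippet below builds one series that always moves with a reference series and one that always moves against it. The `pearson` helper and the price-change values are hypothetical and introduced only for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cross = sum(a * b for a, b in zip(dx, dy))
    return cross / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

# Hypothetical daily price changes for a reference security (illustrative only).
base = [1.0, -2.0, 0.5, 3.0, -1.5]

same_direction = [2 * v for v in base]       # always moves with the reference
opposite_direction = [-3 * v for v in base]  # always moves against the reference

r_same = pearson(base, same_direction)
r_opposite = pearson(base, opposite_direction)
print(round(r_same, 6), round(r_opposite, 6))
```

The size of the multiplier does not matter here: any series that is an exact positive multiple of the reference gives a coefficient of about 1, and any exact negative multiple gives about -1, up to floating-point rounding.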

For example, large-cap mutual funds generally have a high positive correlation to the Standard and Poor's (S&P) 500 Index, with a coefficient of nearly one. Small-cap stocks also tend to have a positive correlation to the S&P 500, but it is not as high, at approximately 0.8.

However, put option prices and their underlying stock prices will tend to have a negative correlation. A put option gives the owner the right but not the obligation to sell a specific amount of an underlying security at a pre-determined price within a specified time frame.

Put option contracts become more profitable when the underlying stock price decreases. In other words, as the stock price increases, the put option prices go down, which is a direct and high-magnitude negative correlation.

## How to Calculate Correlation

There are several methods of calculating correlation. The most common method, the Pearson product-moment correlation, is discussed further in this article. The Pearson product-moment correlation measures the linear relationship between two variables. It can be used for any data set that has a finite covariance matrix. Here are the steps to calculate correlation.

1. Gather data for your "x-variable" and "y-variable."
2. Find the mean of the x-variable and the mean of the y-variable.
3. Subtract the mean of the x-variable from each value of the x-variable. Repeat this step for the y-variable.
4. Multiply each x-variable difference by the corresponding y-variable difference, and add the products.
5. Square each x-variable difference and add the results. Do the same for the y-variable differences, then multiply the two totals.
6. Determine the square root of the value obtained in Step 5.
7. Divide the value in Step 4 by the value obtained in Step 6.

To avoid the complex manual calculation, consider using the CORREL function in Excel.
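Outside of a spreadsheet, the calculation steps above can be sketched in a few lines of Python. This is an illustrative sketch rather than a reference implementation: the function name `correlation_by_steps` is hypothetical, and the sample values are simply the X and Y data sets used in this article's worked example.

```python
from math import sqrt

def correlation_by_steps(xs, ys):
    """Pearson correlation, following the manual steps one at a time."""
    # Step 1: the data must arrive as paired observations.
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    n = len(xs)

    # Step 2: mean of each variable.
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Step 3: difference of each value from its own mean.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]

    # Step 4: multiply paired differences and add the products.
    sum_products = sum(a * b for a, b in zip(dx, dy))

    # Step 5: add the squared differences for each variable, then multiply.
    sum_sq_x = sum(a * a for a in dx)
    sum_sq_y = sum(b * b for b in dy)
    product_of_squares = sum_sq_x * sum_sq_y
    if product_of_squares == 0:
        raise ValueError("correlation is undefined when a series is constant")

    # Step 6: square root of the Step 5 value.
    root = sqrt(product_of_squares)

    # Step 7: divide the Step 4 value by the Step 6 value.
    return sum_products / root

x_values = [41, 19, 23, 40, 55, 57, 33]
y_values = [94, 60, 74, 71, 82, 76, 61]
r_steps = correlation_by_steps(x_values, y_values)
print(round(r_steps, 2))  # 0.54
```

The explicit check for a constant series is a design choice of this sketch: without it, Step 7 would divide by zero, because a variable that never changes has no variation to correlate.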

### Formula for Correlation

Using the Pearson product-moment correlation method, the following formula can be used to find the correlation coefficient, r:

$\begin{aligned}&r = \frac { n \times \sum (X, Y) - \sum (X) \times \sum (Y) }{ \sqrt { ( n \times \sum (X ^ 2) - ( \sum (X) ) ^ 2 ) \times ( n \times \sum( Y ^ 2 ) - ( \sum (Y) ) ^ 2 ) } } \\&\textbf{where:}\\&r=\text{Correlation coefficient}\\&n=\text{Number of observations}\\&\sum (X, Y)=\text{Sum of the products of each paired X and Y value}\end{aligned}$
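The sum-based formula and the mean-difference calculation steps describe the same quantity. As a brief sketch of why, write the means as $\bar{X}$ and $\bar{Y}$ (notation introduced here, not used elsewhere in this article). The sums of differences expand as follows:

$\begin{aligned}&\sum (X - \bar{X})(Y - \bar{Y}) = \sum (X, Y) - \frac{ \sum (X) \times \sum (Y) }{ n } \\&\sum (X - \bar{X}) ^ 2 = \sum (X ^ 2) - \frac{ ( \sum (X) ) ^ 2 }{ n } \\&\sum (Y - \bar{Y}) ^ 2 = \sum (Y ^ 2) - \frac{ ( \sum (Y) ) ^ 2 }{ n }\end{aligned}$

Dividing the first expression by the square root of the product of the other two gives r. Multiplying the numerator and the denominator of that ratio by n removes the fractions and yields the sum-based form.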

## Example of Correlation

Investment managers, traders, and analysts find it very important to calculate correlation because the risk reduction benefits of diversification rely on this statistic. Financial spreadsheets and software can calculate the value of correlation quickly.

As a hypothetical example, assume that an analyst needs to calculate the correlation for the following two data sets:

**X:** (41, 19, 23, 40, 55, 57, 33)

**Y:** (94, 60, 74, 71, 82, 76, 61)

There are three steps involved in finding the correlation. The first is to add up all the X values to find SUM(X), add up all the Y values to find SUM(Y), and multiply each X value by its corresponding Y value and sum the products to find SUM(X,Y):

SUM(X) = (41 + 19 + 23 + 40 + 55 + 57 + 33) = 268

SUM(Y) = (94 + 60 + 74 + 71 + 82 + 76 + 61) = 518

SUM(X,Y) = (41 x 94) + (19 x 60) + (23 x 74) + ... (33 x 61) = 20,391

The next step is to take each X value, square it, and sum up all these values to find SUM(X^2). The same must be done for the Y values:

SUM(X^2) = (41^2) + (19^2) + (23^2) + ... (33^2) = 11,534

SUM(Y^2) = (94^2) + (60^2) + (74^2) + ... (61^2) = 39,174

Noting that there are seven observations, n, the following formula can be used to find the correlation coefficient, r:

$\begin{aligned}&r = \frac { n \times \sum (X, Y) - \sum (X) \times \sum (Y) }{ \sqrt { ( n \times \sum (X ^ 2) - ( \sum (X) ) ^ 2 ) \times ( n \times \sum( Y ^ 2 ) - ( \sum (Y) ) ^ 2 ) } } \\&\textbf{where:}\\&r=\text{Correlation coefficient}\\&n=\text{Number of observations}\\&\sum (X, Y)=\text{Sum of the products of each paired X and Y value}\end{aligned}$

In this example, the correlation would be:

r = (7 x 20,391 - 268 x 518) / SquareRoot((7 x 11,534 - 268^2) x (7 x 39,174 - 518^2)) = 3,913 / 7,248.4 = 0.54
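The arithmetic in this example can be reproduced with a short Python sketch that computes each intermediate sum and then applies the sum-based formula. The variable names are illustrative only; the data are the X and Y sets given above.

```python
from math import sqrt

X = [41, 19, 23, 40, 55, 57, 33]
Y = [94, 60, 74, 71, 82, 76, 61]

n = len(X)                                  # number of observations
sum_x = sum(X)                              # SUM(X)
sum_y = sum(Y)                              # SUM(Y)
sum_xy = sum(x * y for x, y in zip(X, Y))   # SUM(X,Y): sum of paired products
sum_x2 = sum(x * x for x in X)              # SUM(X^2)
sum_y2 = sum(y * y for y in Y)              # SUM(Y^2)

numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)           # 268 518 20391 11534 39174
print(numerator, round(denominator, 1), round(r, 2))  # 3913 7248.4 0.54
```

Because every input is an integer, the numerator is computed exactly; only the square root and the final division involve floating-point rounding.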

## Correlation and Portfolio Diversification

In investing, correlation is most important in relation to a diversified portfolio. Investors who wish to mitigate risk can do so by investing in non-correlated assets. For example, consider an investor who owns airline stock. If the airline industry is found to have a low correlation to the social media industry, the investor may choose to invest in a social media stock, understanding that a negative impact on one industry may not affect the other.

This is often the approach when considering investing across asset classes. Stocks, bonds, precious metals, real estate, cryptocurrency, commodities, and other types of investments each have different relationships to each other. While some may be heavily correlated, others may act as a hedge to diversify risk if they are not correlated.

Risk that can be diversified away is called unsystematic risk. This type of risk is specific to a company, industry, or asset class. Investing in different assets can reduce your portfolio's correlation and reduce your exposure to unsystematic risk.
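To make the link between correlation and risk concrete, the sketch below applies the standard two-asset portfolio variance formula, which is not stated in this article: variance = (w_a x vol_a)^2 + (w_b x vol_b)^2 + 2 x w_a x w_b x rho x vol_a x vol_b. The function name, the 20% and 30% volatilities, and the 50/50 weights are hypothetical values chosen only for illustration.

```python
from math import sqrt

def two_asset_volatility(weight_a, vol_a, vol_b, rho):
    """Volatility of a two-asset portfolio for a given correlation rho."""
    weight_b = 1.0 - weight_a
    variance = (
        (weight_a * vol_a) ** 2
        + (weight_b * vol_b) ** 2
        + 2 * weight_a * weight_b * rho * vol_a * vol_b
    )
    # Guard against a tiny negative value caused by floating-point rounding.
    return sqrt(max(variance, 0.0))

# Hypothetical annual volatilities for two assets (assumptions, not market data).
VOL_A, VOL_B = 0.20, 0.30

results = {
    rho: two_asset_volatility(0.5, VOL_A, VOL_B, rho)
    for rho in (1.0, 0.5, 0.0, -0.5, -1.0)
}
for rho, vol in results.items():
    print(f"rho={rho:+.1f}  portfolio volatility={vol:.4f}")
```

Under these assumptions, portfolio volatility falls steadily as the correlation drops from +1 toward -1, even though neither asset's own volatility changes. That decline is the diversification benefit described above.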

## Special Considerations

Correlation is often interpreted alongside other statistical considerations. It is common to see correlation cited when statistics is used to analyze variables.

### P-Value

In statistics, a p-value is used to indicate whether the findings are statistically significant. It is possible to determine that two variables are correlated, but there may not be enough supporting evidence to state this as a strong claim. A low p-value, conventionally below a chosen threshold such as 0.05, indicates there is enough evidence to meaningfully conclude that the population correlation coefficient is different from zero.
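One way to obtain a p-value for a correlation using only the Python standard library is an exact permutation test, sketched below. This is a swapped-in technique chosen for illustration, not a method described in this article; statistical packages more commonly report a p-value based on the t-distribution. The helper names are hypothetical, and the data are the seven-observation X and Y sets from the worked example.

```python
from itertools import permutations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cross = sum(a * b for a, b in zip(dx, dy))
    return cross / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

def permutation_p_value(xs, ys):
    """Two-sided exact permutation p-value for the correlation of xs and ys.

    Every reordering of ys is tried, so this is only practical for very
    small samples (7 observations means 5,040 reorderings).
    """
    observed = abs(pearson(xs, ys))
    total = 0
    at_least_as_extreme = 0
    for shuffled in permutations(ys):
        total += 1
        # Small tolerance so rounding does not exclude the observed ordering.
        if abs(pearson(xs, shuffled)) >= observed - 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / total

X = [41, 19, 23, 40, 55, 57, 33]
Y = [94, 60, 74, 71, 82, 76, 61]

p_value = permutation_p_value(X, Y)
print(round(pearson(X, Y), 2), round(p_value, 3))
```

The p-value here is the share of reorderings whose correlation is at least as strong as the observed 0.54. For this small sample it comes out above 0.05, which illustrates the point made above: a moderate coefficient from only seven observations is not strong evidence of a real relationship.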

### Scatterplots

The easiest way to visualize whether two variables are correlated is to graphically depict them using a scatterplot. Each point on a scatterplot represents one sample item. The x-axis of the scatterplot represents one of the variables being tested, while the y-axis of the scatter plot represents the other.

The correlation of the two variables is often depicted graphically as a straight line drawn through the points to show the relationship between the two variables. If the two variables are positively correlated, an upward-sloping line may be drawn on the scatterplot. If the two variables are negatively correlated, a downward-sloping line may be drawn. The stronger the relationship, the closer each data point will be to this line.

Scatterplots may be more useful when analyzing more complex data that might have changing relationships. For example, two variables may be positively correlated to a certain point, then their relationship becomes negatively correlated. This non-linear relationship may be more difficult to identify using formulas but can be easier to spot when graphed on a scatterplot.
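A small numerical sketch shows why such a relationship can escape the formula. The data below are hypothetical: y rises as x increases up to zero and falls afterward, so y is completely determined by x, yet the overall Pearson coefficient is zero. The `pearson` helper is illustrative only.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cross = sum(a * b for a, b in zip(dx, dy))
    return cross / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

# Hypothetical data: y climbs until x reaches 0, then declines.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [-(x * x) for x in xs]

r_overall = pearson(xs, ys)          # whole range
r_rising = pearson(xs[:4], ys[:4])   # x from -3 to 0 only
r_falling = pearson(xs[3:], ys[3:])  # x from 0 to 3 only

print(round(r_overall, 3), round(r_rising, 3), round(r_falling, 3))
```

Each half of the data shows a strong correlation on its own, positive on the left and negative on the right, but the two halves cancel when measured together. A scatterplot of the same points would show the arch shape immediately.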

Last, scatterplots can easily depict correlation when they incorporate density shading. A density shade or density ellipse is a shaded area on a scatterplot that visually shows the densest region of data points on a scatterplot. The density ellipses will often mirror the direction of a linear correlation line if variables are related. Otherwise, density ellipses that are more circular with no defined direction indicate lower correlation.

### Causation

Another inherent difficulty in statistics is determining whether relationships between two variables are caused by those variables. Consider the following statement:

"Most basketball players are tall. Therefore if you play basketball, you will become tall."

It's clear that the statement above is not true. Individuals who are tall and understand this advantage may gravitate to basketball because their natural physical abilities best suit them for the sport. However, because height and activity in basketball may be positively correlated, statisticians and data scientists must be aware that a strong relationship between two variables may or may not be caused by either of the variables.

## Limitations of Correlation

Like other aspects of statistical analysis, correlation can be misinterpreted. Small sample sizes may yield unreliable results, even if it appears as though correlation between two variables is strong. Alternatively, a small sample size may yield uncorrelated findings when the two variables are in fact linked.

Correlation is often skewed when an outlier is present. The coefficient summarizes the entire data set in a single number, so it does not reveal how much a single extreme observation contributed to the result.
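The effect of a single outlier can be sketched with a small hypothetical data set. In the Python snippet below, five points show a weak negative relationship; adding one extreme point pushes the coefficient close to 1. All values and the `pearson` helper are illustrative assumptions.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cross = sum(a * b for a, b in zip(dx, dy))
    return cross / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

# Hypothetical observations with no clear relationship.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 3, 2, 1]
r_without_outlier = pearson(xs, ys)

# The same observations plus one extreme point at (20, 20).
r_with_outlier = pearson(xs + [20], ys + [20])

print(round(r_without_outlier, 2), round(r_with_outlier, 2))
```

Comparing the coefficient with and without a suspect point, as done here, is one simple way to check whether a reported correlation depends on a single observation.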

Correlation may also be misinterpreted if the relationship between two variables is nonlinear. The correlation coefficient readily captures a straight-line positive or negative relationship. However, two variables may still be strongly related in a more complex way that the coefficient understates or misses.

## What Is Correlation?

Correlation is a statistical term describing the degree to which two variables move in coordination with one another. If the two variables move in the same direction, then those variables are said to have a positive correlation. If they move in opposite directions, then they have a negative correlation.

## Why Are Correlations Important in Finance?

Correlations play an important role in finance because they are used to forecast future trends and to manage the risks within a portfolio. These days, the correlations between assets can be easily calculated using various software programs and online services. Correlations, along with other statistical concepts, play an important role in the creation and pricing of derivatives and other complex financial instruments.

## What Is an Example of How Correlation Is Used?

Correlation is a widely used concept in modern finance. For example, a trader might use historical correlations to predict whether a company’s shares will rise or fall in response to a change in interest rates or commodity prices. Similarly, a portfolio manager might aim to reduce their risk by ensuring that the individual assets within their portfolio are not overly correlated with one another.

## Is High Correlation Better?

Investors may have a preference for a particular level of correlation within their portfolio. In general, most investors will prefer a lower correlation, as this mitigates the risk of different assets or securities in their portfolios being affected by the same market conditions. However, risk-seeking investors or investors wanting to put their money into a very specific type of sector or company may be willing to have higher correlation within their portfolio in exchange for greater potential returns.