DEFINITION of 'Sum Of Squares'

Sum of squares is a statistical measure used in regression analysis to quantify the dispersion of data points. In a regression analysis, the goal is to determine how well a data series can be fitted to a function that might help explain how the data series was generated. The sum of squares provides a mathematical way to find the function that best fits the data, meaning the one that deviates least from it.

The calculation for Sum of Squares is ∑(xᵢ – x̄)²

Sum of squares is also known as Variation.

BREAKING DOWN 'Sum Of Squares'

The sum of squares is a measure of deviation from the mean. In statistics, the mean is the average of a set of numbers and is the most commonly used measure of central tendency. The arithmetic mean is calculated by summing the values in the data set and dividing by the number of values. Let’s say the closing prices of Microsoft (MSFT) in the last five days were {74.01, 74.77, 73.94, 73.61, 73.40} in US dollars. The sum of the prices is $369.73, and the mean or average price of the stock is therefore $369.73/5 = $73.95, rounded to the nearest cent.
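The arithmetic above can be checked with a few lines of Python. This is only a minimal sketch; the variable names are illustrative and not part of the original example.

```python
# Closing prices of MSFT over five days, in US dollars (from the example above).
prices = [74.01, 74.77, 73.94, 73.61, 73.40]

# Arithmetic mean: the sum of the values divided by the number of values.
total = sum(prices)
mean_price = total / len(prices)

print(f"Total: ${total:.2f}")       # Total: $369.73
print(f"Mean:  ${mean_price:.2f}")  # Mean:  $73.95
```

The unrounded mean is 73.946; the article works with the value rounded to the nearest cent.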

But knowing the mean of a measurement set is not always enough. Sometimes, it is helpful to know how much variation there is in a set of measurements. How far the individual values are from the mean may give some insight into how well the observations fit the regression model that is created. For example, if an analyst wanted to know whether the share price of MSFT moves in tandem with the price of Apple (AAPL), they could list the prices of both stocks over a certain period, say 1, 2, or 10 years, and create a linear model from the recorded observations. If the relationship between the two variables, i.e. the price of AAPL and the price of MSFT, is not a straight line, then there are variations in the data set that need to be scrutinized. In statistics speak, if the line in the linear model does not pass through all of the observed values, then some of the variability observed in the share prices is unexplained. The sum of squares is used to assess how well a linear relationship describes the two variables, and any unexplained variability is referred to as the residual sum of squares.

The sum of squares is the sum of squared deviations, where a deviation is the spread between an individual value and the mean. In a regression setting, the distance between each data point and the line of best fit is squared and then summed. The line of best fit is the line that minimizes this value.

The formula for sum of squares is ∑(xᵢ – x̄)²

where ∑ = sum

             xᵢ = each value in the set

             x̄ = mean

             xᵢ – x̄ = deviation

             (xᵢ – x̄)² = square of deviation

Now you can see why the measurement is called the sum of squared deviations, or the sum of squares for short. Using our MSFT example above, the sum of squares can be calculated as:

SS = (74.01 - 73.95)² + (74.77 - 73.95)² + (73.94 - 73.95)² + (73.61 - 73.95)² + (73.40 - 73.95)²

SS = (0.06)² + (0.82)² + (-0.01)² + (-0.34)² + (-0.55)²

SS = 1.0942
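The same calculation can be sketched in Python. The version below follows the worked example by rounding the mean to two decimals first, and then repeats the calculation with the unrounded mean for comparison; the variable names are assumptions made for illustration.

```python
prices = [74.01, 74.77, 73.94, 73.61, 73.40]

# The worked example rounds the mean to two decimals before taking deviations.
rounded_mean = round(sum(prices) / len(prices), 2)  # 73.95

deviations = [x - rounded_mean for x in prices]
ss_rounded = sum(d ** 2 for d in deviations)

# Using the unrounded mean (73.946) gives a slightly different figure.
exact_mean = sum(prices) / len(prices)
ss_exact = sum((x - exact_mean) ** 2 for x in prices)

print(f"SS with rounded mean: {ss_rounded:.4f}")  # SS with rounded mean: 1.0942
print(f"SS with exact mean:   {ss_exact:.4f}")    # SS with exact mean:   1.0941
```

The small gap between the two results comes only from rounding the mean, not from a difference in method.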

Adding up the deviations without squaring them gives a result of zero, or very close to zero when the mean has been rounded, because the negative deviations offset the positive ones. To get a meaningful measure of spread, each deviation must be squared before the deviations are summed. The sum of squares can never be negative because the square of any number, whether positive or negative, is never negative. It is zero only when every value in the set equals the mean.
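A short Python sketch, using the same five prices, illustrates why the squaring step matters. It is an illustration only, and the variable names are assumed.

```python
prices = [74.01, 74.77, 73.94, 73.61, 73.40]
mean_price = sum(prices) / len(prices)

deviations = [x - mean_price for x in prices]

# Positive and negative deviations cancel: the sum is zero up to floating-point error.
sum_of_deviations = sum(deviations)

# Squaring each deviation first removes the sign, so nothing cancels.
sum_of_squares = sum(d ** 2 for d in deviations)

print(f"Sum of deviations: {sum_of_deviations:.10f}")
print(f"Sum of squares:    {sum_of_squares:.4f}")
```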

A high sum of squares indicates that most of the values are farther away from the mean, and hence that there is large variability in the data. A low sum of squares indicates low variability in the set of observations. In the example above, a sum of squares of 1.0942 on prices near $74 suggests that the stock price of MSFT varied little over the last five days, and investors looking for stocks characterized by price stability and low volatility may count that in favor of MSFT. However, deciding which stock to purchase requires many more observations than the five listed here. An analyst may have to work with years of data to know with greater certainty how high or low the variability of an asset is. Note that the sum of squares cannot shrink as more data points are added to the set, because each additional point contributes a squared deviation that is zero or positive. For that reason, sums of squares from data sets of different sizes are not directly comparable.
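The effect of adding data points can be seen by computing the sum of squares for the first one, two, three, four and five prices in the MSFT example. The helper function below is a hypothetical name introduced for this sketch.

```python
prices = [74.01, 74.77, 73.94, 73.61, 73.40]


def sum_of_squares(values):
    """Sum of squared deviations of `values` from their own mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)


# Sum of squares for the first 1, 2, ..., 5 prices.
running_ss = [sum_of_squares(prices[:n]) for n in range(1, len(prices) + 1)]

for n, ss in enumerate(running_ss, start=1):
    print(f"first {n} prices: SS = {ss:.4f}")
```

Each additional price leaves the sum of squares unchanged or pushes it higher, which is why the total is usually divided by the number of observations before data sets of different lengths are compared.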

The most widely used measurements of variation are the standard deviation and the variance. However, to calculate either of the two metrics, the sum of squares must first be calculated. The variance is the average of the squared deviations, i.e. the sum of squares divided by the number of observations. When the data are a sample rather than the whole population, the sum of squares is instead divided by the number of observations minus one. The standard deviation is the square root of the variance.
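The step from sum of squares to variance and standard deviation can be sketched as follows. Dividing by the number of observations gives the population variance, while dividing by one fewer gives the sample variance that many tools report by default. The sketch cross-checks the manual results against Python's standard `statistics` module; the variable names are illustrative.

```python
import math
import statistics

prices = [74.01, 74.77, 73.94, 73.61, 73.40]
n = len(prices)
mean_price = sum(prices) / n
ss = sum((x - mean_price) ** 2 for x in prices)

# Population variance divides by n; sample variance divides by n - 1.
population_variance = ss / n
sample_variance = ss / (n - 1)

# The standard deviation is the square root of the variance.
population_std = math.sqrt(population_variance)
sample_std = math.sqrt(sample_variance)

# Cross-check against the standard library.
assert math.isclose(population_variance, statistics.pvariance(prices))
assert math.isclose(sample_variance, statistics.variance(prices))

print(f"Population variance: {population_variance:.4f}, std dev: {population_std:.4f}")
print(f"Sample variance:     {sample_variance:.4f}, std dev: {sample_std:.4f}")
```

These figures use the unrounded mean, so they are based on a sum of squares of about 1.0941 rather than 1.0942.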

There are two methods of regression analysis that use the sum of squares: the linear least squares method and the non-linear least squares method. Least squares refers to the fact that the regression function minimizes the sum of the squared deviations between the function and the actual data points. In this way, it is possible to draw a function that statistically provides the best fit for the data. Note that a regression function can be either linear (a straight line) or non-linear (a curving line).
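As a rough illustration of the linear case, the sketch below fits a straight line to the five MSFT prices with the closed-form ordinary least squares formulas. Using the day number (1 to 5) as the explanatory variable is an assumption made purely for this example, as are the function and variable names.

```python
prices = [74.01, 74.77, 73.94, 73.61, 73.40]
days = [1, 2, 3, 4, 5]  # assumption: day index used as the explanatory variable

n = len(prices)
mean_x = sum(days) / n
mean_y = sum(prices) / n

# Closed-form ordinary least squares for a line y = intercept + slope * x.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, prices))
s_xx = sum((x - mean_x) ** 2 for x in days)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x


def residual_sum_of_squares(a, b):
    """Sum of squared vertical distances between the data and the line y = a + b * x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(days, prices))


rss_fit = residual_sum_of_squares(intercept, slope)
total_ss = sum((y - mean_y) ** 2 for y in prices)

# Nudging the slope away from the fitted value increases the residual sum of squares.
rss_nudged = residual_sum_of_squares(intercept, slope + 0.05)

print(f"slope = {slope:.3f}, intercept = {intercept:.2f}")
print(f"total SS = {total_ss:.4f}, residual SS = {rss_fit:.4f}")
print(f"residual SS with nudged slope = {rss_nudged:.4f}")
```

The residual sum of squares of the fitted line is smaller than the total sum of squares around the mean, and the difference is the part of the variation that the line explains.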
