DEFINITION of 'Sum Of Squares'

Sum of Squares is a statistical technique used in regression analysis to determine the dispersion of data points. In a regression analysis, the goal is to determine how well a data series can be fitted to a function which might help to explain how the data series was generated. The sum of squares is used as a mathematical way to find the function which best fits (varies least) from the data.

The calculation for Sum of Squares is ∑(xi – x̄)2

Sum of squares is also known as Variation.

BREAKING DOWN 'Sum Of Squares'

The sum of squares is a measure of deviation from the mean. In statistics, the mean is the average of a set of numbers and is the most commonly used measure of central tendency. The arithmetic mean is simply calculated by summing up the values in the data set and dividing by the number of values. Let’s say the closing prices of Microsoft (MSFT) in the last five days were {74.01, 74.77, 73.94, 73.61, 73.40} in US dollars. The sum of the total prices is $369.73 and the mean or average price of the textbook is, therefore, $369.73/5 = $73.95.

But knowing the mean of a measurement set is not always enough. Sometimes, it is helpful to know how much variation there is in a set of measurements. How far apart the individual values are from the mean may give some insight into how fit the observations or values are to the regression model that is created. For example, if an analyst wanted to know whether the share price of MSFT moves in tandem with the price of Apple (AAPL), he can list out the set of observations for the process of both stocks for a certain period, say 1, 2, or 10 years and create a linear model with each of the observations or measurements recorded. If the relationship between both variables i.e. price of AAPL and price of MSFT, is not a straight line, then there are variations in the data set that need to be scrutinized. In statistics speak, if the line in the linear model created does not pass through all the measurements of value, then some of the variability that has been observed in the share prices is unexplained. The sum of squares is used to calculate whether a linear relationship exists between two variables, and any unexplained variability is referred to as the residual sum of squares.

The sum of squares is the sum of the square of variation, where variation is defined as the spread between each individual value and the mean. To determine the sum of squares, the distance between each data point and the line of best fit is squared and then summed up. The line of best fit will minimize this value.

The formula for sum of squares is ∑(xi – x̄)2

where ∑ = sum

             xi = each value in the set

             x̄ = mean

             xi – x̄ = deviation

             (xi – x̄)2 = square of deviation

Now you can see why the measurement is called the sum of squared deviations, or the sum of squares for short. Using our MSFT example above, the sum of squares can be calculated as:

SS = (74.01 - 73.95)2 + (74.77 - 73.95)2 + (73.94 - 73.95)2 + (73.61 - 73.95)2 + (73.40 - 73.95)2

SS = (0.06) 2 + (0.82)2 + (-0.01)2 + (-0.34)2 + (-0.55)2

SS = 1.0942

Adding the sum of the deviations alone without squaring will result in a number equal to or close to zero since the negative deviations will almost perfectly offset the positive deviations. To get a more realistic number, the sum of deviations must be squared. The sum of squares will always be a positive number because the square of any number, whether positive or negative, is always positive. 

A high sum of squares indicates that most of the values are farther away from the mean, and hence, there is large variability in the data. A low sum of squares refers to low variability in the set of observations. In the example above, 1.0942 shows that the variability in the stock price of MSFT in the last five days is very low and investors looking to invest in stocks characterized by price stability and low volatility may opt for MSFT. However, making an investment decision on what stock to purchase requires much more observations than the five listed here. An analyst may have to work with years of data to know with a higher certainty how high or low the variability of an asset is. As more data points are added to the set, the sum of squares becomes larger as the values will be more spread out.

The most widely used measurements of variation are the standard deviation and the variance. However, to calculate either of the two metrics, the sum of squares must first be calculated. The variance is the average of the sum of squares, i.e.the sum of squares divided by the number of observations. The standard deviation is the square root of the variance.

There are two methods of regression analysis which use the sum of squares: the linear least squares method and the non-linear least squares method. Least squares refers to the fact that the regression function minimizes the sum of the squares of the variance from the actual data points. In this way, it is possible to draw a function which statistically provides the best fit for the data. Note that a regression function can either be linear (a straight line) or non-linear (a curving line).

RELATED TERMS
  1. Residual Sum Of Squares - RSS

    A residual sum of squares is a statistical technique used to ...
  2. Square Position

    Square position is a situation where the market risk has been ...
  3. Nonlinear Regression

    Nonlinear regression is a form of regression analysis in which ...
  4. Sales Per Square Foot

    Sales per square foot is simply the total revenue divided by ...
  5. Variance

    The spread between numbers in a data set, measuring Variance ...
  6. Line Of Best Fit

    A straight line drawn through the center of a group of data points ...
Related Articles
  1. Tech

    Square Surpasses $1B in Business Loans (SQ)

    Square reported Q3 software and data product revenue up 140% over last year, as Square Capital opens up and builds out with an array of new partnerships.
  2. Investing

    Square Is Becoming a Bank

    The company, which sells hardware for payments and lends to small businesses, took the first step by filing an application.
  3. Tech

    Square

    Square helps small businesses accept credit cards, track sales and inventory and even obtain financing.
  4. Tech

    Fintech Leader Square Posts Q3 Earnings Beat (SQ)

    Jack Dorsey's payments and financial services platform celebrates one year on the public markets with an earnings beat, securing revenue growth of 32% YOY.
  5. Tech

    Lending Club, a Square Rival, Soars on Q3 (LC, SQ)

    Non-traditional online lending platforms post better-than-expected third quarter earnings, sending fintech stocks on the way to recovery.
  6. Investing

    4 Reasons Square’s Gains Can Multiply: Jefferies

    The fintech platform, up 180% YTD, could see a major uptake on multiple new growth drivers.
  7. Investing

    Square's Q4 Loss Was Much Narrower Than Expected

    Square (NYSE: SQ) saw its stock close 14% higher on the day the mobile payment and financial services specialist released its fourth-quarter and full 2016 results.  For the quarter, Square's ...
  8. Tech

    Fintech: Square Maintains Hold on Mobile Payments (SQ)

    With the announcement of faster chip card readers, Square's bread and butter remains in mobile payments.
  9. Investing

    Square’s 70% Run-Up Prompts Credit Suisse to Downgrade

    Credit Suisse thinks all the digital payments provider's upside is already baked into the stock
  10. Tech

    Analysts Weigh in on Electronic Payments Firms

    Pacific Crest lifts Square's price target and reiterates an overweight.
RELATED FAQS
  1. What is the difference between the expected return and the standard deviation of ...

    Learn about the expected return and standard deviation and the difference between the expected return and standard deviation ... Read Answer >>
  2. How do you calculate variance in Excel?

    To calculate statistical variance in Microsoft Excel, use the built-in Excel function VAR. Read Answer >>
  3. What is the difference between linear regression and multiple regression?

    Learn the difference between linear regression and multiple regression and how multiple regression encompasses not only linear ... Read Answer >>
  4. What does standard deviation measure in a portfolio?

    Dig deeper into the investment uses of and mathematical principles behind standard deviation as a measurement of portfolio ... Read Answer >>
  5. What is the difference between standard deviation and z score?

    Understand the basics of standard deviation and Z-score; learn how each is calculated and used in the assessment of market ... Read Answer >>
Hot Definitions
  1. Cryptocurrency

    A digital or virtual currency that uses cryptography for security. A cryptocurrency is difficult to counterfeit because of ...
  2. Financial Industry Regulatory Authority - FINRA

    A regulatory body created after the merger of the National Association of Securities Dealers and the New York Stock Exchange's ...
  3. Initial Public Offering - IPO

    The first sale of stock by a private company to the public. IPOs are often issued by companies seeking the capital to expand ...
  4. Cost of Goods Sold - COGS

    Cost of goods sold (COGS) is the direct costs attributable to the production of the goods sold in a company.
  5. Profit and Loss Statement (P&L)

    A financial statement that summarizes the revenues, costs and expenses incurred during a specified period of time, usually ...
  6. Monte Carlo Simulation

    Monte Carlo simulations are used to model the probability of different outcomes in a process that cannot easily be predicted ...
Trading Center