Traders who are eager to try a trading idea in a live market often make the mistake of relying entirely on backtesting results to determine whether their system will be profitable. While backtesting can provide traders with valuable information, it is often misleading, and it is only one part of the evaluation process.
Out-of-sample testing and forward performance testing provide further confirmation regarding a system's effectiveness and can show a system's true colors before real cash is on the line. Good correlation between backtesting, out-of-sample and forward performance testing results is vital for determining the viability of a trading system.
Backtesting refers to applying a trading system to historical data to see how it would have performed during the specified time period. Many of today's trading platforms support backtesting. Traders can test ideas with a few keystrokes and gain insight into the effectiveness of an idea without risking funds in a trading account. Backtesting can evaluate simple ideas, such as how a moving average crossover would have performed on historical data, or more complex systems with a variety of inputs and triggers.
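To make the moving average crossover example concrete, here is a minimal Python sketch of a backtest. It assumes a plain list of closing prices, a long-only rule (hold the asset for the next bar when the fast average is above the slow average at the close, otherwise stay in cash) and no commissions or slippage. The function names are illustrative and do not correspond to any trading platform's API.

```python
from typing import List, Optional, Sequence


def simple_moving_average(prices: Sequence[float], length: int) -> List[Optional[float]]:
    """Trailing simple moving average; None until `length` prices are available."""
    averages: List[Optional[float]] = []
    window_sum = 0.0
    for i, price in enumerate(prices):
        window_sum += price
        if i >= length:
            window_sum -= prices[i - length]  # drop the price that left the window
        averages.append(window_sum / length if i >= length - 1 else None)
    return averages


def backtest_sma_crossover(prices: Sequence[float], fast: int, slow: int) -> List[float]:
    """Return the equity curve (starting at 1.0) of a long-only SMA crossover.

    The position for the move from bar i to bar i+1 is decided with data up
    to bar i only, so the test never uses future prices.
    """
    if not 0 < fast < slow:
        raise ValueError("moving average lengths must satisfy 0 < fast < slow")
    fast_ma = simple_moving_average(prices, fast)
    slow_ma = simple_moving_average(prices, slow)
    equity = [1.0]
    for i in range(len(prices) - 1):
        in_market = (
            fast_ma[i] is not None
            and slow_ma[i] is not None
            and fast_ma[i] > slow_ma[i]
        )
        growth = prices[i + 1] / prices[i] if in_market else 1.0  # flat means cash
        equity.append(equity[-1] * growth)
    return equity


prices = [100.0, 101.0, 99.5, 102.0, 104.0, 103.0, 105.5, 107.0, 106.0, 108.0]
curve = backtest_sma_crossover(prices, fast=2, slow=4)
print(f"Total return: {curve[-1] - 1:.2%}")
```

The last value of the equity curve minus 1 is the system's total return over the tested period. Evaluating the signal at bar i and applying it to the move from bar i to bar i+1 keeps the backtest from peeking at prices it could not have known.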
As long as an idea can be quantified, it can be backtested. Some traders and investors may seek the expertise of a qualified programmer to turn the idea into a testable form, which typically means coding it in the proprietary language used by the trading platform. The programmer can incorporate user-defined input variables that allow the trader to "tweak" the system. In the simple moving average crossover system noted above, for example, the trader would be able to input (or change) the lengths of the two moving averages used in the system and then backtest to determine which lengths would have performed best on the historical data.
Many trading platforms also allow for optimization studies. This entails entering a range of values for a specified input and letting the computer "do the math" to figure out which value would have performed best. A multi-variable optimization does the same for two or more inputs at once, determining which combination of values would have achieved the best outcome. For example, traders can tell the program which inputs they would like to include in their strategy; the program then searches for the values of those inputs that performed best on the tested historical data.
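A two-variable optimization of the crossover's moving-average lengths can be sketched as a brute-force grid search: every valid (fast, slow) pair in the chosen ranges is backtested, and the pair with the best historical result wins. This is only one simple way to do it, assuming the same hypothetical long-only, cost-free crossover rule and total return as the objective. Real platforms may use other search methods and fitness measures.

```python
from itertools import product
from typing import Dict, Iterable, Sequence, Tuple


def crossover_total_return(prices: Sequence[float], fast: int, slow: int) -> float:
    """Compounded return of a long-only fast/slow SMA crossover, ignoring costs."""
    equity = 1.0
    # Start once the slow average has a full window; the signal at bar i's close
    # is applied to the price move from bar i to bar i+1.
    for i in range(slow - 1, len(prices) - 1):
        fast_ma = sum(prices[i - fast + 1 : i + 1]) / fast
        slow_ma = sum(prices[i - slow + 1 : i + 1]) / slow
        if fast_ma > slow_ma:
            equity *= prices[i + 1] / prices[i]
    return equity - 1.0


def grid_search(
    prices: Sequence[float], fast_range: Iterable[int], slow_range: Iterable[int]
) -> Tuple[Tuple[int, int], Dict[Tuple[int, int], float]]:
    """Backtest every valid (fast, slow) pair; return the best pair and all scores."""
    scores: Dict[Tuple[int, int], float] = {}
    for fast, slow in product(fast_range, slow_range):
        if fast >= slow:
            continue  # the "fast" average must be shorter than the "slow" one
        scores[(fast, slow)] = crossover_total_return(prices, fast, slow)
    if not scores:
        raise ValueError("no valid (fast, slow) combinations in the given ranges")
    best = max(scores, key=scores.__getitem__)
    return best, scores
```

A call such as `grid_search(prices, range(5, 21), range(20, 101, 10))` returns whichever combination happened to score highest on the data supplied. That is exactly the mechanism that makes over-optimization so easy.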
Backtesting can be exciting in that an unprofitable system can often be magically transformed into a money-making machine with a few optimizations. Unfortunately, tweaking a system to achieve the greatest level of past profitability often leads to a system that will perform poorly in real trading. This over-optimization creates systems that look good on paper only.
Curve fitting is the use of optimization analytics to create the highest number of winning trades at the greatest profit on the historical data used in the testing period. Although it looks impressive in backtesting results, curve fitting leads to unreliable systems since the results are essentially custom-designed for that particular data and time period.
Backtesting and optimizing provide many benefits to a trader, but they are only part of the process of evaluating a potential trading system. A trader's next step is to apply the system to historical data that was not used in the initial backtesting phase.
In-Sample Versus Out-of-Sample Data
When testing an idea on historical data, it is beneficial to reserve a period of that data that is not used while developing the system. The historical data on which the idea is initially tested and optimized is referred to as the in-sample data; the data set that has been reserved is known as the out-of-sample data. This setup is an important part of the evaluation process because it provides a way to test the idea on data that played no part in the optimization. As a result, the system will not have been influenced in any way by the out-of-sample data, and traders will be able to gauge how well it might perform on new data, i.e., in real-life trading.
Prior to initiating any backtesting or optimizing, traders can set aside a percentage of the historical data for out-of-sample testing. One method is to divide the historical data into thirds and reserve one-third for out-of-sample testing. Only the in-sample data should be used for the initial testing and any optimization. Figure 1 shows a time line in which one-third of the historical data is reserved for out-of-sample testing and two-thirds are used for in-sample testing. Although Figure 1 depicts the out-of-sample data at the beginning of the test, typical procedures place the out-of-sample portion immediately before the forward performance testing period.
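Reserving the data before any testing can be as simple as a chronological split. The sketch below assumes the bars are already in time order and, following the typical procedure rather than the layout shown in Figure 1, keeps the most recent third as out-of-sample data so that it sits immediately before the forward performance testing period. The function name and the `oos_fraction` parameter are illustrative.

```python
from typing import List, Sequence, Tuple, TypeVar

Bar = TypeVar("Bar")


def split_in_out_of_sample(
    bars: Sequence[Bar], oos_fraction: float = 1 / 3
) -> Tuple[List[Bar], List[Bar]]:
    """Split time-ordered bars into (in-sample, out-of-sample) without shuffling.

    The oldest bars go to the in-sample set used for testing and optimization;
    the most recent `oos_fraction` of the bars is held back untouched.
    """
    if not 0.0 < oos_fraction < 1.0:
        raise ValueError("oos_fraction must be strictly between 0 and 1")
    cut = round(len(bars) * (1.0 - oos_fraction))
    return list(bars[:cut]), list(bars[cut:])


closes = [100.0 + 0.1 * day for day in range(750)]  # placeholder data
in_sample, out_of_sample = split_in_out_of_sample(closes)
print(len(in_sample), len(out_of_sample))  # 500 250
```

Only `in_sample` is handed to the backtester and optimizer. The `out_of_sample` list is used once, after the system's parameters have been frozen. The data is deliberately not shuffled, because a random split would leak later market conditions into the development set.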
Figure 1: A time line representing the relative length of in-sample and out-of-sample data used in the backtesting process.
Correlation, in this context, refers to the similarity between a system's performance and overall trend on the two data sets. Correlation metrics can be used to evaluate the strategy performance reports created during the testing period (a feature that most trading platforms provide). The stronger the correlation between the in-sample and out-of-sample results, the higher the probability that a system will perform well in forward performance testing and live trading.
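The text does not prescribe a single correlation metric. As one hypothetical approach, the sketch below builds the same small performance report (trade count, win rate, average trade, profit factor) for each data set from per-trade returns and then applies a simple consistency rule. The 50% `tolerance` threshold is an arbitrary assumption for illustration, not an industry standard.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PerformanceReport:
    trades: int
    win_rate: float       # fraction of trades with a positive return
    avg_trade: float      # mean return per trade
    profit_factor: float  # gross profit / gross loss


def performance_report(trade_returns: Sequence[float]) -> PerformanceReport:
    """Summarize per-trade returns (0.02 means +2%) for one data set."""
    n = len(trade_returns)
    gross_profit = sum(r for r in trade_returns if r > 0)
    gross_loss = -sum(r for r in trade_returns if r < 0)
    if gross_loss > 0:
        profit_factor = gross_profit / gross_loss
    else:
        profit_factor = float("inf") if gross_profit > 0 else 0.0
    return PerformanceReport(
        trades=n,
        win_rate=sum(1 for r in trade_returns if r > 0) / n if n else 0.0,
        avg_trade=sum(trade_returns) / n if n else 0.0,
        profit_factor=profit_factor,
    )


def results_consistent(
    in_sample: PerformanceReport,
    out_of_sample: PerformanceReport,
    tolerance: float = 0.5,  # hypothetical threshold; adjust to your own standards
) -> bool:
    """Hypothetical rule: both sets are profitable, and the out-of-sample average
    trade keeps at least `tolerance` of the in-sample average trade."""
    return (
        in_sample.avg_trade > 0
        and out_of_sample.avg_trade > 0
        and out_of_sample.avg_trade >= tolerance * in_sample.avg_trade
    )
```

Comparing several statistics side by side, rather than total profit alone, makes it easier to spot a system whose out-of-sample behavior has quietly changed character even though it still shows a small gain.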
Once a trading system has been developed using in-sample data, it is ready to be applied to the out-of-sample data, and traders can evaluate and compare the performance results between the two data sets. Figure 2 illustrates two different systems that were tested and optimized on in-sample data, then applied to out-of-sample data. The chart on the left shows a system that was clearly curve-fit to work well on the in-sample data and failed completely on the out-of-sample data. The chart on the right shows a system that performed well on both in-sample and out-of-sample data.
Figure 2: Two equity curves. The trade data before each yellow arrow represents in-sample testing. The trades generated between the yellow and red arrows indicate out-of-sample testing. The trades after the red arrows are from the forward performance testing phase.
If there is little correlation between the in-sample and out-of-sample testing, like the left chart in Figure 2, it is likely that the system has been over-optimized and will not perform well in live trading. If there is strong correlation in the performance, as seen in the right chart in Figure 2, the next phase of evaluation involves an additional type of out-of-sample testing known as forward performance testing.
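The in-sample/out-of-sample workflow, and the curve-fitting risk it guards against, can be demonstrated on synthetic data. This sketch optimizes a hypothetical long-only crossover rule on the oldest two-thirds of a random walk, which by construction has no exploitable trend, and then applies the winning parameters, unchanged, to the final third. Because the data is pure noise, any in-sample edge the grid search finds has been fitted to chance, so the out-of-sample result should not be expected to confirm it. The seed, parameter ranges and volatility are arbitrary assumptions.

```python
import random
from itertools import product
from typing import List, Sequence, Tuple


def sma_crossover_return(prices: Sequence[float], fast: int, slow: int) -> float:
    """Compounded return of a long-only SMA crossover, ignoring costs."""
    equity = 1.0
    for i in range(slow - 1, len(prices) - 1):
        fast_ma = sum(prices[i - fast + 1 : i + 1]) / fast
        slow_ma = sum(prices[i - slow + 1 : i + 1]) / slow
        if fast_ma > slow_ma:  # signal at bar i, applied to the move to bar i+1
            equity *= prices[i + 1] / prices[i]
    return equity - 1.0


def optimize(
    prices: Sequence[float], fast_range: range, slow_range: range
) -> Tuple[Tuple[int, int], float]:
    """Brute-force search for the (fast, slow) pair with the best return on `prices`."""
    scored = [
        ((fast, slow), sma_crossover_return(prices, fast, slow))
        for fast, slow in product(fast_range, slow_range)
        if fast < slow
    ]
    return max(scored, key=lambda item: item[1])


def random_walk(n: int, seed: int = 7, step_vol: float = 0.01) -> List[float]:
    """Synthetic prices from a multiplicative random walk: no built-in edge."""
    rng = random.Random(seed)
    prices = [100.0]
    for _ in range(n - 1):
        prices.append(prices[-1] * (1.0 + rng.gauss(0.0, step_vol)))
    return prices


if __name__ == "__main__":
    prices = random_walk(900)
    cut = len(prices) * 2 // 3  # oldest two-thirds in-sample, newest third out-of-sample
    in_sample, out_of_sample = prices[:cut], prices[cut:]

    (fast, slow), in_sample_return = optimize(in_sample, range(2, 11), range(10, 61, 10))
    # Frozen parameters; the first `slow` out-of-sample bars only fill the averages.
    out_of_sample_return = sma_crossover_return(out_of_sample, fast, slow)

    print(f"Optimized on in-sample data: fast={fast}, slow={slow}")
    print(f"In-sample return:     {in_sample_return:+.2%}")
    print(f"Out-of-sample return: {out_of_sample_return:+.2%}")
```

Changing the seed and rerunning is a quick way to see how often an "optimized" system on noise looks attractive in-sample, which is the left-hand chart of Figure 2 in miniature.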
Forward Performance Testing Basics
Forward performance testing, also known as paper trading, provides traders with another set of out-of-sample data on which to evaluate a system. It is a simulation of actual trading that involves following the system's logic in a live market. The name "paper trading" comes from the fact that all trades are executed on paper only; that is, trade entries and exits are documented along with any profit or loss for the system, but no real trades are executed.
An important aspect of forward performance testing is to follow the system's logic exactly; otherwise, it becomes difficult, if not impossible, to evaluate this step of the process accurately. Traders should be honest about every trade entry and exit and avoid behavior such as cherry-picking trades or leaving a trade off the record with the rationalization that "I would never have taken that trade." If a trade would have occurred under the system's logic, it should be documented and evaluated.
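A paper-trading log can build the "follow the logic exactly" rule into its structure. In this minimal, hypothetical sketch every entry and exit signal is recorded through the journal, and the class offers no method for skipping or deleting a trade. Profit and loss is computed as (exit price minus entry price) times quantity, with a negative quantity standing for a short position. Commissions and slippage are ignored, and the symbols and dates in the example are placeholders.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class PaperTrade:
    symbol: str
    entry_time: datetime
    entry_price: float
    quantity: int  # positive = long, negative = short
    exit_time: Optional[datetime] = None
    exit_price: Optional[float] = None

    @property
    def pnl(self) -> Optional[float]:
        """Realized profit or loss; None while the trade is still open."""
        if self.exit_price is None:
            return None
        return (self.exit_price - self.entry_price) * self.quantity


@dataclass
class PaperJournal:
    """Records every signal the system generates; there is no 'skip' method."""
    trades: List[PaperTrade] = field(default_factory=list)

    def on_entry_signal(self, symbol: str, time: datetime, price: float, quantity: int) -> PaperTrade:
        trade = PaperTrade(symbol, time, price, quantity)
        self.trades.append(trade)
        return trade

    def on_exit_signal(self, trade: PaperTrade, time: datetime, price: float) -> None:
        if trade.exit_price is not None:
            raise ValueError("trade is already closed")
        trade.exit_time, trade.exit_price = time, price

    def realized_pnl(self) -> float:
        return sum(t.pnl for t in self.trades if t.pnl is not None)


journal = PaperJournal()
trade = journal.on_entry_signal("XYZ", datetime(2024, 1, 2, 10, 0), 50.0, 100)
journal.on_exit_signal(trade, datetime(2024, 1, 5, 15, 30), 52.5)
print(journal.realized_pnl())  # 250.0
```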
Many brokers offer a simulated trading account in which trades can be placed and the corresponding profit and loss calculated. Using a simulated trading account can create a semi-realistic environment in which to practice trading and further assess the system.
Figure 2 also shows the results for forward performance testing on two systems. Again, the system represented in the left chart fails to do well beyond the initial testing on in-sample data. The system shown in the right chart, however, continues to perform well through all phases, including the forward performance testing. A system that shows positive results with good correlation between in-sample, out-of-sample and forward performance testing is ready to be implemented in a live market.
The Bottom Line
Backtesting is a valuable tool available in most trading platforms. Dividing historical data into separate in-sample and out-of-sample sets can provide traders with a practical and efficient means of evaluating a trading idea and system. Since most traders employ optimization techniques in backtesting, it is important to then evaluate the system on clean data, meaning data that played no part in the optimization, to determine its viability.
Continuing the out-of-sample testing with forward performance testing provides another layer of safety before a system is put into the market with real cash at risk. Positive results and good correlation between in-sample and out-of-sample backtesting and forward performance testing increase the probability that a system will perform well in actual trading.