QuanTimer
(A quantitative approach to market timing)
A general description of the model
Ryan Dutton
August 1, 2014
1 Introduction
Market timing is an esoteric area of finance that has long been treated with a great degree of skepticism by both pundits and agnostics in the financial community. From the efficient market hypothesis and random walk theory to countless empirical studies suggesting market timing is a futile exercise, the detractors of this art seem to have endless arguments. However, statistical analysis suggests there is reason to believe that at least certain segments of the financial market can be timed with a fair degree of precision. Of course, it would be foolish to extend this idea to all financial markets and all financial instruments. We are concerned only with a small segment of the financial market whose characteristics are discernible and can be captured statistically, allowing us to make a good forecast. We have made certain assumptions that this segment generally fulfills and most other segments do not. The particular characteristic of the market that concerns us here is called autoregression, and it is a dominant theme in the social sciences. This document explains the concept of autoregression and how it is implemented mathematically using difference equations. It also outlines the forecasting methodology and shows how these pieces dovetail to give our investment strategy a significant edge. The QuanTimer model has been built with a very high number of degrees of freedom, making it about as reliable as one could expect a statistical model to be. QuanTimer was originally built with daily prices of the S&P500 and Nasdaq indexes over the period January 1996 to December 2010. Preliminary inspection suggests that it can easily be extended to the DAX and FTSE.
This document will explain the fundamental assumptions of the model, why indexes such as the S&P500, Nasdaq, DAX and FTSE satisfy them well, and why QuanTimer is not well-suited for individual stocks, commodities, currencies, bonds, or even emerging market equity indexes.
2 Characteristics of the financial market
Financial time series display trends: Financial time series often display clear trends. Panel (a) of figure 1 shows how the CAC 40 index trends up until the first quarter of 2008, and then trends down. A trend makes a time series non-stationary, meaning its expected value changes over time. As we will see in section 4, trending data often need a specific transformation before they can be used in econometric models.
Figure 1: CAC 40 index and corresponding log returns. (a): plot of the CAC 40 index (19/3/2004-29/2/2008); (b): log return of the series in (a). [Plots not reproduced here.]
The trend is the primary and most essential component of QuanTimer. We capture the trend of an index and often remain invested for months without changing our position. In section 4 we will learn more about how a trend is detected and how signals are generated. Commodities, currencies, bonds, and many other financial instruments do not display clean trends consistently – the word to focus on is consistently – and we therefore refrain from applying QuanTimer to these instruments. There is a host of reasons why they do not, which we will not discuss here. It is enough to point out that having discernible trends with a high degree of consistency over time is one of the assumptions of our model, and commodities, currencies, bonds and the like violate it.

Shocks display a high degree of persistence: When a stock index goes up, it usually does so in a smooth, low-volatility fashion. But when it turns down, it usually starts with a sharp down move known as a shock. Once a financial system experiences a shock, it does not disappear in one time period; it dissipates slowly over several periods. Large changes tend to be followed by large changes, and small changes by small changes. This phenomenon is responsible for changes in volatility, a word commonly used to describe the current variance of an index over the last few trading sessions. QuanTimer uses shocks – sudden changes in price – and volatility to determine when a trend has ended. Our study shows that a trend always ends with a rise in volatility, and volatility remains at a higher level while the market[1] is not trending up. Shocks happen to be too large for individual stocks, currencies, commodities, emerging markets and so on, which is another reason why QuanTimer is not well-suited for these instruments.

Volatility is not constant over time: Most financial time series do not have a constant mean unless they are transformed to stationary data.
Most also do not have a constant variance, which is known as volatility. Financial time series exhibit periods of relative tranquility followed by periods of high volatility. It is quite evident from panel (b) of figure 1 that the log returns of our original time series display clusters of high-volatility periods. The variation in volatility serves as one of the important criteria for model selection, as explained in section 4.5.

High probability of extreme values: Financial time series often exhibit leptokurtosis, which means the distribution of returns is fat-tailed, as shown in figure 2. One reason for these outliers is that the conditional variance is not constant, and the outliers occur when the variance is large. This is why we change our allocation depending on the volatility level of the index. When volatility is high, we are more likely to be wrong, and when we are wrong, the losses are much larger than usual.
Figure 2: leptokurtic distribution of log returns (series LOGRET, sample 1-1014, 1013 observations). [Histogram not reproduced here; summary statistics below.]

  Mean         0.000278
  Median       0.000558
  Maximum      0.058335
  Minimum     -0.070774
  Std. Dev.    0.009807
  Skewness    -0.547983
  Kurtosis     7.694029
  Jarque-Bera  980.7126  (Probability 0.000000)
[1] It is important to point out that we often use the words index, market and time series interchangeably.
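The volatility clustering described above can be illustrated with a small sketch. The series below is simulated, not market data, and the calm/turbulent parameters are hypothetical; it shows how a rolling standard deviation of returns – the kind of "current volatility" measure the text refers to – picks up a shift from a tranquil regime to a volatile one.

```python
import random
import statistics

random.seed(7)

# Simulate a return series with two regimes: calm (low sigma) and
# turbulent (high sigma), mimicking the volatility clustering described above.
returns = ([random.gauss(0.0005, 0.005) for _ in range(300)] +
           [random.gauss(-0.001, 0.02) for _ in range(100)])

def rolling_volatility(rets, window=20):
    """Sample standard deviation of the last `window` returns, per session."""
    return [statistics.stdev(rets[i - window:i])
            for i in range(window, len(rets) + 1)]

vol = rolling_volatility(returns)

# The turbulent tail of the sample shows markedly higher current volatility.
print(statistics.mean(vol[:250]) < statistics.mean(vol[-50:]))  # True
```

A trend-following model can compare such a rolling measure against a threshold to decide whether the market is still in its calm, trending state.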
3 Autoregression in social sciences
As mentioned in the introduction, autoregression lies at the heart of the QuanTimer model; it is a recursive process that happens to be a recurrent theme in the social sciences. In this section the concept of autoregression is introduced with a simple example. Assume that we are currently in a booming economic cycle and consumers are in a buoyant mood. They are enjoying dining out, luxury goods, expensive holidays and all the other nice amenities of life. At some point the economy starts to turn the corner and cracks begin to appear. Central banks begin their tightening cycle, fulfilling their inflation-target mandate. Consumers hear about these events as they unfold. However, they do not switch from their steak-and-wine dinner to bread and water overnight. This simply does not happen. But why? The reason lies at the core of what is known as persistence of habit. For the sake of argument, assume instead that we are in the 1970s and, economically speaking, everything is utterly depressed. Signs of prosperity are slowly beginning to emerge, but consumers would not suddenly go on a spending binge either. The point is that habit changes gradually, from one state to another over several periods. We are, of course, generalizing; there are extraordinary 'prince to pauper' and 'pauper to prince' stories, but we can disregard them for now. If we collect a large amount of time series data, we discover a startling fact of the social sciences – consumption in one period is linked to consumption in the previous period. This is what we call autoregressive behaviour. Economists have studied autoregression and found examples of it in a variety of economic indicators: inflation, investment, GDP and consumption all display autoregressive characteristics.
Obviously, a sentiment gauge, if somehow quantified, would also be autoregressive. Finance, being closely related to the economy, is an area where financial engineers and econometricians make use of autoregression for forecasting. In section 4.2 we will see how autoregression is modelled mathematically and how it results in a difference equation. In section 4.6 we will see how signals are generated using these models, known as AR models.
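The persistence-of-habit idea can be made concrete with a minimal simulation. Everything below is hypothetical – an AR(1) "consumption" process with coefficient 0.9 – but it shows the defining fingerprint of autoregression: a high correlation between each period's value and the previous period's.

```python
import random

random.seed(1)

# AR(1) "habit" process: c_t = a0 + a1 * c_{t-1} + shock, with a1 close
# to 1 so this period's consumption is strongly tied to last period's.
a0, a1 = 10.0, 0.9
c = [100.0]  # initial consumption level (hypothetical units)
for t in range(1, 200):
    c.append(a0 + a1 * c[t - 1] + random.gauss(0, 1))

# Lag-1 sample autocorrelation: values near 1 indicate strong persistence.
mean = sum(c) / len(c)
num = sum((c[t] - mean) * (c[t - 1] - mean) for t in range(1, len(c)))
den = sum((x - mean) ** 2 for x in c)
rho = num / den
print(rho > 0.5)  # True: consumption is highly persistent
```

With a1 = 0.9 the estimated lag-1 autocorrelation typically lands near 0.9; setting a1 = 0 instead would make it hover near zero, i.e. no habit persistence at all.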
4 QuanTimer methodology
4.1 Data transformation
We saw in the last section that a financial time series such as the CAC 40 is not stationary. As a matter of fact, this is true for virtually all financial and economic time series. Most of the econometric models we work with require stationary data, as non-stationary data may lead to problems such as spurious regression. The non-stationary data therefore need to be transformed to yield stationary data. Simply taking the daily percentage change generates stationary data with a constant mean and a constant unconditional variance. We also know that the resulting series is weakly dependent, which guarantees that our estimators are consistent, especially with such a large number of observations; QuanTimer almost fits the bill for an asymptotic model. If Pt is the index at time t, then the percentage return Rt at time t is:
Rt = (Pt – Pt-1) / Pt-1

However, it is common in financial econometrics to approximate it by the log return rt such that

rt = log(1 + Rt) = log(Pt / Pt-1) = log(Pt) – log(Pt-1)
The advantage of the log return is that it represents a continuously compounded return, and the multiperiod return is simply the sum of the log returns from all the periods. Panel (b) of figure 1 shows the log return of our original CAC 40 time series. It is quite obvious by visual inspection that the log returns are stationary. In this document, whenever we use the term return it means percentage return or log return, as for small daily moves they are essentially the same.
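The transformation above is a one-liner in practice. The prices below are hypothetical; the sketch verifies the two properties just mentioned: log returns closely approximate percentage returns for small daily moves, and they sum to the multiperiod log return.

```python
import math

# Hypothetical daily closing prices P_t of an index.
prices = [4000.0, 4040.0, 4020.1, 4100.5, 4061.0]

# Percentage return: R_t = (P_t - P_{t-1}) / P_{t-1}
pct = [(prices[t] - prices[t - 1]) / prices[t - 1]
       for t in range(1, len(prices))]

# Log return: r_t = log(P_t) - log(P_{t-1}) = log(1 + R_t)
logret = [math.log(prices[t] / prices[t - 1])
          for t in range(1, len(prices))]

# For small daily moves the two are nearly identical...
print(all(abs(r - l) < 1e-3 for r, l in zip(pct, logret)))  # True

# ...and log returns sum to the total log return over the whole period.
total = math.log(prices[-1] / prices[0])
print(abs(sum(logret) - total) < 1e-12)  # True
```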
4.2 Autoregressive model
As mentioned in the previous section, an autoregressive (AR) process is one where the realization of a variable in one period is linked to its past. The past realizations in the context of an AR process are known as lags. For example, if ct represents consumption at period t, and we know that it is linked to consumption in the last period, we can write ct as

ct = a0 + a1 ct-1 (1)

If yt represents the value at period t and it is known to be linked to three of its lagged values, then we can form the equation

yt = a0 + a1 yt-1 + a2 yt-2 + a3 yt-3 (2)

However, this would be an inaccurate description of a process, particularly in economics or finance. All processes in economics and finance are stochastic in nature, and an exact relationship can never be established. We normally rewrite an equation like (2) as

yt = a0 + a1 yt-1 + a2 yt-2 + a3 yt-3 + εt, where εt is an unobservable stochastic element (3)

From a mathematical point of view an AR equation is essentially a difference equation, the discrete analogue of a differential equation. The number of lags in an AR equation is the order of the difference equation. Generally in econometrics we do not solve the difference equation; we use the model to compute the conditional expectation of a process given its past. To give our readers an idea of where this is leading, here is a concrete example. We could model the index return rt as an AR process where rt depends on five lags. That means, given the returns of the last five sessions, we can form some expectation of the return in the next period. A natural question arises: we all know the financial market depends on many variables, so how can we model the return as an AR model whose inputs are nothing but past returns? We answer this question in the next section and show that doing so is not only possible, but probably the best solution.
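The conditional-expectation idea can be sketched in a few lines. The AR(5) coefficients below are hypothetical placeholders, not estimated values; in practice they would come from regressing rt on its five lags.

```python
# One-step-ahead conditional expectation from an AR(5) model of returns:
# E[r_t | past] = a0 + a1*r_{t-1} + ... + a5*r_{t-5}.
a0 = 0.0002
coeffs = [0.08, -0.05, 0.03, 0.01, -0.02]  # a1 .. a5 (hypothetical)

def ar_forecast(last_returns):
    """Expected next-period return given the five most recent returns,
    ordered from most recent to oldest."""
    assert len(last_returns) == len(coeffs)
    return a0 + sum(a * r for a, r in zip(coeffs, last_returns))

recent = [0.004, -0.002, 0.001, 0.000, 0.003]  # r_{t-1} .. r_{t-5}
print(round(ar_forecast(recent), 6))  # 0.00059
```

Note that the forecast uses nothing but past realizations of the return itself; the next section explains why this is defensible.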
4.3 From structural equation to reduced-form equation (simplicity is beauty)
Variables in economics and finance tend to be endogenous, meaning they are inter-related, and this poses problems for estimating the equations properly. It is therefore often useful to collapse a system of difference equations into a single-equation model. We illustrate this process with the example from chapter 1 of Walter Enders's Applied Econometric Time Series. Consider a stochastic version of Samuelson's (1939) classic model:

yt = ct + it (4)
ct = α yt-1 + εct (5)
it = β (ct – ct-1) + εit (6)

where yt, ct and it represent real GDP, consumption and investment in time period t respectively. In this Keynesian model yt, ct and it are endogenous variables. Equation (6) is a structural equation since it expresses the endogenous variable it as being dependent on the current realization of another endogenous variable, ct. A reduced-form equation is one expressing a variable in terms of its own lags, lags of other endogenous variables, and current and past exogenous variables. To derive a reduced-form equation for investment, substitute (5) into (6) to obtain

it = β (α yt-1 + εct – ct-1) + εit (7)

Notice that this is not a univariate equation: investment depends on the lagged values of GDP and consumption. We can easily obtain ct-1 from (5) as ct-1 = α yt-2 + εct-1, and thus express it only in terms of lags of yt:

it = αβ (yt-1 – yt-2) + β (εct – εct-1) + εit (8)

In the same way we can obtain a reduced-form equation for GDP by substituting (5) and (8) into (4):

yt = α(1 + β) yt-1 – αβ yt-2 + (1 + β) εct – β εct-1 + εit (9)
Equation (9) is a univariate reduced-form equation. In contrast to a structural model, which is multivariate in nature, a univariate reduced-form AR model assumes that all the different structural models, though unspecified, are reduced to one AR model. And there lies another remarkable fact: even variables that we fail to observe are part of the AR model. Thus it is not only practical to use, but often the best solution. An AR model is most useful for forecasting based on past realizations of a process. Its major drawback is that it can answer what but not why: we have some idea of what to expect from the process in the next period, but no idea why it is so. An AR model is atheoretical, meaning its construction is not based on any fundamental characteristics of a variable. Instead we are empirically capturing relevant features of observed data that may have arisen from a variety of different structural models (Brooks 2008). The financial market, being an economic indicator – a leading indicator, in fact – is known to be extremely autoregressive. And this is all we need from a financial time series if we are interested only in what to expect next. Honestly speaking, if you are an investor and you can somehow predict with a decent degree of accuracy how the market will perform in the future, do you really care why it does so? A major benefit of modelling a financial time series with an AR model is that the financial market is extremely sensitive to future expectations. Government data are published with a considerable lag, and most of the time such information is already discounted in the price of the equity market. Using the price movement to predict the future price is thus one of the most efficient ways to construct an investment system that can be back-tested without bias and put to use with reliability.
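The algebra of the reduced form can be checked by simulation. The α and β values below are hypothetical; the sketch generates GDP from the structural equations (4)-(6) and verifies at every step that the same value is produced by the univariate reduced form (9).

```python
import random

random.seed(42)

# Structural (Samuelson-type) model with hypothetical alpha, beta:
#   y_t = c_t + i_t
#   c_t = alpha * y_{t-1} + ec_t
#   i_t = beta * (c_t - c_{t-1}) + ei_t
alpha, beta = 0.6, 0.4
y = [1.0, 1.0]          # GDP, two initial periods
c = [0.6, 0.6]          # consumption, consistent with c_t = alpha*y_{t-1}
ec_hist = [0.0, 0.0]    # history of consumption shocks

for t in range(2, 500):
    ec = random.gauss(0, 0.01)
    ei = random.gauss(0, 0.01)
    ct = alpha * y[t - 1] + ec
    it = beta * (ct - c[t - 1]) + ei
    yt = ct + it
    # Reduced form (9): y_t = alpha*(1+beta)*y_{t-1} - alpha*beta*y_{t-2}
    #                         + (1+beta)*ec_t - beta*ec_{t-1} + ei_t
    yt_reduced = (alpha * (1 + beta) * y[t - 1] - alpha * beta * y[t - 2]
                  + (1 + beta) * ec - beta * ec_hist[t - 1] + ei)
    assert abs(yt - yt_reduced) < 1e-10  # the two representations agree
    y.append(yt)
    c.append(ct)
    ec_hist.append(ec)

print("reduced form matches structural simulation")
```

The univariate representation carries the same information about GDP dynamics as the three-equation system, which is exactly why a single AR-type equation can stand in for an unspecified structural model.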
To summarize what we have discussed so far: a financial time series such as the Nasdaq index can be modelled as an AR model. Of course, we cannot do it by hand; many statistical software packages can do it for us. We also know that once a model has been created, it can be used to forecast. So far so good. But the raw prediction cannot be used to decide between long and short positions in the market. We discuss the technique of turning the prediction into a trading decision in section 4.6. Before that, we need to look at one more important aspect of AR modelling – changing the model depending on the market condition. We discuss this in the next section.
4.5 Threshold Autoregressive model (one size does not fit all)
Many financial and economic time series undergo periods in which the behaviour of the series is dramatically different from other periods. This can happen for a number of reasons. When the market rises slowly and smoothly, investors exhibit a relatively calm disposition toward the market. When the market is volatile, by contrast, the trading pattern of investors changes due to uncertainty and nervousness. By studying a financial time series over a long period it is sometimes possible to identify several isolated periods with similar characteristics. The change from one period to another is usually referred to as a 'regime change' or 'regime shift'. We have already seen that our intention is to model an index such as the S&P500 using AR models. To be precise, we use an ARDL (autoregressive distributed lag) model, which adds to the basic AR model lags of other exogenous variables; we have identified such variables that turn out to be significant in our model. After determining the basic structure of an appropriate ARDL model, we realize that we cannot apply one universal model to the entire time series, since the autoregressive characteristics change with the regime. What we need is a different model for each regime, along with the boundary conditions that identify a regime change. These boundary conditions are known as thresholds, and the collection of ARDL models marked off by different thresholds is called a threshold autoregressive (TAR) model. In QuanTimer we have identified three distinct regimes for the US market: i) up trend with a low degree of volatility, ii) down trend with a medium degree of volatility, and iii) down trend with a high degree of volatility. Our first challenge is to identify the change in regime with precision. The switch between regimes 2 and 3 is determined by the current volatility of the index.
Since we are not predicting volatility – we are only interested in current volatility – there is no need to use a GARCH model. We use a variable that we consider a good measure of current volatility. To identify regime 1 we use a trend filter along with the measure of volatility; after studying market behaviour over a quarter of a century we have noticed that an up trend always ends with a rise in volatility. To define the ARDL model for an individual regime we isolate the data for that regime and carry out several regression analyses until we determine the right one. Once the ARDL models for all the regimes have been defined, the TAR model looks like:
rt = a0(j) + a1(j) rt-1 + … + ap(j) rt-p + b1(j) xt-1 + … + bq(j) xt-q + εt,  for regime j = 1, 2, 3 (10)
The first part of each ARDL model in equation (10) is the autoregressive part, and the second part is the distributed lag part. If a regime change is identified at the end of a trading session, we switch to the ARDL model for the new regime. We run that ARDL model and compute the estimated return for the next trading session; this is our forecast, or expected return. What we do with this forecast is addressed in the next section, which shows how it helps us issue a trading signal.
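The switching mechanism can be sketched as follows. The volatility thresholds, trend flag and per-regime coefficients below are hypothetical placeholders, not the calibrated QuanTimer values; the point is only the shape of the logic: classify the regime, then forecast with that regime's model.

```python
# Hypothetical volatility thresholds separating the regimes.
VOL_MEDIUM, VOL_HIGH = 0.010, 0.020

def classify_regime(trending_up, current_vol):
    """Regime 1: up trend, low vol. Regime 2: down trend, medium vol.
    Regime 3: down trend, high vol."""
    if trending_up and current_vol < VOL_MEDIUM:
        return 1
    return 3 if current_vol >= VOL_HIGH else 2

# One ARDL specification per regime: (a0, AR coefficients, DL coefficients).
# All values are hypothetical placeholders.
ARDL = {
    1: (0.0004, [0.05, 0.02], [0.10]),
    2: (-0.0003, [0.12, -0.04], [0.25]),
    3: (-0.0010, [0.20, -0.08], [0.40]),
}

def forecast(regime, lagged_returns, lagged_exog):
    """Expected next-session return from the regime's ARDL model."""
    a0, ar, dl = ARDL[regime]
    return (a0 + sum(a * r for a, r in zip(ar, lagged_returns))
               + sum(b * x for b, x in zip(dl, lagged_exog)))

regime = classify_regime(trending_up=False, current_vol=0.025)
print(regime)  # 3: down trend with high volatility
print(round(forecast(regime, [-0.01, 0.004], [-0.002]), 6))  # -0.00412
```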
4.6 Generation of trading signals
Finally we have arrived at the section our readers have been eagerly waiting for – how the signals are generated. The methodology is one of the simplest and most fundamental techniques of classical inferential statistics: formulating a null hypothesis and rejecting it based on the value of a random variable.
We have already defined three regimes in which the market can function, based on our model. In regime 1 the market is expected to go up, so our null hypothesis is that the return is non-negative. If the expected return, as computed by the ARDL model, is a negative number large enough to fall in the critical region, we can reject the null hypothesis; this is equivalent to issuing a short signal while the market remains in regime 1. It is just the opposite for regimes 2 and 3: the null hypothesis is that the return is non-positive, so if the expected return is a positive number large enough to fall in the critical region, we reject the null hypothesis and issue a long signal. If the expected return for period t is based on information available at period t-1, we write it as Et-1(rt); this is the value computed by the ARDL model at period t-1, i.e. at the end of the (t-1)th trading session. We can now formulate our hypothesis testing as follows:

Regime 1: H0: Et-1(rt) ≥ 0, H1: Et-1(rt) < 0. Reject the null if Et-1(rt) is negative and belongs to the critical region.
Regime 2: H0: Et-1(rt) ≤ 0, H1: Et-1(rt) > 0. Reject the null if Et-1(rt) is positive and belongs to the critical region.
Regime 3: H0: Et-1(rt) ≤ 0, H1: Et-1(rt) > 0. Reject the null if Et-1(rt) is positive and belongs to the critical region.
The scheme sounds fairly simple. The challenging part is determining the distribution of Et-1(rt) and setting the confidence interval. In stochastic modelling, price follows a log-normal distribution and return follows a normal distribution – but with a condition: we can expect the return to follow a normal distribution only as long as we can estimate its mean and variance. From the sample values we can certainly compute the mean. However, as described in section 2, the variance of a financial time series is not constant. We could, of course, use a GARCH model to estimate next period's variance. Then, in regime 1, if the ARDL model computes an expected return that is below the mean by 1.96 times the estimated standard deviation, we can reject the null hypothesis of a non-negative return with 95% confidence. It is the opposite for regimes 2 and 3: if the computed expected return is above the mean by 1.96 times the estimated standard deviation, we can reject the null hypothesis of a non-positive return with 95% confidence. It sounds like an excellent solution, but our empirical analysis demonstrated that it is not that great after all. A better solution was devised: nonparametric estimation of the critical region. We make no assumption about the distribution; we know it is leptokurtic when the variance is unknown, but this is not a concern for us. We start with an approximate cut-off point for the critical region and move it slowly, adjusting the number of long and short signals so as to achieve the maximum gain. By virtue of having a large number of observations and knowing both the true return and the expected value, this turns out to be the best solution. Please refer to appendix 4 for a graphical representation. Rarely, there is a conflict between the S&P500 and Nasdaq models as a consequence of market divergences.
In all such instances we issue a cash signal. There are also instances where we have extreme values in some of the other variables we use in our model. In these instances we issue a cash signal as well.
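The decision rule can be sketched as a small function. The cut-off value below is a hypothetical placeholder for the empirically tuned boundary of the critical region, and the conflict flag stands in for the S&P500/Nasdaq divergence check; both are illustrations, not the calibrated QuanTimer parameters.

```python
# Signal generation for regime 1 (null hypothesis: non-negative return).
# The critical-region cut-off is tuned empirically rather than taken from
# a normal distribution; the value below is a hypothetical placeholder.
CUTOFF_REGIME1 = -0.004

def signal_regime1(expected_return, conflict=False):
    """Long/short/cash decision in regime 1.

    conflict=True mimics a divergence between the S&P500 and Nasdaq
    models, which forces a cash signal."""
    if conflict:
        return "cash"
    if expected_return < CUTOFF_REGIME1:  # falls in the critical region
        return "short"                    # null of non-negative return rejected
    return "long"                         # null not rejected; stay with the trend

print(signal_regime1(0.0012))                   # long
print(signal_regime1(-0.0078))                  # short
print(signal_regime1(-0.0078, conflict=True))   # cash
```

Regimes 2 and 3 would mirror this with a positive cut-off: a sufficiently large positive expected return rejects the null of a non-positive return and triggers a long signal.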
5 Beyond the scope of this document
There is a myriad of issues outside the scope of this document. If you are econometrics-savvy, you can rest assured that all relevant time series issues – serial correlation, heteroskedasticity, consistency of OLS estimators, and so on – have been addressed. Great care has been taken to make sure that QuanTimer meets all the technical requirements, i.e. it satisfies all the assumptions of time series analysis. From a technical standpoint QuanTimer would certainly qualify as a textbook example of time series econometrics.
6 Conclusion
Based on statistical analysis of the market and the results obtained by the QuanTimer model, we can infer with 95% confidence that QuanTimer will outperform the market in the long run, as long as US markets display trends. Of course, we cannot claim results within any particular period of time. However, as long as one gives the model enough time, one can most certainly expect results. The author wishes the reader the best of luck in his or her future investment, i.e. the positive stochastic element that we all need, to some degree, to be successful.
References

[1] Larsen and Marx. 2012. An Introduction to Mathematical Statistics and Its Applications. Prentice Hall.
[2] Enders, Walter. 2010. Applied Econometric Time Series. Wiley.
[3] Brooks, Chris. 2008. Introductory Econometrics for Finance. Cambridge University Press.
[4] Tsay, Ruey S. 2005. Analysis of Financial Time Series. Wiley.
[5] Gujarati, Damodar. 2004. Basic Econometrics. McGraw-Hill.
[6] Wooldridge, Jeffrey M. 2012. Introductory Econometrics. Cengage Learning.
[7] Hill, Griffiths, and Lim. 2011. Principles of Econometrics. Wiley.
[8] Campbell, Lo, and MacKinlay. 1997. The Econometrics of Financial Markets. Princeton University Press.
[9] Stefanica, Dan. 2008. A Primer for the Mathematics of Financial Engineering. FE Press.
[10] Elaydi, Saber. 2005. An Introduction to Difference Equations. Springer.
[11] Harville, David. 1997. Matrix Algebra from a Statistician's Perspective. Springer.
Appendix

In section 5 we mentioned that QuanTimer satisfies all the necessary technical assumptions of time series analysis. Here we outline the results of one regression analysis and four post-regression analyses to give our curious readers a glimpse of what the results look like. The full QuanTimer model involves over fifty tests, from testing the stationarity of all the variables to regression and post-regression analyses. There is no point in listing them all, as we have not discussed the details of our models.
Appendix 1: Normality test of residuals from the regression of the ARDL model in regime 2, with NASDAQ daily data for the period Jan 1, 1996 to Dec 31, 2010. Column B is marked M to distinguish the rows as regime 2 (volatility level is medium, "M").

. sktest e if (B == "M")

Skewness/Kurtosis tests for Normality
    Variable |  Obs   Pr(Skewness)   Pr(Kurtosis)   adj chi2(2)   Prob>chi2
-------------+-------------------------------------------------------------
           e |  520         0.2671         0.4757          1.75      0.4179
Appendix 2: Kernel density plot of residuals from the regression of the ARDL model in regime 2, with NASDAQ daily data for the period Jan 1, 1996 to Dec 31, 2010. The plot overlays the kernel density estimate (kernel = Epanechnikov, bandwidth = 0.3369) on the normal density. [Plot not reproduced here.]
Appendix 3: Serial correlation test of residuals from the regression of the ARDL model in regime 2, with NASDAQ daily data for the period Jan 1, 1996 to Dec 31, 2010. Column B is marked M to distinguish the rows as regime 2 (volatility level is medium, "M"). The number of observations is less than 520, as we discount the first period in every instance of regime 2.

. regress e eLag if B == "M"

      Source |         SS    df          MS        Number of obs =     485
-------------+-------------------------------     F(1, 483)     =    0.57
       Model |  .968012556     1   .968012556     Prob > F      =  0.4490
    Residual |  814.350326   483   1.68602552     R-squared     =  0.0012
-------------+-------------------------------     Adj R-squared = -0.0009
       Total |  815.318339   484   1.68454202     Root MSE      =  1.2985

------------------------------------------------------------------------------
           e |     Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
        eLag |  .0370204   .0488576    0.76   0.449    -.0589794    .1330202
       _cons |  .0692754   .0593663    1.17   0.244    -.0473727    .1859235
------------------------------------------------------------------------------
Appendix 4: Scatter plot of expected return (fitted value) vs. true return (G) from the regression of the ARDL model in regime 2, with NASDAQ daily data for the period Jan 1, 1996 to Dec 31, 2010. When the expected return falls in the critical region we reject the null hypothesis of a non-positive return and issue a long signal; otherwise we issue a short signal. As is evident from the scatter plot, most of the points show non-positive returns, and most of the points that lead to rejection of the null show positive returns. [Plot not reproduced here.]