Welcome back everyone. I have finally found a little time to get around to finishing off this short series on a **Python Backtesting Mean Reversion** strategy for ETF pairs.

In the last post we got as far as creating the spread series between the two ETF price series in question (by first running a linear regression to find the hedge ratio). We then ran an Augmented Dickey-Fuller test and calculated the half-life of that spread series to see whether the pair was a decent candidate for a tradable strategy.

Now we have to write the part of the script that calculates the “normalised” Z-Score of the spread series and sets up a “Bollinger-band” style entry and exit system: short trades are entered when the Z-Score rises above 2 and exited when it falls below 0, and vice versa for long trades (i.e. the Z-Score has to fall below -2 to enter, and the trade is exited when it rises above 0).

We are going to calculate the Z-Score using a rolling window for the mean and standard deviation, with the window length set to the half-life calculated in the previous blog post. This saves us from either committing a look-ahead bias by using the mean across the whole period, or choosing an arbitrary look-back window that would need to be optimised and could lead to data-mining bias.

We calculate and plot the normalised Z-Score as follows:

    meanSpread = df1.spread.rolling(window=halflife).mean()
    stdSpread = df1.spread.rolling(window=halflife).std()
    df1['zScore'] = (df1.spread - meanSpread) / stdSpread
    df1['zScore'].plot()
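
To make the look-ahead point concrete, here is a minimal, self-contained sketch of the same rolling Z-Score on a synthetic spread. The AR(1) series, the seed and the `halflife = 7` value are assumptions made purely for illustration; they are not the data or the half-life from this series of posts.

```python
import numpy as np
import pandas as pd

# Synthetic mean-reverting spread (AR(1)); a stand-in for df1.spread.
rng = np.random.default_rng(0)
noise = rng.normal(size=500)
values = np.zeros(500)
for t in range(1, 500):
    values[t] = 0.9 * values[t - 1] + noise[t]
spread = pd.Series(values)

halflife = 7  # hypothetical window length, already an integer


def rolling_zscore(series, window):
    """Z-Score of each point against the trailing window ending at that point."""
    mean = series.rolling(window=window).mean()
    std = series.rolling(window=window).std()
    return (series - mean) / std


z_score = rolling_zscore(spread, halflife)

# Change only the final observation and recompute.
altered = spread.copy()
altered.iloc[-1] = 100.0
z_altered = rolling_zscore(altered, halflife)

# Earlier Z-Scores do not move, because each one only uses past data.
unchanged = np.allclose(z_score.iloc[:-1], z_altered.iloc[:-1], equal_nan=True)
print("rows without a full window:", int(z_score.isna().sum()))
print("earlier z-scores unchanged:", unchanged)
```

The first `halflife - 1` rows have no full window and come out as NaN, which is worth remembering when the entry and exit signals are built from this column.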

Ok so this next part can be a little fiddly; what we need to end up with is a column in our dataframe that signifies whether we should currently be long, short or flat in terms of position. We can accomplish this by following a couple of steps.

Firstly we will set up a column called “num units long” which will signify when we need to be long by filling those rows with a 1, and fill the remaining rows with a 0 to signify no long position.

We will then do exactly the same for short positions by setting up a column called “num units short”, filling those rows where we should be short with a -1 and those where there is no short position with a 0.

This is achieved as follows (we also set our absolute entry and exit Z-Scores as 2 and 0 respectively):

    entryZscore = 2
    exitZscore = 0

    # set up num units long
    df1['long entry'] = (df1.zScore < -entryZscore) & (df1.zScore.shift(1) > -entryZscore)
    df1['long exit'] = (df1.zScore > -exitZscore) & (df1.zScore.shift(1) < -exitZscore)
    df1['num units long'] = np.nan
    df1.loc[df1['long entry'], 'num units long'] = 1
    df1.loc[df1['long exit'], 'num units long'] = 0
    # start flat, then carry the last entry/exit value forward
    df1.loc[df1.index[0], 'num units long'] = 0
    df1['num units long'] = df1['num units long'].ffill()

    # set up num units short
    df1['short entry'] = (df1.zScore > entryZscore) & (df1.zScore.shift(1) < entryZscore)
    df1['short exit'] = (df1.zScore < exitZscore) & (df1.zScore.shift(1) > exitZscore)
    df1['num units short'] = np.nan
    df1.loc[df1['short entry'], 'num units short'] = -1
    df1.loc[df1['short exit'], 'num units short'] = 0
    df1.loc[df1.index[0], 'num units short'] = 0
    df1['num units short'] = df1['num units short'].ffill()
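
The entry/exit/forward-fill logic is easier to follow on a tiny hand-made series. The sketch below applies the same rules to an invented Z-Score path; the numbers and the `hold` helper are hypothetical and exist only for this illustration.

```python
import numpy as np
import pandas as pd

# Invented Z-Score path: dips below -2, recovers, then spikes above +2.
z = pd.Series([0.0, -1.5, -2.3, -2.8, -1.0, 0.4, 1.2, 2.5, 1.1, -0.3, 0.2])
entry_z, exit_z = 2, 0

# A signal fires only on the bar where the threshold is crossed.
long_entry = (z < -entry_z) & (z.shift(1) > -entry_z)
long_exit = (z > -exit_z) & (z.shift(1) < -exit_z)
short_entry = (z > entry_z) & (z.shift(1) < entry_z)
short_exit = (z < exit_z) & (z.shift(1) > exit_z)


def hold(entry, exit_, size):
    """Hypothetical helper: hold `size` units from each entry until the next exit."""
    units = pd.Series(np.nan, index=entry.index)
    units[entry] = size
    units[exit_] = 0
    units.iloc[0] = 0      # start flat
    return units.ffill()   # carry the last signal forward


num_units = hold(long_entry, long_exit, 1) + hold(short_entry, short_exit, -1)
print(num_units.tolist())
```

On this path the position is long from the first cross below -2 until the Z-Score crosses back above 0, flat in between, and short from the cross above +2 until it crosses back below 0.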

Now we can just create another column, which sums the “num units long” and “num units short” to get us our “numUnits” – the overall position that our portfolio should be in at that time; either long (1), short (-1) or flat (0).

We will also generate a column containing the percentage change of the spread series, and then generate a portfolio return column by multiplying the percentage change of the spread series by the current holding of the portfolio (long, short or flat).

The daily portfolio returns are then cumulatively added to generate an equity curve, held in “cum rets”.

    df1['numUnits'] = df1['num units long'] + df1['num units short']
    df1['spread pct ch'] = (df1['spread'] - df1['spread'].shift(1)) / ((df1['x'] * abs(df1['hr'])) + df1['y'])
    df1['port rets'] = df1['spread pct ch'] * df1['numUnits'].shift(1)
    df1['cum rets'] = df1['port rets'].cumsum()
    df1['cum rets'] = df1['cum rets'] + 1
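
The `shift(1)` on the position column is what stops the backtest from trading on information it would not yet have had. A small sketch with invented positions and returns shows the alignment; none of these numbers come from the actual backtest.

```python
import pandas as pd

# Invented daily positions and spread returns, for illustration only.
num_units = pd.Series([0, 1, 1, 0, -1, 0])
spread_ret = pd.Series([0.00, 0.01, -0.02, 0.03, 0.01, -0.04])

# Today's return is earned on the position decided at yesterday's close.
port_rets = spread_ret * num_units.shift(1)

# Same summed equity curve as in the post (starting value of 1).
cum_rets = port_rets.cumsum() + 1

print(pd.DataFrame({"numUnits": num_units,
                    "spread ret": spread_ret,
                    "port rets": port_rets,
                    "cum rets": cum_rets}))
```

The long position opened on the second row only starts earning on the third row, and the first row of `port rets` is NaN because there is no prior position to apply.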

We can now plot the portfolio equity curve as follows:

    plt.plot(df1['cum rets'])
    plt.xlabel(i[1])
    plt.ylabel(i[0])
    plt.show()

Now all we have left to do is calculate the Sharpe Ratio and the Compound Annual Growth Rate (CAGR):

    # annualised Sharpe ratio from daily portfolio returns
    sharpe = (df1['port rets'].mean() / df1['port rets'].std()) * np.sqrt(252)

    start_val = 1
    end_val = df1['cum rets'].iat[-1]

    start_date = df1.iloc[0].name
    end_date = df1.iloc[-1].name
    days = (end_date - start_date).days  # calendar days between first and last row

    CAGR = round(((float(end_val) / float(start_val)) ** (252.0 / days)) - 1, 4)

    print("CAGR = {}%".format(CAGR * 100))
    print("Sharpe Ratio = {}".format(round(sharpe, 2)))

    CAGR = 0.33%
    Sharpe Ratio = 0.09
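
One thing to be aware of when reading the CAGR figure: `days` above is a calendar-day count, while the exponent uses 252 trading days. The sketch below is a hedged illustration of how much that choice can matter. The `sharpe_ratio` and `cagr` helpers, the dates and the 21% total return are all hypothetical, and annualising by 365.25 calendar days is a substitution on my part, not what the code above does.

```python
import numpy as np
import pandas as pd


def sharpe_ratio(daily_rets, periods_per_year=252):
    """Annualised Sharpe ratio of daily returns (risk-free rate taken as 0)."""
    return daily_rets.mean() / daily_rets.std() * np.sqrt(periods_per_year)


def cagr(start_val, end_val, start_date, end_date, days_per_year=365.25):
    """Compound annual growth rate; the elapsed time is counted in calendar days."""
    days = (end_date - start_date).days
    return (end_val / start_val) ** (days_per_year / days) - 1


# Hypothetical equity curve: 1.00 -> 1.21 over two calendar years.
start, end = pd.Timestamp("2015-01-01"), pd.Timestamp("2017-01-01")
calendar_cagr = cagr(1.0, 1.21, start, end)                        # roughly 10% a year
trading_day_cagr = cagr(1.0, 1.21, start, end, days_per_year=252)  # noticeably lower

example_rets = pd.Series([0.01, -0.01, 0.02, 0.0])
example_sharpe = sharpe_ratio(example_rets)

print("CAGR, 365.25-day year: {:.2%}".format(calendar_cagr))
print("CAGR, 252-day year:    {:.2%}".format(trading_day_cagr))
print("Sharpe ratio:          {:.2f}".format(example_sharpe))
```

With a calendar-day count in the denominator, the 252-day exponent gives a lower figure for the same equity curve, so the two conventions should not be mixed without knowing which one is intended.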

Wow, OK, so that result doesn’t look great at all in terms of returns: it is hardly better than flat, and once transaction fees and trading costs are taken into account it is going to be a negative overall return.

Now I guess all that is left is to test the strategy over different ETF pairs and over different time frames.

We’ll get onto that in the next and final blog post regarding this particular mean reversion strategy.

One final caveat to all this: I have tried my best to write this Python backtest in an accurate and logical way. If anyone can spot any errors in the code, whether from a scripting perspective or from a fundamental strategy perspective, please do let me know in the comments section. Remember I am still learning Python, so my word is far, far from gospel.

I’ll try not to leave it as long until the next post this time – I need a kick up the ass to be a little more active with my updates!!!

## 15 comments

it goes nice until the line:

meanSpread = df1.spread.rolling(window=halflife).mean()

here’s an error message screenshot http://prntscr.com/e1l1eb

Hi there…yeah this is an easy fix…the pandas “rolling” function only accepts integers as the window, whereas the halflife variable is currently a floating point number. Just cast the halflife as an integer and you’re good to go.

So just add the following line:

And you should be good to go!
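
A minimal sketch of the kind of cast described above, assuming `halflife` comes out of the previous post as a float. The value 6.7 and the toy series are made up, and whether to truncate or round to the nearest integer is a judgment call rather than something settled here.

```python
import pandas as pd

halflife = 6.7  # hypothetical float half-life, for illustration only

window = int(halflife)                 # plain cast: truncates to 6
window_rounded = int(round(halflife))  # alternative: nearest integer, 7

# rolling() is given an integer window here, as the reply above advises.
spread = pd.Series([1.0, 2.0, 4.0, 3.0, 5.0, 6.0, 4.0, 5.0])
rolling_mean = spread.rolling(window=window).mean()

print(window, window_rounded, rolling_mean.iloc[-1])
```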

wow….thank you.

Have you tried to calculate hedge ratios by TLS instead of OLS? Recently I’ve seen a PDF by Paul Teetor where he described it, and it sounds… logical, IMO.

Hey, thanks for the post. I have a question regarding the following line:

    df1['spread pct ch'] = (df1['spread'] - df1['spread'].shift(1)) / ((df1['x'] * abs(df1['hr'])) + df1['y'])

You call this ‘spread pct change’, but this calculation is different than a normal pct change.

Usually the denominator is the first value (df1['spread'].shift(1) in our case); instead you put, if I understand correctly, the sum of money (or units) of both assets.

Could you please explain why?

I calculated return by finding: pct change of x + hedge ratio* pct change of y

Thanks!

Hi there…the way I viewed it was as follows…

Let’s say for argument’s sake you have calculated a hedge ratio of -1, so for every unit of x that you buy, you sell 1 unit of y. Let’s then say that the price of x = $10 and the price of y = $15 at some point in time, T, so that the spread is $5.

If, moving to time T+1, the price of x remains at $10 and the price of y falls to $14, you have gained $1 on the sale of y. If the denominator of the “spread pct change” were the initial spread at time T, that would represent a (5 - 4) / 5 = 20% change and a 20% “profit”. But you haven’t really made 20% just because the spread changed by $1.

I think a better denominator to use is the gross value of the portfolio; that is, the $10 + $15, which gives a (5 - 4) / (10 + 15) = 4% return.
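
The arithmetic in this reply can be checked in a few lines. The variable names are invented for this sketch, and the gross value is taken at time T as in the worked example (the posted code uses the prices on the current row instead).

```python
# Prices from the worked example above.
x_t, y_t = 10.0, 15.0     # time T
x_t1, y_t1 = 10.0, 14.0   # time T+1
hr = -1.0                 # hedge ratio: buy 1 unit of x, sell 1 unit of y

spread_t = y_t + hr * x_t      # 15 - 10 = 5
spread_t1 = y_t1 + hr * x_t1   # 14 - 10 = 4

# Being long x and short y is a short spread position, so a falling spread is a gain.
profit = spread_t - spread_t1  # $1

return_on_spread = profit / spread_t          # 1 / 5  -> 20%
gross_value = abs(hr) * x_t + y_t             # 10 + 15 = 25
return_on_gross = profit / gross_value        # 1 / 25 -> 4%

print("return on spread: {:.0%}".format(return_on_spread))
print("return on gross value: {:.0%}".format(return_on_gross))
```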

It’s always difficult to conceptualise percentage returns for long/short portfolios as you actually receive funds from the short sale of assets, which can theoretically cause your initial “Investment” to be zero…which we all know results in a nonsensical return of infinity.

I chose to use the absolute value of the long/short portfolio, as mentioned in Ernie Chan’s book “Algorithmic Trading: Winning Strategies and Their Rationale”.

Hope that helps!

Could you point out which chapter of Ernie Chan’s book “Algorithmic Trading: Winning Strategies and Their Rationale” mentions the absolute value of the long/short portfolio for calculating the return of the spread? I also have doubts about this part:

    df1['spread pct ch'] = (df1['spread'] - df1['spread'].shift(1)) / ((df1['x'] * abs(df1['hr'])) + df1['y'])

Hi,

Great post, was very explanatory and very useful for me. Thank you.

Just one question: I see you are cumsum’ing returns to get cumulative portfolio returns, as below:

    df1['cum rets'] = df1['port rets'].cumsum()

I think you should be adding 1 to returns and cumprod’ing them. As below:

    df['port rets'] = df['rets'] * df['numUnits'].shift(1)
    df['port rets + 1'] = df['port rets'] + 1
    df['cum rets'] = df['port rets + 1'].cumprod()

Would appreciate it if you let me know what you think.

Thanks again

Hi Kerem, yes you are indeed correct… I was unfortunately a bit too lazy in my treatment of returns vs log returns. If using simple arithmetic returns then yes, definitely the plus 1 and then cumprod approach should be used. If log returns then they can just be cumsummed.
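
To see the difference between the two treatments, here is a small sketch on an invented return series that alternates +10% and -10%. The numbers are illustrative only.

```python
import numpy as np
import pandas as pd

# Invented simple (arithmetic) daily returns.
port_rets = pd.Series([0.10, -0.10, 0.10, -0.10])

# Summing simple returns ignores compounding: the curve ends back at 1.0.
summed = port_rets.cumsum() + 1

# Compounding simple returns: 1.1 * 0.9 * 1.1 * 0.9 = 0.9801.
compounded = (1 + port_rets).cumprod()

# Log returns can be summed; exponentiating recovers the compounded curve.
log_rets = np.log1p(port_rets)
from_logs = np.exp(log_rets.cumsum())

print(pd.DataFrame({"summed": summed,
                    "compounded": compounded,
                    "from log returns": from_logs}))
```

For the small daily returns in this backtest the gap between the two curves will be much smaller than in this exaggerated example, but it grows with the size of the returns and the length of the test.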

Lazy? I don’t think that’s how I would describe someone who wrote a tome of really useful python trading articles 🙂 Many thanks again for the post, was extremely helpful

Hello,

Thank you for your post!

I’d like to know how to trade like this in real-life, how do we interpret spread to long one asset and short another asset?

Thanks

I suppose you have partially taken the code from QI, though they have a huge mistake in the calculation of the returns. df1['spread pct ch'] = (df1['spread'] - df1['spread'].shift(1)) / ((df1['x'] * abs(df1['hr'])) + df1['y']) uses ‘hr’, which is calculated dynamically. When you enter a position, however, you ‘lock’ your hr in terms of the positions opened on both instruments. That is why the spread pct ch cannot be calculated that way: the ‘hr’ value should be kept until the position is closed. If you open a live trade based on this method you will see the mistake.

What is QI? I wrote this code myself, I didn’t take it from anywhere although use Ernest Chan’s book as the basis of the logic.

You are absolutely correct about the flaw in logic regarding the hedge ratio being assumed dynamic whereas it is locked for the duration of the trade…I have been aware for a while and should probably put a warning at the top of the article…I shall do that when I get a moment.
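
For anyone wanting to see the effect, below is a rough sketch of one way the hedge ratio could be frozen at entry. Everything in it is an assumption for illustration: the prices and hedge ratios are invented, the spread is taken to be `y - hr * x` with a positive `hr` (which may not match the sign convention used earlier in this series), and the P&L is in dollars per unit of spread rather than a percentage return.

```python
import pandas as pd

# Invented data: the hedge ratio estimate drifts while a position is open.
df = pd.DataFrame({
    "x": [10.0, 10.5, 11.0, 10.8, 11.2],
    "y": [20.0, 20.4, 21.5, 21.9, 22.0],
    "hr": [2.0, 1.9, 2.1, 2.0, 1.8],
    "numUnits": [0, 1, 1, 0, 0],
})

held = df["numUnits"].shift(1).fillna(0)  # position carried into each row

# Dynamic version: the spread is rebuilt every day with that day's hedge ratio.
spread_dynamic = df["y"] - df["hr"] * df["x"]
pnl_dynamic = (held * spread_dynamic.diff()).fillna(0)

# Locked version: remember the hedge ratio from the entry bar until the exit.
is_entry = (df["numUnits"] != 0) & (df["numUnits"] != held)
hr_locked = df["hr"].where(is_entry).ffill()
hr_held = hr_locked.shift(1)  # ratio in force for the position carried into the row
pnl_locked = (held * (df["y"].diff() - hr_held * df["x"].diff())).fillna(0)

print(pd.DataFrame({"held": held,
                    "pnl dynamic": pnl_dynamic,
                    "pnl locked": pnl_locked}))
print("total dynamic:", round(pnl_dynamic.sum(), 4))
print("total locked: ", round(pnl_locked.sum(), 4))
```

In this toy case the two totals even have different signs, because the dynamic version books the day-to-day change in the hedge ratio estimate as if it were profit or loss on the trade.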

I’m intrigued as to who or what QI is…

QI is QuantInsti. Ernie Chan is the founder of this project, so the code is the same regardless of where you took it from. People are blindly copying it and might fall into a trap that is not easy to spot unless you either implement it in real trading or go line by line through the code. I just wanted to warn users who might lose money because of it.

The fact that they’re not taking commissions into account is another reason for a loss (borrowing stocks really costs a lot over long holding periods).

Not to get into semantics, but when you say “Ernie Chan is the founder of this project…”, do you mean the mean reversion strategy itself rather than QuantInsti?

I have found numerous times that QI have lifted code from my posts verbatim, still with my comments in there, so forgive me if I don’t show much love for them as a whole.

In terms of E Chan and being the founder of a “project” – yes I chose to use his book as the foundation and yes I pretty much attempted to transcribe his Matlab code into Python code – but he is far from being the founder or creator of mean-reversion/pairs trading strategies based on statistical arbitrage as a whole.

But yes, I still agree with your fundamental point that this implementation is flawed for the very reason you describe, and that one should indeed be vigilant to such things.

I’ve re-read my above comment and it comes across as somewhat “combative” – which is not my intention at all, FYI.