Deadline June

Lessons from the data

Just ran a comparison of a simple pivot-point system trading the same 6 months on historical data from different sources.

Just to make it not so heavily dependent on precise tick-by-tick data, I ran the test on hourly bars.

First I used IB historical data, freshly downloaded from their server.

Total number of trades: 127.
Profit per trade: -$147

Then I deleted the IB data and reloaded it from the IQFeed server:

Total number of trades: 123
Profit per trade: -$123

And last I deleted the IQFeed data and loaded up the tickdata from Disktrading:

Total number of trades: 111
Profit per trade: -$58

I've included charts of the different data, for 6 days with the pivot point indicator. The first is the IB data, the second is the same window, but with IQFeed data. The IQFeed data is missing two of the support pivot points - just in case anyone out there is bored and wants to play "spot the difference".
 

Attachments

  • IBData$GBPUSD (60 Min)  18_03_2010.jpg
  • IQFeed$GBPUSD (60 Min)  18_03_2010.jpg
  • Disktrading$GBPUSD (60 Min)  18_03_2010.jpg
Lessons from the data

Part of my mind says, so what, live with it, write a system that handles it, because the future is not going to be like either of those charts.

Another part of my mind says "No". What if the 2 data histories are different types? It looks to me like the IQFeed data isn't built from the same number of ticks as the IB data, i.e. the bars in IB are bigger and have higher highs and lower lows.

I'll run the system in the simulated trading account for next week, and then delete the data NinjaTrader collects, download the IB data and run the system against that and I bet I get different results there too. That'll be the acid test.
 
Don't know

I think there are two issues involved here.

First, missed ticks. If one of the feeds that these providers are collecting to sell again misses a tick at the high or low of the bar - that's a bad bar.

Second, timing of the bars, starting time, ending time and the timestamp that the provider decides to label the bar with. So the open and the close are going to vary.

I think I'm seeing both problems here.
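To make the two effects concrete, here is a minimal sketch that builds hourly bars from a handful of made-up ticks. The function name `build_bars`, the `offset_minutes` parameter and the tick values are all assumptions for illustration; none of the providers document their bar-building this way.

```python
from datetime import datetime, timedelta

def build_bars(ticks, bar_minutes=60, offset_minutes=0):
    """Group (timestamp, price) ticks into OHLC bars.

    offset_minutes shifts where each bar starts, mimicking a provider
    that cuts its bars at a slightly different time (hypothetical).
    """
    bars = {}
    width = timedelta(minutes=bar_minutes)
    offset = timedelta(minutes=offset_minutes)
    epoch = datetime(1970, 1, 1)
    for ts, price in sorted(ticks):
        # Start time of the bar this tick falls into.
        start = epoch + ((ts - epoch - offset) // width) * width + offset
        bar = bars.setdefault(
            start, {"open": price, "high": price, "low": price, "close": price}
        )
        bar["high"] = max(bar["high"], price)
        bar["low"] = min(bar["low"], price)
        bar["close"] = price
    return bars

# Made-up ticks, not real GBPUSD data.
t0 = datetime(2010, 3, 18, 9, 0)
ticks = [
    (t0 + timedelta(minutes=0), 1.5300),
    (t0 + timedelta(minutes=20), 1.5325),  # the high of the 09:00 bar
    (t0 + timedelta(minutes=40), 1.5290),
    (t0 + timedelta(minutes=58), 1.5310),
    (t0 + timedelta(minutes=61), 1.5305),
]

full = build_bars(ticks)
missed = build_bars([t for t in ticks if t[1] != 1.5325])  # feed drops the high tick
shifted = build_bars(ticks, offset_minutes=2)              # bars start at :02

print("full   ", full[t0])
print("missed ", missed[t0])
print("shifted", shifted[t0 + timedelta(minutes=2)])
```

Dropping the single tick at the high lowers the bar's high, while moving the bar boundary by two minutes changes both the open and the close without any tick being lost.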

I'll have to think a bit more about what the implications are for building trading systems against their data.

Most of my systems so far use the Close of the bar as the exit price, or the basis of it, i.e. Close + x ticks / Close - x ticks.

Most of my systems also use ATR which is dependent on the highs and lows.
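As a rough illustration of that dependence, the sketch below computes a simple (unsmoothed) average true range on two made-up feeds that differ only in one bar's high. The data, the 3-bar period and the simple averaging are assumptions; a platform's built-in ATR may use Wilder's smoothing instead.

```python
def true_range(high, low, prev_close):
    """Largest of: bar range, gap up from previous close, gap down from previous close."""
    return max(high - low, abs(high - prev_close), abs(low - prev_close))

def simple_atr(bars, period):
    """Plain average of the last `period` true ranges.

    bars: list of (high, low, close) tuples, oldest first.
    """
    trs = [
        true_range(bars[i][0], bars[i][1], bars[i - 1][2])
        for i in range(1, len(bars))
    ]
    recent = trs[-period:]
    return sum(recent) / len(recent)

# Prices in pips (integers) to keep the arithmetic exact; values are invented.
feed_a = [(110, 100, 105), (112, 102, 108), (115, 104, 110), (113, 103, 106)]
# Same bars, but the third bar's high is 3 pips lower (a missed spike).
feed_b = [(110, 100, 105), (112, 102, 108), (112, 104, 110), (113, 103, 106)]

atr_a = simple_atr(feed_a, period=3)
atr_b = simple_atr(feed_b, period=3)
print(atr_a, atr_b)
```

One missed spike of 3 pips moves the 3-bar average by a full pip here, so any stop or target sized as a multiple of ATR moves with it.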

Originally when I bought the disktrading data I figured any problems like this would average out and not be an issue. Even with 60 min bars though I'm seeing massive differences in results, just look at those stats 2 posts back.

It doesn't figure. It seems to me that there's some sort of bias built into the disktrading data: it gives consistently better results on all three of the systems I've tested so far.

Neither of the two types of differences in the data I just described can explain what makes the systems perform better on disktrading data.
 
In my own personal experience I use only one source of data, but if I were to compare two sources (which I will soon, because someone asked me), a "massive" difference telling me that one of the two sources is not reliable would have to be at least more than 20%. Something like with one system you make 100k and with the other you make 80k, which would still be good enough for me. If it were instead 50k and 100k, I'd admit I suck. But it seems to me that your differences are less than 20%. Forgive me if I am not very scientific and precise in my language.
 
Re: Lessons from the data

Your language is precise enough, no worries. But your maths is totally out of whack.

First I used IB historical data, freshly downloaded from their server.

Total number of trades: 127.
Profit per trade: -$147

Then I deleted the IB data and reloaded it from the IQFeed server:

Total number of trades: 123
Profit per trade: -$123

And last I deleted the IQFeed data and loaded up the tickdata from Disktrading:

Total number of trades: 111
Profit per trade: -$58

-58 dollars is less than 50% of the -123 with IQFeed, and only about 40% of the -147 with the IB data. For me, it's too extreme. How can I use the disktrading data if the results are that different?
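For anyone who wants to check the ratios, here is a small sketch that works them through from the figures quoted above. Only the numbers come from the posts; the names `results` and `ratio_to` are made up for the example.

```python
# Figures quoted in the thread: (number of trades, profit per trade in $).
results = {
    "IB": (127, -147),
    "IQFeed": (123, -123),
    "Disktrading": (111, -58),
}

def ratio_to(base, other, field):
    """Value of `other` as a fraction of `base` (field 0 = trades, 1 = profit per trade)."""
    return results[other][field] / results[base][field]

for base in ("IB", "IQFeed"):
    trades = ratio_to(base, "Disktrading", 0)
    per_trade = ratio_to(base, "Disktrading", 1)
    print(f"Disktrading vs {base}: {trades:.0%} of the trades, "
          f"{per_trade:.0%} of the loss per trade")

# Total result per data source = trades * profit per trade.
totals = {name: n * ppt for name, (n, ppt) in results.items()}
print(totals)
```

The trade counts differ by roughly 10%, which is inside a 20% tolerance, but the loss per trade on Disktrading is under half of either other figure, and the totals (-$18,669, -$15,129 and -$6,438) are further apart still.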
 
I agree with you that you should just trust one source. Yet I do trust disktrading. I have compared their data to other sources and didn't see much difference. Maybe because my systems trade differently.
 
That's for sure. I definitely get the impression you prefer lower frequency, higher profit trades. I prefer lots of small trades.
 
Yeah, because they outweigh the commissions and spread costs. I couldn't build anything profitable on lower timeframes (trades lasting less than 6 hours and happening more often than twice a week). Even my pivot system has been failing in back-testing, despite all the back-testing knowledge accumulated over these years.
 
My experience of the transaction costs with IB over the past 3 weeks has been good so I started looking again at systems which don't produce so much profit, but just a lot more frequently. Then I discovered this problem with the data and I'm trying to work out still whether it means I have to aim for systems that produce massive profit by comparison, like $50 or $100 per trade. So far I don't know.

But I decided I have to earn some money somehow so I'm starting to day trade. I had a trial run on Friday on the Euro and my guess is that a lot of the knowledge I picked up from writing automated systems is useful for day trading too.
 
Why didn't I start this earlier?

This is hilarious. I've made 3 mistakes in a row, all of which I should have avoided - hopefully they're newbie mistakes:

(1) was trying to write some code for an automated system but I saw the Aussie $ moving and thought I could trade that and just jumped right in. I hadn't even put a position on and I panicked slightly, thinking the mkt was going to shoot off upwards. Piled in at the top. Then when it stopped rising almost immediately, I looked back at the chart and saw it was in a fairly obvious downtrend. Whoops. Bailed out for a 6 tick loss, 3 of which was on account of the 3 tick spread.

(2) so I'd identified the trend and waited until the market was dropping on the 1 min timeframe - good man - but fatally I accidentally bought instead of selling and it took me a minute to cotton on. During that minute Sod's Law was applied and I was out another 7 ticks.

(3) Then I managed to get in on the position I wanted. I even managed to get NinjaTrader to place the stop loss automatically. Woohoo. The market tanked a bit more and I duly pulled my stop loss down a bit to break-even. Too soon! I had ignored an earlier spike upwards (I thought "was that spike real?"), and now it did it again and stopped me out for a 1 tick loss.

(4) valiantly stopped myself from sitting here all night, decided to share my idiotic first steps here, and went to bed

:sleep:
 

Attachments

  • $AUDUSD (5 Min)  16_08_2010.jpg
IB TWS froze on me! I'm in shock.

I used to run it on linux and never had a crash, but just this week, I decided to run it on my Windows 7 machine right next to the NinjaTrader app, just to shave off that microsecond that it takes to communicate over the network - I'm such a hot trader I feel these things subconsciously ;)

TWS freezing up was obviously not so funny. If it happens again, it's getting relegated back to the linux box.
 
Here's one for all you trendfollowers

Logically, how can I know when I have enough data in my backtests for it to be representative?

When I do a short-term daytrading style system, I sometimes have thousands of trades in my result set, so I'm not worried too much that the result won't be reproducible.

When I do a long term trendfollowing system, I make sure it's simple, that the system hasn't eaten too many degrees of freedom, and I have an out-of-sample test to prove it. But the out-of-sample test is pretty small and there aren't often that many trades. It's small because I don't have that much data to test it on.

Is that just tough? Do I have to get more?

Can I rely more on long term trendfollowing systems to work in the future when the long test periods that I do have access to are of slightly dubious quality?

Lastly, if there's a very low winners:losers ratio, if I only use a small out-of-sample history and the results are rubbish, it might just mean I didn't hit one of the mega-profitable trades during that period, right? Or do you throw it out anyway?
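One way to put a number on that last question is the chance of seeing no winner at all in a short run of trades. The sketch below assumes trades are independent with a fixed win rate, which real trendfollowing trades may not be, and the 10% win rate is purely hypothetical.

```python
from math import comb

def prob_no_winner(win_rate, n_trades):
    """Chance that n independent trades contain no winner at all."""
    return (1.0 - win_rate) ** n_trades

def prob_at_most(k, win_rate, n_trades):
    """Binomial probability of k or fewer winners in n independent trades."""
    return sum(
        comb(n_trades, i) * win_rate**i * (1.0 - win_rate) ** (n_trades - i)
        for i in range(k + 1)
    )

# Hypothetical system that wins on 10% of its trades.
for n in (10, 20, 50, 100):
    print(n, round(prob_no_winner(0.10, n), 3), round(prob_at_most(1, 0.10, n), 3))
```

Under those assumptions a 20-trade out-of-sample window contains no winner about 12% of the time, so a rubbish result on a window that small says little either way; at 50 trades the same outcome would be rare, at roughly 0.5%.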
 
I just threw out the system anyway. If I can't get a decent sample size then I can't make any decisions about it.
 
I am dealing with sample size as well for my pivot system. My modest infallible opinion is as follows. If the sample contains prices going up and down in equal measure (2 months going up and 2 months going down), and I get about 50 to 100 trades, I should be able to produce a profitable system that is also profitable in the other 4 months of out-sample. Each time I fail to do that, I just go back and try a different system. I don't blame it on the sample size being too small. I am not suggesting you did that. I am just saying: if my data is pretty diverse, and I have 100 trades that are profitable, the other 100 trades in the out-sample should be profitable as well. If they are not, then that's my problem, not the sample's problem, precisely because I am looking for a system that will be profitable over every batch of 100 trades. Basically, give me the number of trades over which you want your system to be profitable, build a profitable system on a sample that sees that number of trades, and check how it does on an out-sample which sees an equal number of trades.

It might have something to do with the concept of drawdown, too. And it might have nothing to do with what you were talking about in your posts, but since you said the words "sample size", I gave my little modest and infallible opinion.
 
Need to get my act together

My trading, my computer, my workspace, in fact my life is a mess.

I've got half-finished systems with dozens of different names and directories full of excel spreadsheets with the results and a database rammed full of trade histories.

I've got gigabytes of tick data, one-minute data, daily data stashed all over 3 different computers.

The systems that I developed on disktrading.com tick data now all deliver at best 75% of the results I was expecting when I test them on the premium Tenfore 1 min bar data. The difference is so large I thought it must be a mistake, but I can't find any mistakes. It's just that simple.

I'm looking at my options and I see that I still don't really know what I can do with the automated trading systems - I guess another month or so of tweaking and inventing new ones, and then forward testing for another month.

Day trading on the EURUSD is a big learning curve. I think I'm working my way through all of the things you shouldn't do when you trade - trading too much, not being fast enough, trading on a low time frame, being indecisive and changing my mind mid-trade, cutting my winners short, etc etc.

I guess one positive aspect is that I haven't lost enormous amounts - only $400 this week - and I can cut my losses quickly, but that's about it.

My basic trading plan is really simple, I jump in when the market starts moving fast on the 5 mins time frame, and if it doesn't create a big candle in the direction of my position, I get out. Sounds simple but so far I hesitate getting in and miss several ticks, and hesitate getting out on losers, losing several more ticks there.

I figure a fast moving market is one where the market has moved 1 ATR before it's halfway through the bar. Then if it carries on going, it creates a big candle and hopefully settles around a new level on the profitable side of my entry, at which point I exit.
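The rule as described could be sketched roughly as below. Measuring the move from the bar's open is an assumption (the post doesn't say whether it's from the open or the bar's range), and the function name and the tick values are invented for the example.

```python
def fast_market_signal(bar_open, last_price, elapsed_fraction, atr):
    """Return +1 (fast up move), -1 (fast down move) or 0 (no signal).

    The signal fires when price has moved at least 1 ATR away from the
    bar's open while the bar is still less than half complete.
    elapsed_fraction: 0.0 at the bar's start, 1.0 at its end.
    """
    if elapsed_fraction >= 0.5:
        return 0  # too late in the bar to count as "fast"
    move = last_price - bar_open
    if abs(move) < atr:
        return 0  # has not travelled a full ATR yet
    return 1 if move > 0 else -1

# Prices in ticks (integers); a 5 min bar that is 2 minutes old is 0.4 elapsed.
print(fast_market_signal(13000, 13012, 0.4, atr=10))  # moved 12 ticks up early
print(fast_market_signal(13000, 12988, 0.2, atr=10))  # moved 12 ticks down early
print(fast_market_signal(13000, 13012, 0.6, atr=10))  # same move, but too late
```

Writing it down this way also makes the hesitation cost visible: every tick of delay after the signal fires comes straight off whatever is left of the candle.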

I keep thinking, if I reckon I can program what I'm doing, I shouldn't be doing it. I'm not sure if that's logical. Got to go, waffling too much.
 
75% is pretty good. If I have a correspondence of 75% between my back-testing mode on tradestation and my forward-testing mode on TWS, I will be satisfied. And I am not talking about execution prices (forget about that - that's impossible). I am just talking about 75% of the same trades (e.g.: two long trades on the same day in both data sets) having been made in the same period on the two different sets of data and by the two different platforms:
1) excel + tws: live trading
2) tradestation on disktrading.com data: back-testing mode (yet on the same period)
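A minimal sketch of that kind of matching, ignoring execution prices and counting only trades that share a day and a direction. The trade lists are invented and the `correspondence` function is a hypothetical helper, not something either platform provides.

```python
from collections import Counter
from datetime import date

def correspondence(trades_a, trades_b):
    """Share of trades in A that have a same-day, same-direction trade in B.

    Each trade is a (date, direction) tuple; each trade in B can be
    matched at most once.
    """
    if not trades_a:
        return 0.0
    remaining = Counter(trades_b)
    matched = 0
    for trade in trades_a:
        if remaining[trade] > 0:
            remaining[trade] -= 1
            matched += 1
    return matched / len(trades_a)

# Invented example: live trades versus back-tested trades for the same week.
live = [
    (date(2010, 8, 16), "long"),
    (date(2010, 8, 16), "long"),
    (date(2010, 8, 17), "short"),
    (date(2010, 8, 19), "long"),
]
backtest = [
    (date(2010, 8, 16), "long"),
    (date(2010, 8, 16), "long"),
    (date(2010, 8, 17), "short"),
    (date(2010, 8, 19), "short"),  # opposite direction, so no match
]

print(f"{correspondence(live, backtest):.0%}")
```

Three of the four live trades find a partner in the back-test, which is the 75% level of agreement described above.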

Imagine that I am running the same exact systems on 2 servers, with the same exact data, broker, platform (tws and excel), settings, with an atomic timesync program for the computer time. One server in the US and one in Italy.

And every week I get 20% (20 transactions out of 100) of execution price differences (usually 1 tick). I get every month a couple of trades that went in the opposite direction. And I get every week a couple of trades that were missed on one of the servers. And my systems are extremely simple and were tested on a 15-minute timeframe. Imagine if you do the same on a 1-minute timeframe. You will have totally different trades every day.
 
In case disktrading doesn't respond to my request for data (I sent them an email yesterday and still didn't get an answer), how would you like to give me your data in return for some other data? Or maybe even for free? I need ZN and EUR future contracts on a 15-minute timeframe. I suppose you bought it recently so it should be updated well into 2010 (I don't trust the updates, but you could give me that as well). If instead it is not recent data, then I still need to buy it, the deal is off, and you will have to find other people to donate your data to.
 
Sorry, I can't help you because I only bought the forex data. I don't have any tick data for futures. I subscribe to IQ-Feed for forex, and they offer their stuff on a 1 week free trial basis, and I asked them if I could have a free trial of their futures package just this week, but they ignored me. I guess they thought I was taking the mick. You could try it. Find out how much backdata they supply - probably 4 or 5 years of minute data, but only 30 days of tick data.

Thanks for the info about the disparity between your trading systems running on different machines. That's quite a big disparity. In fact, it's huge. I never thought it could be so huge. Is there a significant and consistent difference in performance?

I'm seeing a difference in performance. The disktrading data always gives me the best performance, followed by the IQ-Feed data, followed by the IQ-Feed Premium data and the IB data. I wish I had some IB live-collected data, because I believe it's different again. Just as long as I know what to expect.

I'm collecting IB live data now. So maybe in six months I can start doing some serious testing with it. Haha. If I have any capital left.
 
Yeah, thanks for replying about the data, unlike disktrading, who, after almost 2 days, still has not replied at all. Maybe they're all on vacation.

Regarding the discrepancy, I don't think it's big, but there is one. If an order is triggered even just 2 seconds later, which is probably the case with my two servers, then (on my simple systems) you're going to get that kind of discrepancy: 20% of trades with a 1-tick price difference and 1% of orders not triggered or going in the opposite direction (long instead of short). If that counts as a big discrepancy, which in my opinion it doesn't, then you're never going to be happy with any comparison between 2 different sets of data. Back-testing on 2 different sets of data will definitely produce more discrepancies than two identical servers/platforms/systems 8000 kilometers apart, whose orders are triggered or received by IB 2 seconds apart, due either to a different computer clock (which in turn causes different moving averages and prices) or to a different distance from IB.

If you have a 75% correspondence and if that is the type of matching I said (75% correspondence in profit, or 75% of "same trades in same direction same day...": forget having identical execution prices), then I say be happy with what you got.

 
the low-down on IB past data

Finally got a response out of IB regarding their past data, which is provided in 1 min bars.

I had noticed a difference between the past data and the live data collected by NinjaTrader, in terms of the performance of my systems in backtesting on the different datasets. IB's reply:

"Unfortunately IB does not provide tick by tick data. We provide historical data as snapshot requests taken at 300 millisecond intervals which is stored locally by IB and used to calculate the time bars generated by IB."

So it looks like NinjaTrader has a more logical and efficient algorithm for collecting the data from the live stream.
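To see why snapshots and a full tick stream can produce different bars, here is a toy sketch. IB only says snapshots are taken at 300 millisecond intervals; that each snapshot records the last price seen at that instant is my assumption, and the tick data is invented.

```python
def snapshot_sample(ticks, interval_ms=300):
    """Keep only the last price seen at each snapshot instant.

    ticks: list of (time_ms, price) tuples sorted by time.
    Assumes each snapshot records the most recent price (hypothetical).
    """
    snapshots = []
    if not ticks:
        return snapshots
    i = 0
    last_price = None
    start, end = ticks[0][0], ticks[-1][0]
    for snap_time in range(start, end + interval_ms, interval_ms):
        # Consume every tick up to and including this snapshot instant.
        while i < len(ticks) and ticks[i][0] <= snap_time:
            last_price = ticks[i][1]
            i += 1
        snapshots.append((snap_time, last_price))
    return snapshots

# Invented ticks: (milliseconds, price in ticks).
ticks = [(0, 100), (100, 103), (200, 101), (300, 101), (450, 98), (600, 100)]
snaps = snapshot_sample(ticks)

tick_prices = [p for _, p in ticks]
snap_prices = [p for _, p in snaps]
print("tick stream high/low:", max(tick_prices), min(tick_prices))
print("snapshot    high/low:", max(snap_prices), min(snap_prices))
```

In this toy case the spike to 103 and the dip to 98 both fall between snapshots, so a bar built from the snapshots has a narrower range than one built from every tick. Whether the real feeds differ in that direction is something only a side-by-side test would show.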
 