3rd generation NN, deep learning, deep belief nets and Restricted Boltzmann Machines

Well, personally I have the complete opposite opinion about time frames. A higher TF is just a subsample of a lower TF, and it is basic DSP that when you downsample you lose information, so going from 1-min to daily bars you lose by a factor of 1440. Besides this, from my research on market microstructure, the prediction horizon on financial series is very short (like 30 min). It's like a weather forecast: the short term works, the long term doesn't. This is the reason I prefer to trade lower TFs.
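To make the factor of 1440 concrete: a daily OHLC bar keeps 4 numbers out of the 1440 one-minute bars of a 24-hour day. The Python sketch below shows that aggregation. It assumes gap-free 1-minute data and a 24-hour day starting at index 0 (real FX data with weekends, broker time zones and missing minutes needs more care); `Bar` and `downsample` are illustrative names.

```python
from dataclasses import dataclass
from typing import List

MINUTES_PER_DAY = 24 * 60  # 1440 one-minute bars in a 24-hour day


@dataclass
class Bar:
    open: float
    high: float
    low: float
    close: float


def downsample(bars: List[Bar], factor: int) -> List[Bar]:
    """Merge each run of `factor` consecutive bars into one OHLC bar.

    Assumes gap-free input; an incomplete trailing run is dropped.
    """
    out = []
    for start in range(0, len(bars) - factor + 1, factor):
        chunk = bars[start:start + factor]
        out.append(Bar(
            open=chunk[0].open,               # first open of the period
            high=max(b.high for b in chunk),  # highest high of the period
            low=min(b.low for b in chunk),    # lowest low of the period
            close=chunk[-1].close,            # last close of the period
        ))
    return out


# Two days of synthetic 1-minute bars on a slow uptrend.
minute_bars = []
for i in range(2 * MINUTES_PER_DAY):
    mid = 1.3 + i * 1e-6
    minute_bars.append(Bar(mid, mid + 2e-5, mid - 2e-5, mid + 1e-5))

daily_bars = downsample(minute_bars, MINUTES_PER_DAY)
print(f"{len(minute_bars)} one-minute bars -> {len(daily_bars)} daily bars")
print(f"values kept: {4 * len(daily_bars)} of {4 * len(minute_bars)}")
```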

And finally, the evaluation. I simply don't believe in the predictive power of any system unless I see several hundred trades. I don't think trading on a high TF can offer this: you have a small number of training bars and a small number of trades, so the evaluation is not very reliable.
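One rough way to see why the trade count matters: if trades are assumed independent and a no-edge system is assumed to win 50% of the time, the exact binomial tail gives the probability of seeing at least the observed number of winners by chance. This is only a sketch under those assumptions; win rate ignores payoff size, so a 55% win rate can still lose money. `binomial_tail` is an illustrative name.

```python
from math import exp, lgamma, log


def binomial_tail(wins: int, trades: int, p0: float = 0.5) -> float:
    """P(X >= wins) for X ~ Binomial(trades, p0), summed in log space to avoid overflow."""
    if not 0.0 < p0 < 1.0:
        raise ValueError("p0 must be strictly between 0 and 1")

    def log_pmf(k: int) -> float:
        return (lgamma(trades + 1) - lgamma(k + 1) - lgamma(trades - k + 1)
                + k * log(p0) + (trades - k) * log(1.0 - p0))

    # min() guards against the sum creeping above 1 through rounding
    return min(1.0, sum(exp(log_pmf(k)) for k in range(wins, trades + 1)))


# The same 55% win rate over a small and a large number of trades.
for n in (40, 400, 1600):
    w = round(0.55 * n)
    print(f"{n:5d} trades, {w:4d} wins -> p = {binomial_tail(w, n):.2e}")
```

The smaller the p-value, the less plausible it is that a coin-flip system produced that record; with the same win rate it shrinks quickly as the number of trades grows.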

Krzysztof
 
Well, I speak from my personal experience regarding modeling difficulty: it's simply much easier to get profitable results on the higher time frames. The simulation I showed you also has high statistical significance, with 1600+ trades from 1986 to 2014 🙂 This system makes one prediction every day and enters the market based on whether the 3 NN systems agree or disagree. Positions are closed when they hit the SL or when the signal reverses. In my experience, if you want to achieve significant profits and good success rates from your training, you should definitely tackle higher time frames.
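The post doesn't say exactly how the three votes are combined, so the sketch below assumes a unanimity rule (trade only when all three networks agree). It also assumes the stop is checked against a single daily price rather than intraday highs/lows, and that the system may re-enter on the same day after being stopped out. `ensemble_signal` and `daily_step` are hypothetical names.

```python
from typing import Optional, Sequence, Tuple

LONG, FLAT, SHORT = 1, 0, -1


def ensemble_signal(votes: Sequence[int]) -> int:
    """Unanimity rule (assumed): trade only when every model votes the same non-flat side."""
    if votes and votes[0] != FLAT and all(v == votes[0] for v in votes):
        return votes[0]
    return FLAT


def daily_step(position: int, entry: Optional[float], price: float,
               sl_distance: float, votes: Sequence[int]) -> Tuple[int, Optional[float]]:
    """One daily decision: close on stop loss or signal reversal, then enter if flat."""
    signal = ensemble_signal(votes)
    if position != FLAT:
        if position == LONG:
            stopped = price <= entry - sl_distance
        else:
            stopped = price >= entry + sl_distance
        reversed_signal = signal == -position
        if stopped or reversed_signal:
            position, entry = FLAT, None
    if position == FLAT and signal != FLAT:
        position, entry = signal, price
    return position, entry


print(daily_step(FLAT, None, 100.0, 5.0, [1, 1, 1]))   # (1, 100.0): all agree, go long
print(daily_step(LONG, 100.0, 101.0, 5.0, [1, 0, 1]))  # (1, 100.0): disagreement, hold
```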

I attach a back-test (non-compounding) so that you can look into it in as much detail as you wish :smart: As you can see, the results are statistically significant and the NN ensemble does achieve long-term profitable results on the daily time frame with constant retraining. The system retrains before every trading decision using information from the last N bars.
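The "retrain before every decision on the last N bars" loop can be sketched as follows. The real system trains three NNs; here a deliberately trivial placeholder model (the sign of the mean return over the window, hypothetical) stands in so the control flow is visible. The point is that the prediction for bar t is made by a model that has only seen bars t-N to t-1.

```python
from typing import Callable, List, Sequence

Model = Callable[[], int]


def fit_mean_sign(train: Sequence[float]) -> Model:
    """Placeholder 'model' (hypothetical stand-in for the NN ensemble): sign of the mean return."""
    mean = sum(train) / len(train)
    direction = (mean > 0) - (mean < 0)
    return lambda: direction


def walk_forward(returns: Sequence[float], window: int,
                 fit: Callable[[Sequence[float]], Model]) -> List[int]:
    """Retrain on the last `window` bars before every bar, then predict that bar.

    Prediction i refers to bar window + i and is made from bars [i, window + i) only.
    """
    predictions = []
    for t in range(window, len(returns)):
        model = fit(returns[t - window:t])  # bars t-window .. t-1; bar t is never seen
        predictions.append(model())
    return predictions


print(walk_forward([1, 1, 1, -1, -1, -1], window=2, fit=fit_mean_sign))  # [1, 1, 0, -1]
```

The collected predictions are then a sequence of one-bar out-of-sample forecasts, each from a freshly trained model.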
 


A few questions.

1) What does 'volume' mean on this chart??
2) The unit of drawdown is days, I guess. So what does 880 mean??
3) I see the lot size is changing?? What is the method of adjustment??? Why are you using it??

Krzysztof
 
1) Volume is the volume traded for each position in lots (values on the right axis).

2) Yes, 880 means 880 days.

3) We normalize the lot size to risk a fixed dollar amount on the SL (since the simulation is non-compounding); when using compounding we adjust it to risk a fixed percentage of the account. The SL is adjusted to market volatility, which is why the lot size changes.
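The sizing rule in point 3 can be sketched as below. The post only says the SL follows volatility; an ATR multiple is one common way to do that and is an assumption here, as are the USD-quoted pair (so P&L is in USD), the 100,000-unit standard lot, the 2% compounding fraction and the function names.

```python
def stop_distance(atr: float, atr_multiple: float = 2.0) -> float:
    """Volatility-adjusted stop (assumed rule): a multiple of the Average True Range, in price units."""
    return atr_multiple * atr


def lot_size(risk_amount: float, sl_distance: float, contract_size: float = 100_000) -> float:
    """Lots such that a stop-out loses `risk_amount` in the quote currency.

    Loss per lot for a move of `sl_distance` is sl_distance * contract_size
    (e.g. EURUSD: 1 standard lot = 100,000 EUR, so a 0.0100 move = 1,000 USD).
    """
    return risk_amount / (sl_distance * contract_size)


balance = 50_000.0
fixed_risk = 1_000.0               # non-compounding: same dollar risk on every trade
compounding_risk = 0.02 * balance  # compounding: fixed fraction of the current balance

for atr in (0.0050, 0.0100):       # calmer vs. more volatile market
    sl = stop_distance(atr)
    print(f"ATR {atr:.4f}: SL {sl:.4f}, lots {lot_size(fixed_risk, sl):.2f} (fixed), "
          f"{lot_size(compounding_risk, sl):.2f} (compounding)")
```

A wider, volatility-driven stop means fewer lots for the same dollar risk, which is why the traded volume moves around even though the risk per trade is constant.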
 

And are these trading days or calendar days?? How is this 880 days calculated??
Because this drawdown is not very visible on the chart, and it is more than 2 years...

Anyway, since there are quite a lot of trades, if there is no future leak or something similar, the chart suggests that it works; but it's best to be lucky and not start trading at the beginning of a drawdown... a lot of waiting for a rebound 😀

Krzysztof
 
Yes, those are calendar days. The drawdown is actually very shallow and long (so you cannot see it well), but further analysis shows it (see the attached drawdown decomposition of the strategy).
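Drawdown length in calendar days is independent of drawdown depth, which is how a drawdown can be shallow on the chart yet last more than two years. A minimal sketch of the length calculation, assuming the definition "time from an equity peak until equity gets back to at least that peak", with an unrecovered drawdown measured to the last date; the function name is illustrative.

```python
from datetime import date
from typing import Sequence, Tuple


def max_drawdown_length_days(curve: Sequence[Tuple[date, float]]) -> int:
    """Longest time, in calendar days, between an equity peak and the first date equity
    gets back to at least that peak. An unrecovered drawdown is measured to the last date."""
    peak_value = curve[0][1]
    peak_date = curve[0][0]
    in_drawdown = False
    longest = 0
    for day, equity in curve:
        if equity >= peak_value:
            if in_drawdown:
                longest = max(longest, (day - peak_date).days)
            peak_value, peak_date, in_drawdown = equity, day, False
        else:
            in_drawdown = True
    if in_drawdown:
        longest = max(longest, (curve[-1][0] - peak_date).days)
    return longest


# A shallow (10%) dip that takes 19 calendar days to recover.
equity_curve = [(date(2014, 1, 1), 100.0), (date(2014, 1, 2), 90.0), (date(2014, 1, 10), 95.0),
                (date(2014, 1, 20), 101.0), (date(2014, 1, 21), 102.0)]
print(max_drawdown_length_days(equity_curve))  # 19
```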

Regarding future leaks, there is actually no possibility of data snooping within our implementation, because the front-ends only give the trading library access to candles up to its current candle; therefore, even if the system wanted to, it couldn't access future candle data. You obtain exactly the same results if you back-test the strategy on MT4/MT5, and there is no possibility of data snooping there (the tester only gives past candle information to the experts). The F4 framework allows us to execute our system code in MT4/MT5/JForex/Oanda without recoding (all from the same library using different front-ends).
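The guarantee described above is structural: the strategy code never receives a handle to later bars. A minimal Python sketch of that idea follows, with a hypothetical `PastOnlyHistory` wrapper using MT4-style shift indexing (0 = current bar, 1 = previous bar). It is not F4's actual API, which isn't shown in the thread.

```python
from typing import Callable, List, Sequence


class PastOnlyHistory:
    """Read-only view of a price series up to the current bar (hypothetical helper).

    Indexing follows the MT4 'shift' convention: 0 = current bar, 1 = previous bar, ...
    There is no way to address a bar after the current one.
    """

    def __init__(self, closes: Sequence[float], current: int):
        self._closes = closes
        self._current = current

    def __len__(self) -> int:
        return self._current + 1

    def __getitem__(self, shift: int) -> float:
        if shift < 0:
            raise IndexError("future bars are not available")
        if shift > self._current:
            raise IndexError("not enough history")
        return self._closes[self._current - shift]


def run_backtest(closes: Sequence[float], strategy: Callable[[PastOnlyHistory], int]) -> List[int]:
    """Call the strategy once per bar, handing it only the past-only view."""
    return [strategy(PastOnlyHistory(closes, t)) for t in range(len(closes))]


def momentum(history: PastOnlyHistory) -> int:
    """Toy strategy: 1 if the current close is above the previous one, else 0."""
    if len(history) < 2:
        return 0
    return 1 if history[0] > history[1] else 0


print(run_backtest([1.0, 1.1, 1.05, 1.2], momentum))  # [0, 1, 0, 1]
```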

Also, since we use exactly the same code for back and live testing, we have confirmed this through back/live consistency tests as well. The implementation is completely air-tight in this regard :smart:

You're also right about the drawdown 😎 We have since made some modifications to the system I posted here to achieve shorter maximum drawdown lengths (the lowest we've reached so far is just under 400 days), and I am also testing further NN modifications to see if I can do even better :cheesy:

However, once you put this strategy together with other highly linear strategies in a portfolio, the drawdown length problem is largely minimized.
 

Attachment: 2014-02-06_18-40-35.png (drawdown decomposition)

One remark here,
I don't know your exact approach, but I also see a possible bias here:

How did you optimize the external model parameters such as the input space, network structure and voting scheme? If you just picked these according to your results above, it's biased. I agree this is a general issue in machine learning, but there are approaches to minimize that bias (e.g. nested cross-validation). What do you think?
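Ordinary k-fold shuffles time, so the sketch below is a time-ordered variant of nested cross-validation: an outer expanding-window loop estimates performance, and an inner loop, run only on each outer training block, picks the hyperparameter. The data, the toy model (forecast = mean of the last `lookback` values) and all function names are hypothetical; in practice the candidates would be things like input sets or network sizes.

```python
from typing import Callable, List, Sequence, Tuple

Score = Callable[[int, Sequence[float], Sequence[float]], float]


def expanding_splits(n: int, n_folds: int, min_train: int) -> List[Tuple[range, range]]:
    """Expanding-window splits: every test block lies strictly after its training block."""
    fold = (n - min_train) // n_folds
    if fold < 1:
        raise ValueError("not enough data for the requested folds")
    return [(range(0, min_train + k * fold), range(min_train + k * fold, min_train + (k + 1) * fold))
            for k in range(n_folds)]


def lookback_mean_score(lookback: int, train: Sequence[float], test: Sequence[float]) -> float:
    """Toy model (hypothetical): forecast every test value with the mean of the last
    `lookback` training values; the score is the negative mean squared error."""
    recent = train[-lookback:]
    forecast = sum(recent) / len(recent)
    return -sum((y - forecast) ** 2 for y in test) / len(test)


def nested_evaluation(data: Sequence[float], candidates: Sequence[int], score: Score,
                      n_outer: int = 3, n_inner: int = 2, min_train: int = 20) -> List[Tuple[int, float]]:
    """The inner loop chooses using only the outer training block, so the outer
    test block never influences which hyperparameter is picked."""
    results = []
    for train_idx, test_idx in expanding_splits(len(data), n_outer, min_train):
        train = [data[i] for i in train_idx]
        test = [data[i] for i in test_idx]

        def inner_score(candidate: int) -> float:
            splits = expanding_splits(len(train), n_inner, len(train) // 2)
            return sum(score(candidate, [train[i] for i in tr], [train[i] for i in te])
                       for tr, te in splits) / len(splits)

        best = max(candidates, key=inner_score)
        results.append((best, score(best, train, test)))
    return results


series = [float(i % 5) for i in range(60)]
for chosen, outer in nested_evaluation(series, candidates=[1, 3, 5], score=lookback_mean_score):
    print(f"lookback={chosen}, outer score={outer:.3f}")
```

The outer scores, not the inner ones, are the estimate to report, since the inner scores were used to make the choice.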

greetings
 
I believe (and hope!!!) he did it the 'Walk Forward' way: retrain every day, collect the results from the test day, then move the window by 1 bar and retrain again. If it is done like this, then it's OK.
The results should be a collection of these 1-day tests.

Krzysztof
 
The NN does retrain every day in the way Krzysztof mentions, so the whole test can be considered out-of-sample with respect to the machine learning's predictive capacity (it's a collection of all the 1D predictions from daily retrained networks).

However, you clearly need to define a network topology, number of inputs, number of examples, training epochs, etc. The NN parameters were decided using only the initial 20% of the testing period and were then applied to the whole period. Note that the NN parameters do not have a very large influence on the results; the biggest influence actually comes from the input/output structure you select to make predictions (the variables you use as inputs and outputs). I hope this answers your questions 🙂
 
2 methodologies??


Hi Daniel,

Yesterday I watched the video on your site about ATINALLA FE, and it was clear to me that your conclusions about 'long term statistical characteristics' (your words) for this EA were based on MT4 backtester results, which don't confirm any predictive power. But for the NN you used Walk Forward evaluation. So are you using 2 methodologies for evaluation??

Krzysztof
 

This was a simple system developed many years ago (in 2010); things have evolved since then 😉

This NN system is evaluated using a normal back-test, but within that back-test the retraining is done on every bar, so training/prediction follows a WFO pattern. Note that all the execution is done within our F4 C/C++ framework, which is what makes this possible (it would otherwise be very hard in MQL4). In our code the system takes the bar history obtained from MT4 (or whichever other front-end is used) and performs the training/prediction process within the F4 code on every bar of the back-test. There is no need for any complex testing procedure: by running a simple back-test we obtain machine learning WFO results :smart:

Also note that within Asirikuy we test many different types of optimization, development methodologies and system types, so there is a little bit of everything. For example, this is just one of our machine learning examples (using NNs), but we also have historically profitable systems using constantly retrained SVMs, linear classifiers, etc.

I hope this helps 🙂
 
Just sharing a quick analysis:

Using a range of features with random forests applied to the EU market open (5-min bars) over the last 3 years, the out-of-bag error on the training data comes out considerably better than random.

Samples were constructed on a TP hit within 1 hour (in both directions, for TP / TN).
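The labelling rule is described only briefly, so the sketch below is one possible reading, not fabwa's actual code: entry at the close of the bar being labelled, symmetric up/down take-profit targets, and 12 five-minute bars = 1 hour. A bar where both targets are touched is left unlabelled because OHLC data can't tell which came first. `tp_label` is a hypothetical name.

```python
from typing import Optional, Sequence

BARS_PER_HOUR = 12  # 5-minute bars


def tp_label(highs: Sequence[float], lows: Sequence[float], entry: float,
             tp_distance: float, start: int, horizon: int = BARS_PER_HOUR) -> Optional[int]:
    """Label the bar at index `start` by which take-profit is touched first
    within the next `horizon` bars:
      +1   up target (entry + tp_distance) reached first
      -1   down target (entry - tp_distance) reached first
       0   neither reached within the horizon
      None both reached in the same bar, or not enough future bars
    """
    if start + horizon >= len(highs):
        return None  # not enough future bars to decide
    up, down = entry + tp_distance, entry - tp_distance
    for t in range(start + 1, start + horizon + 1):
        hit_up, hit_down = highs[t] >= up, lows[t] <= down
        if hit_up and hit_down:
            return None  # order inside the bar is unknown
        if hit_up:
            return 1
        if hit_down:
            return -1
    return 0
```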

Williams %R had the highest feature importance for the results you see.
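For reference, Williams %R compares the close with the highest high and lowest low of a lookback window. The period fabwa used isn't stated, so it is a parameter here (14 is the common default); the sketch returns `None` until enough bars exist and when the range is flat.

```python
from typing import List, Optional, Sequence


def williams_r(highs: Sequence[float], lows: Sequence[float], closes: Sequence[float],
               period: int = 14) -> List[Optional[float]]:
    """Williams %R = -100 * (HH - C) / (HH - LL) over the last `period` bars.

    Values run from -100 (close at the period low) to 0 (close at the period high).
    """
    out: List[Optional[float]] = []
    for t in range(len(closes)):
        if t + 1 < period:
            out.append(None)  # not enough bars yet
            continue
        hh = max(highs[t - period + 1:t + 1])
        ll = min(lows[t - period + 1:t + 1])
        out.append(None if hh == ll else -100.0 * (hh - closes[t]) / (hh - ll))
    return out


print(williams_r([10, 12, 11], [8, 9, 7], [9, 11, 8], period=3))  # [None, None, -80.0]
```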

[Figure c6814e18c0.png: out-of-bag error vs. number of trees grown (x-axis); red: error for class 1, green: error for class 2]


fabwa
 

So how often was it retraining??? Every bar?? What about the ROC curve and confusion matrix??
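For completeness, the two diagnostics asked about, plus Cohen's kappa, can be computed from predictions and scores with a few lines of standard-library Python. These are the standard definitions; the function names are illustrative, and the pairwise AUC is O(n²), which is fine for a sketch but slow for very large test sets.

```python
from collections import Counter
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int], labels: Sequence[int]) -> List[List[int]]:
    """Rows = true class, columns = predicted class, in the order given by `labels`."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def cohen_kappa(matrix: List[List[int]]) -> float:
    """Agreement beyond chance: (observed - expected) / (1 - expected)."""
    n = sum(map(sum, matrix))
    k = len(matrix)
    observed = sum(matrix[i][i] for i in range(k)) / n
    expected = sum(sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(k)) / n ** 2
    return (observed - expected) / (1 - expected)  # undefined if expected == 1


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve = P(random positive scores above random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


m = confusion_matrix([1, 1, 0, 0], [1, 0, 0, 0], labels=[0, 1])
print(m, cohen_kappa(m))                             # [[2, 0], [1, 1]] 0.5
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```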

Also consider that in case of a signal change, all trades on one side must be closed
and not kept until the 1 hour expires. This and commission can change the picture dramatically, so I think those 2 signals should be combined into one.
 
Hi Fabwa,

Be careful with that "out-of-bag" error in random forests: it will certainly contain data-snooping bias, because random forests are not aware of the time-dependent component in your time series analysis. The random forest keeps some items out-of-bag for prediction, but this sampling is meaningless because the model is built using both future and past conditions.

This means you can "predict" some past samples using a model built from a combination of past and future data (samples from all over the place), which is not useful for live trading. If you want to properly test the accuracy of a random forest model on time series, you need to split the set manually so that your test set always comes after (in the future of) your training data.
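A random forest's out-of-bag set comes from bootstrap resampling rather than a single split, but it shares the property illustrated here: the data used for fitting is drawn from the whole period, so some of it lies after the samples being scored. The sketch contrasts a chronological split with a shuffled one; integer timestamps stand in for real sample times, and the function names are illustrative.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def chronological_split(samples: Sequence[T], test_fraction: float = 0.2) -> Tuple[List[T], List[T]]:
    """Samples must be sorted by time; the test set is the most recent block."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be in (0, 1)")
    cut = round(len(samples) * (1.0 - test_fraction))
    return list(samples[:cut]), list(samples[cut:])


def shuffled_split(samples: Sequence[T], test_fraction: float = 0.2, seed: int = 0) -> Tuple[List[T], List[T]]:
    """Random split for comparison: training samples can come after test samples."""
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)
    cut = round(len(samples) * (1.0 - test_fraction))
    return [samples[i] for i in sorted(order[:cut])], [samples[i] for i in sorted(order[cut:])]


def trains_on_future(train: Sequence[int], test: Sequence[int]) -> bool:
    """True if any training timestamp is later than some test timestamp."""
    return max(train) > min(test)


timestamps = list(range(100))  # stand-in for sorted sample times
print(trains_on_future(*chronological_split(timestamps)))  # False
print(trains_on_future(*shuffled_split(timestamps)))
```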

Good job, however 🙂 I have also explored RF with some good results.

Best Regards,

Daniel
 

Yes, there is only one way to evaluate time series: the Walk Forward way.
Rob Hyndman describes it well at the bottom of this page:

http://robjhyndman.com/hyndsight/crossvalidation/
 

Thanks for your comment, though I think you lack some information about my approach to prove your points:

I am only looking at x bars on a low TF before the market open to compute the features (where x is small). Then all my samples are randomly shuffled. I do know that this approach implies the assumption that the samples are independent over time, yet performance should "suffer" rather than benefit from this "snooping".

greetings

fabwa

@Krzysiaczek99 I am aware of cross-validation and its problem in representing actual performance... out-of-bag samples in RF are different... and regarding your first post: this obviously is not a performance test for live trading, I was just trying to give some preliminary results to show the potential. I still suggest you look into the details of the algorithms instead of sticking to your evaluation scheme.. :clover:
 

The assumption of time independence is wrong, which is my point. I encourage you to test and verify this and also to quantify its influence on your results (I bet it's very big and much more positive than what you think!). I hope this helps :smart:

PS: You cannot ignore the time character of financial time series; ignoring it leads to huge problems due to snooping. When tempted to make assumptions such as "time independence", always, always, always test that they hold before building models based on them.
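One simple, partial check of the independence assumption is to order per-sample values (a feature, the labels, or a model's out-of-sample errors) by time and look at their autocorrelation against the approximate ±2/√n band expected for an independent series. This only detects linear dependence between nearby samples; a clean result does not prove independence. The function names are illustrative.

```python
from math import sqrt
from typing import List, Sequence


def autocorrelation(xs: Sequence[float], lag: int) -> float:
    """Sample autocorrelation at `lag` (series ordered by time)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    if var == 0:
        raise ValueError("constant series has no defined autocorrelation")
    cov = sum((xs[t] - mean) * (xs[t - lag] - mean) for t in range(lag, n))
    return cov / var


def significant_lags(xs: Sequence[float], max_lag: int = 10) -> List[int]:
    """Lags whose autocorrelation falls outside the approximate 95% band +/- 2/sqrt(n)
    expected for an independent (white-noise) series."""
    band = 2.0 / sqrt(len(xs))
    return [k for k in range(1, max_lag + 1) if abs(autocorrelation(xs, k)) > band]


alternating = [1.0, -1.0] * 50  # strongly dependent toy series
print(significant_lags(alternating, max_lag=3))  # [1, 2, 3]
```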
 

OK, fair enough 🙂 let me do further analysis, though if there is dependence involved, my approach is likely to minimize its effect. For every bar of interest I only look at its 20 preceding bars for feature computation (the previous bar of interest is 288 bars earlier). Therefore I imagine it as a distinct "activity" happening independently of the former activities; e.g. a "jump" in 3D acceleration data looks "similar" whether it happened in the morning of day x or in the evening of day x ± 1000. So I do defend my point that it depends on how you look at the problem.

cheers

fabwa
 

I also made that mistake at some point. Features may look similar when you look at them this way, but in essence some relationships develop through time (not easy for us to detect, but easy for a random forest) that make predicting past examples with a model built from future examples easier. Do an analysis where your samples respect the past/future ordering and you'll see what I mean (you'll probably see a significant drop in prediction accuracy). When live trading you cannot build a model to predict the next bar from future data, which is what you're doing in your model.

I look forward to your results respecting the time component. I strongly encourage you to do this!! 🙂
 
shuffle or not

I discovered a few years ago that random shuffling doesn't impact the results,
but I repeated the test to be sure, so here they are. Trading period: January 2014,
21 trading days. As you can see there is no impact... so no real time-series effect...
For the first result, the line

groups = groups(randperm(N));

is commented out.

Krzysztof

EURUSD SL15 TP30 RSI50 RBM20

ALGO RESULTS
Profit PP AC MC Kappa p PF WL
-500175.00 29.74 0.34 -0.32 -0.31 0.00 0.48 0.13

and after random shuffling

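```matlab
% Create mini-batches
numbatches = ceil(N/batchsize);                 % number of mini-batches
groups = repmat(1:numbatches, 1, batchsize);    % batch ids 1..numbatches, repeated
groups = groups(1:N);                           % one batch id per training row
groups = groups(randperm(N));                   % shuffle batch membership (commented out in the first run)
batchdata = cell(1, numbatches);
batchtargets = cell(1, numbatches);
for i = 1:numbatches
    batchdata{i} = X(groups == i, :);
    batchtargets{i} = targets(groups == i, :);
end
```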

ALGO RESULTS
Profit PP AC MC Kappa p PF WL
-472372.00 29.40 0.34 -0.33 -0.31 0.00 0.48 0.12
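For readers reproducing this outside MATLAB, here is a Python sketch of the same batch assignment (0-based rows; `make_batches` is an illustrative name). Even without `randperm`, rows are dealt round-robin, so consecutive rows already land in different mini-batches. In both variants every training row is still used; the shuffle only changes which training rows share a mini-batch.

```python
import random
from typing import List, Optional


def make_batches(n_rows: int, batch_size: int, shuffle: bool, seed: Optional[int] = None) -> List[List[int]]:
    """Python version of the MATLAB batch assignment (0-based row indices)."""
    num_batches = -(-n_rows // batch_size)             # ceil(N / batchsize)
    groups = [i % num_batches for i in range(n_rows)]  # repmat(1:numbatches, 1, batchsize), first N entries
    if shuffle:
        random.Random(seed).shuffle(groups)            # groups(randperm(N))
    return [[row for row in range(n_rows) if groups[row] == b] for b in range(num_batches)]


print(make_batches(10, 3, shuffle=False))  # [[0, 4, 8], [1, 5, 9], [2, 6], [3, 7]]
print(make_batches(10, 3, shuffle=True, seed=1))
```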
 