Build Neural Network Indicator in MT4 using Neuroshell

Krzys,

As you can see in the attached Multimarket training short report.pdf, the software automatically changes the training from 281 test patterns (20% of the training range) to a calibration interval of 1125 (equal to the number of training patterns). This happens when I use TurboProp as the weight-updating method (see NN #1, 3 and 4). It does not happen with the other methods (momentum or vanilla); see NN #2.

I am sure this solves the training problem.

Arryex
 
Hi,

I just looked at your report. The problem is that you cannot use the test period as an out-of-sample period.

The test period is used for calibration, so a data leak will occur if you use it as OOS. I think you should use the production period for this if you used this option, so perhaps this is the reason for the good results.

In NSDT the test period is called 'paper trading', so your results are equal to results obtained in the 'paper trading' period, not in an out-of-sample period.

Krzysztof
 
I don't want to pile on here, but here's my 2 cents worth.

If your network has many weights and few training samples (your "standard" net has 2640 weights and 1125 training samples) then there will be many combinations of weights that will achieve a given mse on the training set. This is because the many weights allow you to curve fit the few samples in many different ways. Some will do better than others on the "test" set. Optimization (this is called Calibration by NS2) will select the set of weights that did best on the "test" set; hence this "test" set becomes part of the training data, and is not an independent sample. To avoid future leak you need a third set of data as an out of sample set. NS2 calls this the "production" set.

Following Krzysztof's lead, here's another image from the NS2 tutorial that explains the need for three data sets: Tut1.jpg

As explained in the tutorial, this applies no matter the backprop method or the network architecture used.
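
A minimal sketch of the three-set idea in Python, not NS2's own extraction logic. The function name `three_way_split` and the 20% fractions are placeholders chosen for illustration, and the sets are taken in time order as an assumption (NS2 also offers other selection options).

```python
from typing import List, Sequence, Tuple


def three_way_split(
    patterns: Sequence,
    test_frac: float = 0.2,
    production_frac: float = 0.2,
) -> Tuple[List, List, List]:
    """Split time-ordered patterns into train, test and production sets.

    The sets are taken in sequence (oldest first), so the production
    set is the most recent data and is never touched during training
    or calibration. The 20% fractions are placeholders, not NS2 defaults.
    """
    n = len(patterns)
    n_test = int(n * test_frac)
    n_production = int(n * production_frac)
    n_train = n - n_test - n_production
    if n_train <= 0:
        raise ValueError("fractions leave no patterns for training")
    train = list(patterns[:n_train])
    test = list(patterns[n_train:n_train + n_test])
    production = list(patterns[n_train + n_test:])
    return train, test, production


# Example with dummy pattern indices instead of real market data.
train, test, production = three_way_split(list(range(1000)))
print(len(train), len(test), len(production))
```

Only the error measured on `production` can be treated as out of sample, because the weights were neither fitted nor selected on it.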

MadCow
 
Guys,

I redid the training and added 20% production data. I have attached the report as a PDF, but sorry, from now on the PDF is password protected and I will give the password only to those who support this thread (fair enough, isn't it?). Please PM me.

Arryex
 

Attachments

  • Multimarket Calibration.pdf
    536.1 KB · Views: 645
  • output file.csv
    50.8 KB · Views: 493
Here is another result, using NS Predictor. Two methods are available there (neural net or genetic optimization). But in this software, as in NSDT, the network architecture and the weight-updating method cannot be seen.

Using the same data, the genetic optimization method seems to give the better performance (0.9766% for NN and 0.9952% for GA).

Arryex
 

Attachments

  • Predictor Result.pdf
    533.6 KB · Views: 772
Thanks for the results. I have PM'd you.

I looked at your output csv file. This output seems a lot different from the previous one. The std dev for the first 1000 samples is 130 pips; for the last 50 samples it is 190 pips. I would expect the first 1000 samples to have the same std dev as your prior training, or close, since the same samples are used for training. I would expect the last samples to be worse since they are the production set, or at least a part of it.
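
For anyone who wants to repeat this check, here is a rough Python sketch. It assumes the actual and predicted values have already been read from the csv into two lists (the column layout of the file is not shown here, so no parsing is attempted), and it assumes a pip size of 0.0001.

```python
import statistics
from typing import Sequence

PIP = 0.0001  # assumed pip size for a 4-decimal EURUSD quote


def error_std_pips(actual: Sequence[float],
                   predicted: Sequence[float],
                   pip: float = PIP) -> float:
    """Standard deviation of the prediction error, expressed in pips."""
    errors = [(p - a) / pip for a, p in zip(actual, predicted)]
    return statistics.pstdev(errors)


# Tiny made-up example: errors of +10, -10, +10, -10 pips.
actual = [1.4000, 1.4100, 1.4200, 1.4300]
predicted = [1.4010, 1.4090, 1.4210, 1.4290]
print(error_std_pips(actual, predicted))
```

With the real columns loaded, `error_std_pips(actual[:1000], predicted[:1000])` and `error_std_pips(actual[-50:], predicted[-50:])` would give the two figures being compared.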

I thought your prior training error was ~ 1 pip?? Maybe the error reported by NS2 is scaled somehow, so it appears different than actual output ?
MadCow
 
Hi Arry,

I also PMed you on Gmail.

Can you post the .pat file you are using for the test so I can open it with the NS2 datagrid? I tried to open the file from the 1st post with NS2, but it seems to be from a different program.

It can also be a CSV file in the same format as the datagrid.

I will use this file to train the net using MBP, which is capable of building more advanced nets than NS2, so we can compare the prediction error.

Krzysztof
 
Hi Arry,
Thanks for the pdf's and all the work you did to get them.(y)
A couple of comments after reading the pdf's.

1. Since we are using lags to predict leads, it is probably best to choose the test and production set sequentially, instead of randomly. e.g. use the "All patterns after N through M..." button. A random selection suffers the possibility of future leak, due to the possible overlap of lag and lead.

2. Eventually we must face the problem of non-stationary data. This can be a serious problem when training a net. The problem arises because the training set is not representative of all the data. One way to see this is to notice that if you use the price of EUH1 say between 9/08 and 7/09 the price will never exceed 1.427. Yet from 7/09 to 1/10 the price hardly ever was below 1.427. This means that a net trained with the early data and (OOS) tested with later data has never seen data in the range of the test. This is because the price series is non-stationary. The mean (and standard deviation) can change. Non stationarity has many other ramifications, but this one is clear. The typical method used to combat this is to first transform the series by taking differences. These will usually be stationary, or at least have the same range and mean. You must also transform the target by taking differences.

You will notice that most of the inputs to the NeuroTrend indicator are differences of ema's or else they have been transformed to always lie within a specific range (RSI, etc.). These inputs may be non-stationary..I do not know, but at least the net will be trained with variables that cover the whole range of possibilities.

3. Another method to combat non-stationarity is to retrain the net occasionally. You can get an idea of how often by looking at the SMA of the mean square error. Of course this complicates the use of the net in real time if it requires frequent retraining, but it may be necessary. To explore this possibility you usually need a lot of data.
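
To make point 2 concrete, here is a small Python sketch on synthetic data (a made-up uptrend with a zig-zag, not real EURUSD prices). The helper names `difference` and `value_range` are illustrative only.

```python
from typing import List, Sequence, Tuple


def difference(series: Sequence[float]) -> List[float]:
    """First differences: series[i + 1] - series[i]."""
    return [b - a for a, b in zip(series, series[1:])]


def value_range(values: Sequence[float]) -> Tuple[float, float]:
    """Smallest and largest value seen."""
    return min(values), max(values)


# Synthetic uptrend with a small zig-zag; NOT real market data.
prices = [1.3000 + 0.0005 * i + (0.0002 if i % 2 else -0.0002)
          for i in range(400)]
early, late = prices[:200], prices[200:]

# Raw prices: the later half lies entirely above the earlier half,
# so a net trained on `early` has never seen the range of `late`.
print("raw early:", value_range(early), "raw late:", value_range(late))

# Differences: both halves cover essentially the same range.
print("diff early:", value_range(difference(early)))
print("diff late: ", value_range(difference(late)))
```

If the target is differenced as well, a price prediction is recovered by adding the predicted difference to the last known price.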

Keep up the good work.
MadCow
 
I am having difficulty collecting the multimarket data in MT4 from a chart.

For example, if I am on the EURUSD daily chart, I use the following code in a script or indicator:
input[0] = iHigh("EURUSD",PERIOD_D1,i);
input[1] = iLow("EURUSD",PERIOD_D1,i);
input[2] = iClose("EURUSD",PERIOD_D1,i);
input[3] = iClose("EURJPY",PERIOD_D1,i);
input[4] = iClose("USDJPY",PERIOD_D1,i);
input[5] = iClose("#COMP.XO",PERIOD_D1,i);
input[6] = iClose("#DJC.XDJ",PERIOD_D1,i);
input[7] = iClose("#DJT.XDJ",PERIOD_D1,i);
input[8] = iClose("#DJU.XDJ",PERIOD_D1,i);
input[9] = iClose("#SPX.X.XP",PERIOD_D1,i+10);
But it still gives zero data. Can anybody advise?

Krzys, here is the NS2 pattern data I used to train the network (collected previously using NSDT). It would be good if I could have the runtime libraries to call the trained net in NS Predictor/Classifier... I need it.

Fralo, I will look at your advice when I am home.

Thank you
Arryex
 

Attachments

  • NeuraldataEURUSDMultimarket pattern.zip
    170.5 KB · Views: 557
sample size - data mining bias error

Hi Arry,

I had a look at your results. First, the results from NS2 (the measured error) are not against the production period but against the test and training periods. Only the error against a production period placed at the end of the training and test periods is OOS, I think. I'm not sure if NS2 can show this error.

The results from Predictor are in-sample, I think.

But the main question is what the real predictive power of such a trained net is, because the assumption that we will get the same results out of sample as during training is wrong.

Beyond the errors shown in the IS period, there is a data mining bias (DMB) error which must be considered here. See the screens. This error depends largely on the sample size and the number of rules; for example, for 1024 rules and a sample size of 1000 it is about 12%.

If the number of rules grows, DMB grows; if the sample size decreases, DMB grows. I think the number of rules is the number of epochs in our case.
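
The direction of both effects can be illustrated with a small Monte Carlo sketch in Python. It is only an illustration under stated assumptions: every "rule" has a true mean return of zero, per-sample returns are Gaussian with unit standard deviation, and each rule's observed mean is drawn directly from its sampling distribution. It does not reproduce the 12% figure from the screens, and it does not settle whether epochs count as rules.

```python
import math
import random


def data_mining_bias(n_rules: int, n_samples: int,
                     trials: int = 200, sigma: float = 1.0,
                     seed: int = 0) -> float:
    """Average observed mean of the best of n_rules worthless rules.

    Each rule has true mean 0, so whatever the best rule appears to
    earn in-sample is pure selection (data mining) bias.
    """
    rng = random.Random(seed)
    # Standard error of a mean over n_samples Gaussian returns.
    std_err = sigma / math.sqrt(n_samples)
    total = 0.0
    for _ in range(trials):
        total += max(rng.gauss(0.0, std_err) for _ in range(n_rules))
    return total / trials


for rules in (16, 256):
    for samples in (250, 1000):
        print(rules, samples, round(data_mining_bias(rules, samples), 4))
```

More rules push the bias up, more samples pull it down, which is the argument for moving from daily to hourly bars.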

So I think the best solution is to run this test not on daily charts but on, e.g., 1h charts, which will increase the sample size by a factor of 24 and decrease the DMB error. Perhaps more advanced methods must be used to properly evaluate the predictive power of this setup, like these:

http://www.evidencebasedta.com/MonteDoc12.15.06.pdf
http://www.patentstorm.us/patents/6088676/description.html

Krzysztof
 

Attachments

  • DMB3.JPG
    232.1 KB · Views: 427
  • DMB2.JPG
    160.2 KB · Views: 453
  • DMB1.JPG
    240.5 KB · Views: 530
Re: sample size - data mining bias error

Hi Guys,
Krzys.. I think the number of rules is also proportional to the number of weights in the net, because more rules are possible with more weights. When the number of weights > the number of samples, look out! The suggested limit on the number of weights in NeuroSolutions Trader is 1/10 the number of samples in the training set. I have seen this elsewhere in the neural net literature, but cannot remember where (hence MadCow memory).
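
A quick way to apply that rule of thumb, sketched in Python. The helper names are made up for illustration, the layer sizes in the example are hypothetical, and the weight count assumes a plain fully connected net with bias weights left out unless requested; a given tool may count differently.

```python
from typing import Sequence


def count_weights(layers: Sequence[int], include_bias: bool = False) -> int:
    """Weights in a fully connected feed-forward net.

    `layers` lists the node count per layer, input first, e.g. (10, 5, 1).
    """
    total = 0
    for n_in, n_out in zip(layers, layers[1:]):
        total += n_in * n_out
        if include_bias:
            total += n_out  # one bias weight per receiving node
    return total


def within_rule_of_thumb(n_weights: int, n_samples: int,
                         ratio: float = 0.1) -> bool:
    """True if weights <= ratio * training samples (the 1/10 rule)."""
    return n_weights <= ratio * n_samples


print(count_weights((10, 5, 1)))         # hypothetical small net
print(within_rule_of_thumb(2640, 1125))  # the "standard" net from this thread
```

The second call uses the 2640 weights and 1125 training samples quoted earlier in the thread, which is far above the 1/10 limit.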

Arry.. MT4 does not accept an array element in the FileWrite statement, so you need to do something like

double input1 = iClose("EURUSD",PERIOD_D1,i);
double input2 = iClose("EURJPY",PERIOD_D1,i);

FileWrite(handle, MyDate,MyTime,input1,input2);

I tried it and it works.

Regards,

fralo
 
I don't know how the number of rules is related to NNs. My impression after reading the EBTA book was that every epoch we get a new net with a new set of weights, so it's a new rule. But I may be mistaken about this.

Krzysztof
 
Using MBP to predict high

Hi guys,
I tried to replicate Arry's net to predict high (Post 4) using MBP, however I tried to predict only 1 bar ahead to simplify the net. Images are attached. Look at the density of the network...so many weights that I should be able to do very well on the training data, and I do very well looking at the RMS image. However, the comparison of desired to predicted shows that the trained net does not do well at the extremes, just as you would expect because there are fewer samples at the extremes for training. I trained MBP using training data only (I did not specify a test set during training). Then I specified a test set and generated the figure showing RMS and desired vs predicted for test data. Here the results show more clearly the problem at the extremes, where training data is lacking.

A problem with MBP is that the mse is measured by the program on the normalized output, so to get a true measure you must generate the net code and write a main() to get the output. I have not yet done this for the 15-38-38-38-1 net, but I am pretty sure that the prediction won't be useful due to the nonstationarity of the data. The data and resulting net are in the zip file.

I think I will try to use MBP on fewer, differenced inputs with a smaller number of weights.. maybe report the results later after playing chauffeur for my wife.

fralo
 

Attachments

  • Test.jpg
    643.1 KB · Views: 537
  • Train.jpg
    602.8 KB · Views: 604
  • RMS.jpg
    502.7 KB · Views: 489
  • Arch.jpg
    1 MB · Views: 613

I don't know either, but I do know that the more weights in the net, the more likely to curve-fit, and I think that curve-fitting causes more bias.
 
Re: Using MBP to predict high

Interesting. In 13 minutes you ran 127 epochs. I ran around 1000 in 2 minutes on the same data with an NVIDIA 250 GT (128 cores), then it crashed after exactly 987 epochs. It happened three times. CUDA doesn't support connections between the input and output layers, so the error was much bigger even after 987 epochs. I will report it to Noel.

When it works in normal non-CUDA mode it's single threaded. I think Noel didn't consider non-CUDA users :D

Regarding the data: I think it's not Arry's data. 20k bars for training and 5k for test. What TF and what date range is it?

Krzysztof
 
error 142 pips after 1382 epochs...hmmmmm.

The crash was most likely caused by the GPU overheating.
 

Attachments

  • train.JPG
    111.2 KB · Views: 472
Now training on the same data with the same number of epochs, but the results are very different. MBP seems not to be perfect...

Krzysztof
 

Attachments

  • train1.JPG
    116.3 KB · Views: 535
  • train3.JPG
    117.2 KB · Views: 493
I'm not sure what's going on there Krzys. MBP will not give the same results every time you train a net, but in my experience they are not usually that different. Did you start with randomized weights? (See the FAQ at http://dit.ipg.pt/MBP/FAQ.aspx). Sometimes it gets stuck in a local minimum, but I have repeated the training several times and always get roughly the same result: RMS error ~ 0.005. Here is another image of another training run starting with randomized weights.
Another net.jpg

The bugs reported on the MBP site suggest that when you use CUDA you need to modify the registry because Windows does not like it when the GPU runs for more than 5 sec. But maybe you already did this. (See Bug 2.1.1 beta - Blue Screen of Death with CUDA at http://dit.ipg.pt/MBP/bugs.aspx).

You are quite right, the data is not from Arry. I wanted more samples, so I collected data using the MT4 terminal on an MBT demo account. Here is the dated version of the data.
View attachment DatedData.zip

I made a mistake describing how I trained the net for the pictures. The net in the c file was trained with the test file active, because it must be to get the test RMS. But I tried several times without the test file and got roughly the same training RMS error. MBP does not optimize using the test data, so the only future leak is from the operator stopping training when the test RMS error is minimized. I tried to ignore the test data for the images, so there should be small if any bias in the images. In the future, I will reserve a production set, and run the net on that set using the c code. That is a bit of a pain, but I think as easy as using NS2 on a production set.

MBP is not perfect but I like it compared to alternatives.:D
fralo
 
Another MBP net

Here's another net trained with MBP to predict the next bar high. I wanted to illustrate that one can get as good or better results using data that has been processed to avoid the nonstationarity problem, and that the net need not have so many inputs or weights. This net uses 1 hidden slab of 15 nodes with 4 inputs (4-15-1), so 75 weights, or about .003 weight/training sample. Curve fitting is almost impossible, and this will help generalization a lot. The inputs are calculated in the mq4 script attached in the zip file. They are:

• e is the normalized average of (C-L)/(H-L)
• d is the normalized average of (C-O)/(H-L)
• r is the normalized average of range (H-L)
• v is the normalized average of volume

The normalization is adjusted so that each feature has variance ~ 1, and mean 0, and the normalization adapts slowly.
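
The script in the zip is not reproduced here, so the following Python sketch is only one possible way to get a slowly adapting normalization: an exponentially weighted running mean and variance. The class name and the smoothing constant `alpha` are assumptions, not taken from the mq4 script.

```python
import math


class AdaptiveNormalizer:
    """Normalize a stream to roughly zero mean and unit variance.

    Uses exponentially weighted estimates of mean and variance, so the
    scaling follows slow changes in the data. `alpha` controls how
    slowly it adapts (smaller = slower); 0.01 is an arbitrary choice.
    """

    def __init__(self, alpha: float = 0.01, eps: float = 1e-12) -> None:
        self.alpha = alpha
        self.eps = eps      # guards against division by zero
        self.mean = None    # running mean, set by the first sample
        self.var = 0.0      # running variance

    def update(self, x: float) -> float:
        """Feed one raw value, return its normalized value."""
        if self.mean is None:
            self.mean = x
            return 0.0
        delta = x - self.mean
        self.mean += self.alpha * delta
        self.var = (1.0 - self.alpha) * (self.var + self.alpha * delta * delta)
        return (x - self.mean) / math.sqrt(self.var + self.eps)


# Example: a raw feature oscillating around 5 with amplitude 2.
norm = AdaptiveNormalizer(alpha=0.01)
values = [norm.update(5.0 + 2.0 * math.sin(0.37 * i)) for i in range(5000)]
print(values[-3:])
```

The first few hundred outputs are unreliable while the estimates warm up, so they would normally be discarded before training.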

The net predicts the difference of the next high from the current high. To get the high prediction one must add the current high. When all is said and done, this very simple net results in an mse error on the test set of about 16 pips. The 15-38-38-38-1 had an error of 40 pips using the same period of test data. It has .17 weight/training sample, so it too should generalize well, but curve fitting is possible.
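
A short Python sketch of that last step, turning predicted differences back into predicted highs and scoring them in pips. The function names, the sample numbers and the pip size of 0.0001 are illustrative assumptions; the error here is computed as a root mean square.

```python
import math
from typing import List, Sequence

PIP = 0.0001  # assumed pip size for a 4-decimal EURUSD quote


def reconstruct_highs(current_highs: Sequence[float],
                      predicted_diffs: Sequence[float]) -> List[float]:
    """Predicted next high = current high + predicted difference."""
    return [h + d for h, d in zip(current_highs, predicted_diffs)]


def rms_error_pips(actual: Sequence[float],
                   predicted: Sequence[float],
                   pip: float = PIP) -> float:
    """Root mean square prediction error, in pips."""
    squared = [((p - a) / pip) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up numbers: three bar highs and the net's two difference outputs.
highs = [1.4000, 1.4010, 1.4005]
predicted_diffs = [0.0012, -0.0008]

predicted_next = reconstruct_highs(highs[:-1], predicted_diffs)
actual_next = highs[1:]
print(rms_error_pips(actual_next, predicted_next))
```

Measuring the error on the reconstructed highs, rather than on the normalized net output, keeps the figure comparable between nets that scale their targets differently.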

Neither of these nets is going to be particularly useful by itself because the ATR is about 15-20 pips, but this net does illustrate that bigger is not always better, and careful input processing will be required to squeeze the best out of any neural net.

BTW it trained in 6 sec... no CUDA. Love that MBP:) Thanks again Krzys for the link.

View attachment PredictHigh.NoHigh.zip
 
Thanks Fralo for the MT4 coding advice, it works for me now. I use it in an MT4 script.

Today, I am going to catch my flight to Dubai...

See you..
Arryex
 