Some reasoning about in-sample and out-of-sample
I was talking to another trader and he said that, as far as trading systems go, the future is rarely as good as the past.
I don't know much about statistics but I'd like to reason about this out loud.
I just want to reason about it. Not read any papers. Not do any tests. Let's see if I can get to the conclusion by a logical deduction.
If I don't, it is just because I am in a distracted state of mind, due to concerns about personal issues in my life (work, vito, neighbours, etc.).
Yeah, because I think there's no need for empirical evidence to establish this.
So, let's say that on TradeStation I test a bunch of hypotheses for systems that I think should work, and, out of 10 strategies tested, I come up with 5 systems that work in the in-sample.
By the way, we should clarify that we have three phases:
1) In-sample, where I look for hypotheses that work (with the help of optimization)
2) out-of-sample, where I verify that those hypotheses weren't just lucky combination of rules/parameters
3) real trading, where I trade those systems with real money
Now, what my trader friend said, and I think he might be right, is that the "future is rarely as good as the past". So this should also be true for the out-of-samples. If it is true, that is.
At this point maybe we should add what "good" means. It means profitable. And we are up against not just chance but also costs (commissions and spread). So no system, by randomness, will ever be "good", "profitable", because we are up against randomness PLUS costs.
So, as a rule, a random system will be unprofitable, for the simple fact that it will be a break-even system MINUS costs. As a rule, only a system with an edge will be profitable. With an edge large enough to cover costs and add some profit.
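As a toy illustration of that cost drag (a sketch with made-up numbers, nothing from TradeStation): a zero-edge coin-flip system ends up paying the full cost bill while its gross P&L just hovers around zero.

```python
import random

# Hypothetical numbers: a zero-edge system that wins or loses $10 per trade
# with equal probability, paying $2 per trade in commissions + spread.
random.seed(42)

N_TRADES = 10_000
COST_PER_TRADE = 2.0

gross = sum(random.choice([10.0, -10.0]) for _ in range(N_TRADES))
net = gross - COST_PER_TRADE * N_TRADES

print(f"gross P&L: {gross:+,.0f}")   # hovers around zero: no edge either way
print(f"net P&L:   {net:+,.0f}")     # dragged down by roughly -$20,000 of costs
```

The gross result is break-even on average; the net result is break-even minus costs, which is exactly the "random system is unprofitable as a rule" point.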
Now, what my friend is saying is that there will be a tendency, among those 5 profitable systems, to not perform as well in the out-of-sample: not necessarily to become unprofitable.
Let's say that for a second we go back to my ten strategies before any optimization.
The chance of them performing in the out-of-sample as well as in the in-sample is even. As long as I don't mess with the parameters and the rules based on what I learn by optimizing (i.e. "data snooping"), my rules have an equal chance of working in one sample (the in-sample) as in the other (the out-of-sample).
So what is critical here is the process of optimization: that is what makes my systems likely to not perform as well as they have.
One more remark: my friend may also have been referring to the markets changing in the future, but let's set that aside for now. That part alone means that even if you did everything correctly in terms of methodology, your systems won't perform as well. They could even perform better, but they are likely to perform worse, because you always have commissions against you, and commissions become overwhelming if your edge ceases to exist. According to my friend, that is usually the case: after a while, the edge disappears. And I trust him on this sad reality.
So, as a rule, the system will not perform as well as it did in the tests. We have established this, but it is not what I wanted to establish, because all it says is that the system will not work as well once its edge stops working.
What I wanted to reason about and find out instead is whether, through our process of optimizing, we're causing the out-of-sample to be worse than the in-sample.
Here I can't forget about my experience. It is true that out of dozens of systems I created (about 100), only about half of them worked in the out-of-sample, which means "were profitable in the out-of-sample". It doesn't even mean that they were as profitable, but just that they were profitable (after fixed costs).
So my own experience, at least, seems to confirm it.
And I suppose this is because not everything that happened in the past happened because it followed a pattern that will repeat. Some of it happened randomly. Then I come along and, via optimization, I piece together those random acts of the market and find a pattern in them. The problem is that the pattern I find may not be the pattern the markets actually followed.
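Here is a minimal sketch of exactly that trap, on purely random data, so by construction there is no real pattern to find: brute-force 200 random long/flat rules, keep the one that looks best on the in-sample, then score it on the out-of-sample.

```python
import random

random.seed(0)

DAYS = 500
returns = [random.gauss(0, 1) for _ in range(2 * DAYS)]
in_sample, out_sample = returns[:DAYS], returns[DAYS:]

def score(rule, data):
    # rule[d] is True when the system is long on day d
    return sum(r for d, r in enumerate(data) if rule[d])

# "Optimization": generate 200 random rules and keep the best in-sample one.
candidates = [[random.random() < 0.5 for _ in range(DAYS)] for _ in range(200)]
best_rule = max(candidates, key=lambda rule: score(rule, in_sample))

print("in-sample:     ", round(score(best_rule, in_sample), 1))   # inflated by cherry-picking
print("out-of-sample: ", round(score(best_rule, out_sample), 1))  # no cherry-picking, just noise
```

The in-sample score is inflated because we selected for it; the out-of-sample score is an honest draw, and on data with no edge it reverts to noise. That is the "pattern pieced together from random acts of the market".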
The more reasoning you do on your system BEFORE optimizing it, and the less you optimize it, the more likely you are to succeed in the out-of-sample. And vice versa. I have not tried them yet, but I think this is why genetic algorithms may not be so useful: they find good brute-force combinations. They are nothing but smart brute-force optimizers, and what we need is not combinations but reasoning. The combinations are easy to find with regular brute-force optimization.
One should not even use a genetic algorithm without knowing the markets, because it will produce so many systems that some of them are bound to work in the out-of-sample by sheer chance.
One thing is to produce 10 strategies and have 5 of them work in the out-of-sample. Another thing is to produce 10 strategies and have 1 of them work. In the latter case, you risk having found a combination that works in the out-of-sample by luck. Your ability as a system creator should be measured by how many of your systems fail in the out-of-sample: the fewer, the better. In my case, 50% of systems fail, and I feel ok about that ratio.
So, recapitulating, we have established that systems we create are not likely to work as well as in the in-sample, because:
1) the markets change and have a tendency to reduce your edge (from a specific system)
2) the process of optimizing on a sample data set gets you acquainted with what happened, and the past is not likely to repeat itself in exactly that way - even if the markets had no tendency to reduce your edge.
But there's still something missing. Two things. What is the relationship between in-sample, out-of-sample and real trading, and how much are the systems going to be worse?
Let's start again.
1) I test my strategies on the in-sample
2) 50% of them are profitable in the out-of-sample. Profitable enough to be worth trading. But it is true: slightly less profitable. Let's say they are roughly two thirds as profitable, on average.
And so far my friend was totally right. Because he was proven right by the fact that half the strategies do not work, and the other half do not work as well, only 66% as well.
Now let's get to the last part: the part where I trade the strategies.
More time has gone by. We are now in the out-of-out-of-sample.
My edge is likely to have eroded even more, because more time has gone by. In terms of sampling, however, there should be no difference... or should there?
Let us completely set aside the time factor which increases the chance for the edge to disappear (due to other traders trading the same strategy - the usual "efficient markets" talk).
Yes, yes, yes! There is still another difference.
It is the same difference I mentioned for the genetic optimizers creating hundreds of systems.
If we create a large number of systems, even using the out-of-sample methodology, we are bound to find a lucky brute-force combination that works by luck not only on the in-sample but also on the out-of-sample. Such a combination could happen maybe... guesstimate: 5% of the time or less. But out of dozens of systems created, there will be some that were lucky in both the in-sample and the out-of-sample, and those systems will fail in real trading.
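The arithmetic behind that guesstimate: if a no-edge system slips through the out-of-sample check by luck with probability p = 0.05, then across N systems the chance of at least one false survivor is 1 - (1 - p)^N, and the expected number of false survivors is N * p.

```python
# p = 0.05 is the guesstimated chance that a no-edge system passes the
# out-of-sample check by pure luck.
p = 0.05
for n in (1, 10, 50, 100):
    at_least_one = 1 - (1 - p) ** n
    expected = n * p
    print(f"{n:>3} systems tested: P(>=1 lucky pass) = {at_least_one:.2f}, "
          f"expected lucky passes = {expected:.1f}")
```

Already at 10 systems the chance of at least one lucky survivor is around 40%, and with dozens of systems some false survivors become almost a certainty.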
That is why it is of major importance that we have a good ratio of systems successful in the out-of-sample to systems successful in the in-sample. The more systems we verify on the out-of-sample, the more likely we are to find a lucky one that is profitable by pure chance. Ideally, we should have a feeling for this, and sense when a system is not ready to be tested on the out-of-sample. In fact, until six months ago, when the investors told me to use it, I totally disregarded this method, whereas today I consider out-of-sample verification sacred: not to be abused, to be done only once, at the end of all testing.
The result of not using an out-of-sample in my previous tests is that I ended up automating a good 50% of unprofitable systems; my forward-testing took the place of the out-of-sample, and I had to wait a year to find out which systems really worked.
Recapitulating, I've established that:
1) i test my strategies on the in-sample
2) half of them work on the out-of-sample but only achieve 66% of the performance they had on the in-sample
3) due to time eroding the edge, and due to lucky out-of-sample combinations (maybe one system in every 20), the performance of my systems will decrease further, and in some cases they will never be profitable.
All in all, given my skills and work, I can expect the performance of the systems I create and approve for real trading to be, on average, about 50% of what it was on the in-sample. Yes, because by the time the good systems reach the out-of-sample, their performance drops to roughly 66% of the in-sample level. By the time they reach real trading, it drops another 16 percentage points or so, to about 50%, due to the edge being eroded and to the lucky systems that were let through.
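The decay chain, spelled out with the rough numbers I've been using (two thirds retained at the out-of-sample stage, roughly 16 more points lost by real trading):

```python
in_sample_perf = 1.00                 # in-sample performance, as the baseline
oos_perf = 0.66 * in_sample_perf      # out-of-sample keeps about two thirds
real_perf = oos_perf - 0.16           # real trading loses ~16 more points

print(round(real_perf, 2))            # ~0.50 of the in-sample performance
print(round(real_perf / oos_perf, 2)) # real trading vs out-of-sample: ~0.76
```

Note the baselines: real trading delivers about half of the in-sample performance, which works out to roughly three quarters of the out-of-sample performance.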
That partly explains why today I am seeing only a small part of the profit I was expecting. It is also due to a poor selection of systems to trade: I had picked some systems that were actually unprofitable in the out-of-sample, too, just because I "trusted" them.