Hi, I saw this thread referenced somewhere else and was intrigued because I'm interested in some of the mathematically interesting quantitative methods.
However, I'm quite skeptical of the utility (at least at first) or the wisdom of the more sophisticated machine learning methods for trading before the fundamentals are adequately addressed. I'm not an arrogant person generally, but I'll have to assert some (unproven) credentials here: this is an area where I do have some experience. It's my "day job"; I do predictive analytics commercially (non-trading related) for a successful product, and I have a significant academic background in nonlinear signal analysis.
Before jumping into the deep (and fun) end of neural networks/SVMs, and now the new hot thing of going back to very deep neural networks, we need to adequately define the mathematical problem.
Critically, we need to define, in an insightful, problem-specific way: (a) what we are measuring, (b) what we are optimizing, and (c) how we know when it's working or not. Only AFTER those are adequately defined and tested do we start choosing classifier/estimator architectures and algorithms.
This may sound facetious but it actually isn't: have you tried linear regression? (*see below)
It is really critical to define the target function (what do we optimize, and why?) properly for the problem, and to address the most underappreciated issue: over what ensemble (probability distribution of examples) are we training, and measuring performance?
Often people don't think about this; they use the "convenience ensemble" without knowing any better: they collect all the examples it is possible to collect, weight them all uniformly, and implicitly assume the test conditions will be the same.
In trading, that might mean collecting examples at fixed time intervals wherever data exist and making an "up-or-down" hard or soft binary prediction. OK, that's a start, but as a trading system that amounts to assuming you have to be either long or short at every possible open trading hour. I don't think that's appropriate or likely to be successful. What ensemble is actually right for this problem? I don't know personally. A sketch of the naive setup is below.
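Just to make the "convenience ensemble" concrete, here's a minimal sketch of that naive setup: sample every bar, label it up-or-down over some horizon, and weight everything equally. The pandas usage and the one-bar horizon are my own illustrative assumptions, not a recommendation.

```python
import pandas as pd

def convenience_ensemble(prices: pd.Series, horizon: int = 1) -> pd.DataFrame:
    """Fixed-interval sampling with a hard up/down label at every bar."""
    future_return = prices.shift(-horizon) / prices - 1.0
    df = pd.DataFrame({
        "price": prices,
        "future_return": future_return,
        "label_up": (future_return > 0).astype(int),  # hard binary target
    }).dropna()
    # Implicit (and questionable) assumptions baked in here:
    # every bar matters equally, and the live/test distribution
    # will look just like this collection of bars.
    df["weight"] = 1.0
    return df
```

The point isn't that this code is wrong; it's that nothing in it reflects a decision about which situations you actually want to predict and trade.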
Then, if you have a good target criterion suitable for the problem, how are you checking your success? Critically important is doing proper cross-validation (permutations of train/test sets) and making sure you don't have time-based correlation or target leaks across train and test. To be serious, you should also be creating time series of "surrogate data" from stochastic processes with similar statistical behavior (e.g. GARCH conditional heteroskedastic models for finance) but no real predictability, to make sure you aren't able to spuriously "learn" something on them; a sketch of such a generator is below.
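Here's a rough sketch of what I mean by surrogate data: returns simulated from a GARCH(1,1) process, which gives you realistic volatility clustering but nothing exploitable in the conditional mean. The parameter values are illustrative, not fitted to any real market.

```python
import numpy as np

def garch11_surrogate(n: int, omega: float = 1e-6, alpha: float = 0.08,
                      beta: float = 0.90, seed: int = 0) -> np.ndarray:
    """Simulate n returns from a GARCH(1,1) with zero conditional mean."""
    rng = np.random.default_rng(seed)
    returns = np.empty(n)
    var = omega / (1.0 - alpha - beta)  # start at the unconditional variance
    for t in range(n):
        eps = rng.standard_normal()
        returns[t] = np.sqrt(var) * eps
        var = omega + alpha * returns[t] ** 2 + beta * var
    return returns
```

If your pipeline keeps "finding" out-of-sample predictability on many such series, it is almost certainly leaking information or overfitting, because by construction there is nothing to find.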
It takes quite a bit of thought and effort to get there, and I think we ought to start with the simplest classifiers at the core (linear and logistic regression, which can be learned quickly and deterministically), because the problem setup is hard enough.
When I say linear regression, I mean linear regression with proper normalization of inputs and, crucially for this problem, good input selection and regularization (ridge and/or lasso regression), with the regularization parameters selected, say, by cross-validation experiments; a sketch of that baseline follows.
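A minimal sketch of that baseline, assuming scikit-learn: standardized inputs, lasso/ridge penalties, and penalty strength chosen by time-ordered cross-validation. The X and y here are random placeholders standing in for whatever features and target you have defined; nothing about them is specific to trading.

```python
import numpy as np
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.model_selection import TimeSeriesSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 20))             # placeholder feature matrix
y = 0.1 * X[:, 0] + rng.standard_normal(500)   # placeholder target

tscv = TimeSeriesSplit(n_splits=5)             # time-ordered folds, no shuffling

lasso = make_pipeline(StandardScaler(),
                      LassoCV(cv=tscv))        # lasso: sparse input selection
ridge = make_pipeline(StandardScaler(),
                      RidgeCV(alphas=np.logspace(-3, 3, 13), cv=tscv))

lasso.fit(X, y)
ridge.fit(X, y)
print("lasso nonzero coefficients:",
      int(np.count_nonzero(lasso.named_steps["lassocv"].coef_)))
```

The whole thing fits in seconds and has essentially no tuning knobs beyond the penalty, which is exactly why it makes a good first test of whether there is any signal at all.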
Only after you get a positive result with such simple methods (and this may mean feature creation) might it be worth going to the next level of nonlinear predictors. A former colleague who previously worked at a hedge fund said that any profitable strategy had to be reasonably simple and had to work first with linear regression (or be visible with simple correlation in the right space). Sophisticated methods may squeeze a bit more performance out of the phenomenon, but you have to get somewhere with simplicity first.
No, I unfortunately don't have any answers or practical insight, and I probably sound like a cranky old fart. I guess my goal is to redirect the locus of thought and attention toward some foundational issues which I feel are very important.
On the sophisticated "deep belief networks", etc.: the researchers there are attacking core artificial intelligence problems for which there exist numerous examples of skilled and effortless natural (biological) classification that computers nevertheless find hard (a breakthrough in A.I. would be the cognition of a two-year-old chimpanzee). Trading isn't like this: the signals are mostly noise, and even smart humans don't regularly succeed. There's a general rule of thumb: the higher the dimensionality of the problem and the higher the noise level, the more like a linear system it behaves.