We structure Solving Intelligence for Investment Management into three independent but complementary open research problems in finance-specific Machine Learning. The solution to each problem constitutes an essential module in what we consider The Core Architecture: The Eyes, The Brain, and The Risk Manager.
We have a working solution to each of the three aforementioned problems, which we are constantly iterating on, both to widen their scopes and to improve performance.
The Eyes: The first problem is concerned with learning features that are conducive to finding alphas from large amounts of noisy data, under dimensionality budget constraints.
Although images are made of pixels and light is made of photons, we human beings never make sense of images by reasoning at the pixel or photon level. We reason at the level of higher-level abstractions, such as the shapes of curves, entities and sentiment in text, colors, the motion of objects, and background knowledge about those objects, to name but a few, and associations thereof. In fact, not only are we unaware of individual pixels in an image or photons in light, but reasoning at such a microscopic level would be computationally impractical, even for the human brain, and very inefficient. No single photon or pixel contains enough information in isolation: one needs a set of pixels arranged in a specific way to recognize a face, a pattern in a price curve, or a word in a text, or to detect a moving object. A red, green, or blue pixel by itself is meaningless.
Similarly, The Eyes of our AI are concerned with reducing large amounts of noisy data that are (plausibly) conducive to anticipating market moves down to lower-dimensional (i.e. compressed) representations that are equally conducive to anticipating market moves. From a Machine Learning perspective, this resembles a compression problem, with the caveat that one is not interested in minimizing the error made when reconstructing the original input from its compressed representation, but rather in minimizing the loss of the 'useful' part of the raw data: the signal that makes the raw data conducive to anticipating market moves in the first place. Given the low signal-to-noise ratio in financial markets, treating this as a traditional compression or auto-encoding problem may yield an encoding scheme that does a good job of reconstructing the original raw data (i.e. has a fairly low reconstruction error), but that wipes out the 'useful' part of the raw data in the process and preserves the (dominating) noise. The very definition of the extent to which any data is conducive to anticipating market moves, and the need to quantify how much 'useful' information raw data contains, require structuring, let alone solving, this research problem with a finance-first mindset.
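A minimal PyTorch sketch may make the distinction concrete, under purely illustrative assumptions (synthetic data, hypothetical layer sizes, and forward returns as a stand-in for 'market moves'): the low-dimensional code is trained against a predictive loss rather than a reconstruction loss, so the encoder is never penalized for discarding noise that does not help the forecast.

```python
# A sketch of signal-preserving compression, NOT a vanilla auto-encoder.
# All dimensions and the synthetic data below are illustrative assumptions.
import torch
import torch.nn as nn

RAW_DIM, CODE_DIM = 256, 8  # dimensionality budget: 256 noisy inputs -> 8 features

encoder = nn.Sequential(nn.Linear(RAW_DIM, 64), nn.ReLU(), nn.Linear(64, CODE_DIM))
# Small head mapping the compressed representation to a forward-return forecast.
head = nn.Sequential(nn.Linear(CODE_DIM, 16), nn.ReLU(), nn.Linear(16, 1))

opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()), lr=1e-3)
loss_fn = nn.MSELoss()

# Synthetic stand-in for (raw features at time t, forward return over t..t+h).
x = torch.randn(1024, RAW_DIM)
fwd_ret = 0.01 * torch.randn(1024, 1)

for step in range(200):
    code = encoder(x)              # compressed representation
    pred = head(code)              # forecast built on the code alone
    loss = loss_fn(pred, fwd_ret)  # preserve the 'useful' part, not the raw input
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Note that nothing in this objective asks the code to reconstruct the raw input; a vanilla auto-encoder would swap the target fwd_ret for x and, in a low signal-to-noise regime, spend most of its dimensionality budget on the noise.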
Solved right, this Machine Learning problem reduces the amount of computational power required for alpha exploration, while mitigating the likelihood of finding statistical flukes. The ultimate long-term aim is to bridge the information gap between the best Quants and our machines, so that our machines are not at a data disadvantage in the alpha exploration process, without requiring an impractical amount of compute and without increasing the occurrence of statistical flukes.
The Brain: The second problem is concerned with finding alphas from the compressed representations learned by The Eyes, as well as from background knowledge.
Not everyone who can read can understand a graduate-level Math or Quantum Physics textbook, and the smartest mathematician or theoretical physicist would have a hard time with a textbook in his or her field written in a foreign language: having the right vocabulary is necessary, but one still needs to reason with it. Similarly, the features learned by The Eyes can be thought of as defining an ontology of (non-tradable) risk factors, from which The Brain continuously finds all (tradable) risk factors, or combinations thereof, that are not currently priced in, or that are mispriced by the market.
Sub-problems addressed by The Brain include defining an alpha algorithmically, choosing a flexible enough space of candidate alphas to explore, testing whether a candidate alpha is viable, and navigating the space of candidate alphas efficiently, in a way that both mitigates the likelihood of finding statistical flukes and minimizes the marginal cost of finding each new viable alpha.
This is a non-traditional Machine Learning problem. It is neither a supervised learning problem (e.g. image classification or regression), nor an unsupervised learning problem (e.g. computer-generated art), nor a reinforcement learning problem (e.g. game playing), at least not in the traditional sense. The traditional Machine Learning problems closest to this one, namely supervised learning and reinforcement learning, intrinsically aim at finding a single, best solution (e.g. the best image classifier in a family of candidate models, or the best policy for playing a game), and the existence of competing solutions (i.e. local extrema) is usually undesirable. When it comes to alphas, however, the number of viable alphas found matters at least as much as the performance of each one. Focusing one's efforts on finding the single best alpha would be ill-advised from a risk management perspective, and so would formulating this problem as an optimization problem. Once more, properly formulating this problem mathematically, let alone solving it, requires adopting a finance-first mindset.
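A toy Python sketch of this framing, which is emphatically not the actual method: the candidate count, the synthetic backtests, and the Bonferroni-corrected t-test used as a stand-in viability test are all illustrative assumptions. The point is that every candidate clearing a viability bar, a bar that tightens as more candidates are tried, is kept, rather than only the top performer.

```python
# Collect ALL viable alphas instead of taking an argmax over performance.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
T, N_CANDIDATES = 1250, 500  # ~5 years of daily P&L, 500 candidate alphas

def is_viable(daily_pnl: np.ndarray, n_trials: int, fwer: float = 0.05) -> bool:
    """One-sided t-test that mean daily P&L > 0, Bonferroni-corrected.

    The threshold tightens as n_trials grows, mitigating statistical flukes."""
    t_stat, p_two_sided = stats.ttest_1samp(daily_pnl, 0.0)
    p_one_sided = p_two_sided / 2.0 if t_stat > 0 else 1.0
    return p_one_sided < fwer / n_trials

# Hypothetical backtests: one daily P&L stream per candidate alpha.
candidate_pnls = rng.normal(0.0, 0.01, size=(N_CANDIDATES, T))

# Keep every candidate that passes the viability test; the single best
# candidate is never singled out.
viable = [i for i in range(N_CANDIDATES)
          if is_viable(candidate_pnls[i], n_trials=N_CANDIDATES)]
print(f"{len(viable)} viable alphas out of {N_CANDIDATES} candidates")
```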
The Risk Manager: The last module is concerned with scalable risk-based aggregation of a large number of viable alphas.
All alphas eventually stop performing. While The Brain continuously works to find new viable alphas early on in their lifecycles, The Risk Manager tracks the performance of every alpha on an ongoing basis, as well as any dependencies between found alphas, so as to decommission fading alphas and optimally aggregate the remaining ones from a risk perspective.
This problem is a unique blend of a traditional Machine Learning problem, namely Mixture of Experts modeling, and a traditional finance problem, namely scalable Portfolio Optimization, but it is not quite one or the other.
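The following Python sketch illustrates the two steps under purely illustrative assumptions: synthetic alpha P&L streams, a hypothetical trailing-Sharpe floor for decommissioning, and minimum-variance-style weights as one simple stand-in for risk-based aggregation that accounts for dependencies between alphas.

```python
# Decommission fading alphas, then aggregate survivors from a risk perspective.
import numpy as np

rng = np.random.default_rng(1)
T, N_ALPHAS = 750, 20
alpha_pnls = rng.normal(0.0002, 0.01, size=(T, N_ALPHAS))  # daily alpha P&Ls

# 1. Decommission fading alphas: trailing annualized Sharpe below a floor.
WINDOW, SHARPE_FLOOR = 250, 0.0
recent = alpha_pnls[-WINDOW:]
trailing_sharpe = recent.mean(axis=0) / recent.std(axis=0) * np.sqrt(252)
alive = trailing_sharpe > SHARPE_FLOOR

# 2. Aggregate survivors: solve Cov @ w = 1 and normalize, i.e.
#    minimum-variance-style weights, which down-weight alphas that are
#    highly correlated with, or noisier than, the rest of the book.
cov = np.cov(alpha_pnls[:, alive], rowvar=False)
raw_w = np.linalg.solve(cov + 1e-6 * np.eye(cov.shape[0]), np.ones(cov.shape[0]))
weights = raw_w / raw_w.sum()

print(f"kept {int(alive.sum())} of {N_ALPHAS} alphas; weights sum to {weights.sum():.2f}")
```

The inverse-covariance weighting is only one of many reasonable choices here; what matters is that the allocation is driven by the joint risk of the alpha book, not by each alpha's standalone performance.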