The following is the second and concluding installment of a two-part article from Vinesh Jha, CEO of ExtractAlpha, an independent quantitative research firm, reviewing alternative data adoption, perceptions, and best practices. The first article discussed the early-stage nature of alternative data.
Alternative data is still in the early stages of its adoption cycle for a variety of reasons: a proliferation of alt data sets, many of which may have marginal value; the difficulty of evaluating the data; and the challenge of integrating it robustly into a traditional quant investment process.
Misperceptions, herding, and not enough FOMO
Beyond the difficulty of tackling the problem, there could be misperceptions about crowdedness. Perhaps the many holdouts are simply hoping that value, momentum, and mean reversion aren’t really crowded, or that their take on these factors really is sufficiently differentiated – which it may be, but it seems a strange thing to rely on in the absence of better information.
It’s also true that there are a lot more quants and quant funds around now than there were in 2007, across more geographies and styles – and so the institutional memory of the 2007 quantitative crisis has faded a lot. Those of us who were trading in those days are veterans (and we don’t call ourselves “data scientists,” either!).
It’s also possible that a behavioral explanation is at work: herding. Just like allocators who pile money into the largest funds despite those funds’ underperformance relative to emerging funds – because nobody can fault them for a decision everyone else has also already made – or like research analysts who only move their forecasts with the crowd to avoid a bold, but potentially wrong, call – perhaps quants prefer to be wrong at the same time as everyone else. Hey, everyone else lost money too, so am I so bad?
This may seem to some managers to be a better outcome than adopting a strategy which is more innovative than using classic quant factors but which has a shorter track record and is potentially harder to explain to an allocator.
In other words, there’s not much Fear of Missing Out, but there should be. The early adopters, many of whom are ExtractAlpha’s clients, are a small but growing bunch. We’d estimate that there are about 20 firms, mostly in the equity stat arb space and mostly in the US, who have the capabilities and resources to evaluate a wide array of raw alternative data sources. That number is gradually growing, but increasingly vendors like us are helping to facilitate broader adoption by making these data sets easier to implement.
Alt data best practices
Alternative data is more than Twitter sentiment and satellites and credit card transactions, the three categories which have attracted the most press. To be successful in finding new alpha in alternative data, one needs to be creative about data sources, and one must sift through a lot of noise. At ExtractAlpha we evaluate many data sets – more than 50 in the last three years – and most are not useful; we need sufficient history and breadth, and of course we need an economic intuition as to why a data set should provide alpha.
Next, one needs to actually find the alpha – no easy task, as anyone in the quant space knows. One must be careful and creative, but parsimonious. A vendor of a new data set needs expertise in how top quant managers evaluate and implement data sets – expertise most vendors lack.
Finally, one needs to make the data actionable. Most managers don’t have the resources or time to devote to finding alpha in raw data sets; they are busy running their portfolios, dealing with client needs, raising assets, handling compliance issues, and maintaining their current processes. As a result, a vendor alternative data solution should be as turnkey as possible.
We do see that many funds have gotten better at reaching out to data providers and managing the vendor side of the evaluation process. But most have not become particularly efficient at evaluating the data sets themselves, in the sense of finding alpha in them.
New data sets
In our view, any quant manager’s incremental research resources should be applied directly towards acquiring orthogonal signals (and, relatedly, to controlling crowdedness risk) rather than towards refining already highly correlated ones in order to make them possibly slightly less correlated. Finding new data sources should be a prime directive and not an afterthought. Here are eight ideas on how to do so effectively:
- The focus should be on allocating research resources specifically to new data sets, setting a clear time horizon for evaluating each (say, 4-6 weeks), and making a definitive call about the presence or absence of added value from a data set. This requires maintaining a pipeline of new data sets and sticking to a schedule and a process.
- Quants should build a turnkey backtesting environment which can efficiently evaluate new alphas and determine their potential added value to the existing process. There will always be creativity involved in testing data sets, but the more mundane data processing, evaluation, and reporting steps should be automated to expedite the evaluation pipeline described in the first point.
- An experienced quant should be responsible for evaluating new data sets – someone who has seen a lot of alpha factors before and can think about how the current one might be similar or different. New data sets shouldn’t be a side project, but rather a core competency of any systematic fund.
- Quants should pay attention to innovative data suppliers, not just what’s available from the big players (admittedly, we’re biased on this one!).
- Priority should be given to data sets which are relatively easy to test, in order to expedite one’s exposure to alternative alpha. More complex, raw, or unstructured data sets can indeed get you to more diversification and more unique implementations, but at the cost of sitting on your existing factors for longer – so it’s best to start with some low-hanging fruit if you’re new to alternative data.
- Quants need to gain comfort with the limited history we often see with alternative data sets. We recognize that with many new data sets one is “making a call” on the basis of limited historical data. We can’t judge these data sets by the same 20-year-backtest criteria we apply to more traditional factors, both because the older data simply isn’t there and because the world of 20 years ago has little bearing on the crowded quant space of today. But the alternative – sticking with crowded factors until a long history accumulates – sounds far more risky.
- In sample and out of sample methodologies might have to change to account for the shorter history and evolving quant landscape; here is one approach to the problem.
- Many of the new alphas we find are relatively short horizon compared to their crowded peers; the alpha horizons are often in the 1 day to 2 month range. For large-AUM asset managers who can’t be too nimble, using these faster new alphas in unconventional ways such as trade timing or separate faster-trading books can allow them to move the needle with these data sets. We’ve seen a convergence to the mid-horizon as quants who run lower-Sharpe books look to juice their returns and higher-frequency quants look for capacity, making the need for differentiated mid-horizon alphas even greater.
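To make the orthogonality and added-value ideas above concrete, here is a minimal sketch in Python (numpy and pandas) of one way an incremental-value check might work. Everything here is hypothetical and synthetic – the "existing" factor, the "new" signal, and the forward returns are randomly generated, so no real alpha should appear; the point is the mechanics of measuring a new signal's correlation with an existing factor, orthogonalizing it cross-sectionally, and computing the rank IC of what remains.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic illustration: 24 periods x 500 stocks.
n_periods, n_stocks = 24, 500
fwd_returns = rng.normal(0.0, 0.05, (n_periods, n_stocks))

# An "existing" factor, and a "new" signal that partly overlaps
# with it plus some genuinely new information.
existing = rng.normal(size=(n_periods, n_stocks))
new_info = rng.normal(size=(n_periods, n_stocks))
new_signal = 0.5 * existing + new_info

def mean_rank_ic(signal, returns):
    """Average cross-sectional Spearman rank correlation per period."""
    ics = []
    for s, r in zip(signal, returns):
        ics.append(np.corrcoef(pd.Series(s).rank(), pd.Series(r).rank())[0, 1])
    return float(np.mean(ics))

def residualize(signal, factor):
    """Orthogonalize a signal to an existing factor, period by period."""
    resid = np.empty_like(signal)
    for t in range(signal.shape[0]):
        X = np.column_stack([np.ones(signal.shape[1]), factor[t]])
        beta, *_ = np.linalg.lstsq(X, signal[t], rcond=None)
        resid[t] = signal[t] - X @ beta
    return resid

raw_corr = np.mean([np.corrcoef(new_signal[t], existing[t])[0, 1]
                    for t in range(n_periods)])
orth = residualize(new_signal, existing)
orth_corr = np.mean([np.corrcoef(orth[t], existing[t])[0, 1]
                     for t in range(n_periods)])
print(f"avg correlation with existing factor: raw {raw_corr:.2f}, "
      f"orthogonalized {orth_corr:.2f}")
print(f"rank IC of orthogonalized signal: {mean_rank_ic(orth, fwd_returns):.3f}")
```

In practice the residualization would run against a manager's full factor set rather than a single factor, and a new data set would earn its place only if the orthogonalized component still carried a meaningful rank IC out of sample.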
I haven’t addressed risk and liquidity here, which are two other key considerations when implementing a strategy on new or old data. But for any forward-thinking quant, sourcing unique alpha should be the primary goal, and implementing these steps should help get them there.