Data that has previously driven markets, such as earnings releases or governmental economic releases, is being increasingly arbitraged by new alternative data sources, according to a report by JP Morgan’s quantitative research group. The accelerating velocity, volume and variety of alternative data are forcing traditional asset managers to adopt quantitative methodologies.
The massive 280-page report, titled “Big Data and AI Strategies: Machine Learning and Alternative Data Approach to Investing” and authored by Marko Kolanovic and Rajesh T. Krishnamachari of JP Morgan’s Quantitative and Derivative Strategy team, is designed to tutor asset managers in the varieties of alternative data available and the machine learning techniques used to analyze them. The message also applies to upstream research providers that wish to remain relevant.
Growth of alternative data[1]
According to estimates cited in the report, 90% of the data in the world today has been created in the past two years alone, and just 0.5% of the data produced is currently being analyzed in any form.
The financial industry represents approximately 15% of the $130 billion in global spending on big data, and its increasing use of big data will help drive the forecasted growth of the market to $200 billion by 2020. JP Morgan estimates that asset managers are currently spending $2-3 billion on alternative data, including acquiring datasets, implementing big data technology, and hiring talent. Spending is projected to grow 10-20% annually, in line with big data growth in other industries.
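The figures above lend themselves to a quick back-of-the-envelope check. The sketch below is illustrative only: it computes the financial industry's implied share of big data spending and compounds the $2-3 billion estimate across the 10-20% growth range. The five-year horizon and the helper name `project_spend` are assumptions made for illustration, not figures or methods from the report.

```python
# Back-of-the-envelope arithmetic using the figures cited above.
# All amounts are in billions of US dollars.

GLOBAL_BIG_DATA_SPEND = 130.0  # cited global spending on big data
FINANCE_SHARE = 0.15           # cited approximate share from the financial industry

finance_spend = GLOBAL_BIG_DATA_SPEND * FINANCE_SHARE


def project_spend(base, annual_growth, years):
    """Compound `base` at `annual_growth` for `years` years (hypothetical helper)."""
    return base * (1.0 + annual_growth) ** years


HORIZON_YEARS = 5  # assumption: the source does not state a projection horizon

# Low case: $2bn growing 10% a year. High case: $3bn growing 20% a year.
low = project_spend(2.0, 0.10, HORIZON_YEARS)
high = project_spend(3.0, 0.20, HORIZON_YEARS)

print(f"Implied financial-industry spend: ${finance_spend:.1f}bn")
print(f"Alternative data spend after {HORIZON_YEARS} years: "
      f"${low:.2f}bn to ${high:.2f}bn")
```

Under these assumptions the range widens quickly, because both the starting estimate and the growth rate are uncertain.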
As more asset managers use alternative data, pressures on traditional research processes will grow: “As more investors adopt alternative datasets, the market will start reacting faster and will increasingly anticipate traditional or ‘old’ data sources (e.g. quarterly corporate earnings, low frequency macroeconomic data, etc.). This gives an edge to quant managers and those willing to adopt and learn about new datasets and methods. Eventually, ‘old’ datasets will lose most predictive value…”
In JP Morgan’s view, short-term trading is already dominated by machines, and artificial intelligence technology is progressing to a point where quantitative techniques are encroaching on longer-term investing. “On a medium term investment horizon, machines are becoming increasingly relevant. Machines have the ability to quickly analyze news feeds and tweets, process earnings statements, scrape websites, and trade on these instantaneously. These strategies are already eroding the advantage of fundamental analysts, equity long-short managers and macro investors.”
Types of alternative data
JP Morgan classifies alternative data into three basic categories: data generated by individuals, such as social media, product reviews and search trends; data generated by business processes, including company exhaust data, commercial transactions and credit card data; and data generated by sensors, such as satellite image data, foot and car traffic, and ship locations.
Data generated by individuals is typically textual and unstructured. JP Morgan subdivides the category into 1) social media, including websites like Twitter, Facebook and LinkedIn; 2) sites containing product reviews, such as business-reviewing websites like Yelp, e-commerce groups like Amazon and mobile app analytics companies like App Annie; and 3) web searches and personalized data, such as Google search trends and data from personal inboxes.
Data generated by business processes is often structured and can be a leading indicator for corporate financial results. For this reason, this category of data is currently more highly valued than either social media data or sensor data. Exhaust data refers to data that is a by-product of corporate record-keeping such as banking records, supermarket scanner data or supply chain data. Credit card transaction data is one of the most valuable segments as a leading indicator of consumer company revenues. Government data is plentiful but generally less valuable.
Sensor data is typically unstructured and much larger than either individual or process-generated data streams. Satellite imaging is perhaps the best-known example, but geolocation data is increasingly important as it is used to track foot traffic in retail stores. Sensor data will become increasingly important as the Internet of Things (IoT), which embeds microprocessors and networking technology into personal and commercial electronic devices, becomes more widespread.
Data with a longer history is typically more valuable to quants because it makes it easier to analyze and adjust for factors such as seasonality or cyclicality. Satellite imagery is typically available for three years, sentiment data for five years and credit card data, which can cost up to a million dollars for a full dataset, for at least seven years. However, the histories for all alternative data are continually growing longer.
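To illustrate why history length matters for seasonal adjustment, here is a minimal sketch using a synthetic monthly series. The data, the seasonal pattern and the function names (`make_series`, `seasonal_effects`, `seasonal_adjust`) are invented for illustration and do not come from the report. The sketch estimates each calendar month's average deviation from the overall mean and subtracts it. With three years of data, each monthly estimate rests on three observations; with seven years, it rests on seven.

```python
import random
from statistics import mean

# Hypothetical seasonal pattern (illustration only): a lift in November and December.
TRUE_SEASONAL = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 10]


def make_series(years, seed=0):
    """Build a synthetic monthly series: level 100 + seasonal lift + noise."""
    rng = random.Random(seed)
    return [100 + TRUE_SEASONAL[i % 12] + rng.gauss(0, 3)
            for i in range(years * 12)]


def seasonal_effects(series):
    """Average deviation of each calendar month from the overall mean."""
    overall = mean(series)
    return [mean(series[m::12]) - overall for m in range(12)]


def seasonal_adjust(series):
    """Subtract each month's estimated seasonal effect from its observations."""
    effects = seasonal_effects(series)
    return [x - effects[i % 12] for i, x in enumerate(series)]


for years in (3, 7):
    series = make_series(years)
    adjusted = seasonal_adjust(series)
    obs_per_month = len(series[0::12])
    december_effect = seasonal_effects(series)[11]
    print(f"{years} years: {obs_per_month} observations per month, "
          f"estimated December effect {december_effect:+.2f}")
```

This simple month-mean approach is only one of several ways to adjust for seasonality. It was chosen here because it makes the dependence on the number of observations per month easy to see.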
Our Take
Alternative data is one of the most transformative new trends impacting investment research, and the transformation is just beginning. Current research processes revolve around earnings and government releases, which are increasingly being anticipated by more timely sources of alternative data. The erosion of existing patterns of research will accelerate as more asset managers begin to use alternative data. The proliferation of alternative data is extending its relevance beyond the consumer sector to tech, healthcare, telecom and other sectors.
For research providers, the challenge is twofold: 1) how to integrate alternative data into research processes and 2) how to commercialize it. The two are not necessarily unrelated. For example, FX strategy firm Exante Data uses its derived China capital flows as an important research input and also sells the data as a separate offering. Investing in alternative data can not only differentiate the research product; the data itself can be spun off and sold separately. Like the asset management community for which the JP Morgan report is intended, research providers will need to embrace alternative data to remain relevant.
[1] We define alternative data as the intersection of big data and investment research. When applied to investment research, alternative data is the collection, cleansing, packaging, modeling and distribution of large structured and unstructured data sources to generate predictive insights and improved investment returns.