Big Data and Investment Research: Part 1


One of the key themes in the institutional investment research business this year is the growing importance of warehousing “big data” and generating meaningful signals and insights from it. This week we will discuss the impact this trend is starting to have on the internal investment research process at buy-side firms, and next week we will write about how these developments will transform the sell-side and independent research business.

So What is Big Data?

The term “big data”, as applied to investment research, is used by countless journalists, vendors, and consultants, though few agree on a definition. We see “big data” as the regular collection, storage and analysis of numerous structured and unstructured data sources to generate predictive insights and consistent investment returns.

One of the first real “big data” firms to serve the U.S. institutional investor was Majestic Research, founded in 2002 by Seth Goldstein and Tony Berkman. Majestic Research entered into exclusive licenses with proprietary third-party data providers and also obtained data freely from the web, enabling it to generate data-driven insights on sales trends in industries such as telecommunications, real estate and airlines. The firm was sold to ITG in 2010 for $56 mln.


As you can see from the well-known IBM slide above, many analysts break “big data” down into four dimensions: Volume, Velocity, Variety, and Veracity. However, the team at Integrity Research believes that one additional “V”, Validity, should also be added when looking at “big data” from an institutional investor’s perspective.

Applying the 5 V’s to the Buy-Side

As you might guess, Integrity Research has interacted with many buy-side investors who either have already started a “big data” initiative or who are considering implementing one in the near-term.  So, what issues should buy-side investors be aware of as they plan to develop a “big data” effort to enhance their current investment research processes?

Volume: Consistent with the term “big data”, one of the obvious characteristics of any big data initiative is the volume of data that investors must be prepared to collect and analyze. As you can see from the slide above, 2.3 trillion gigabytes of data are created every day, with estimates suggesting that 43 trillion gigabytes of data will be created by 2020. Consequently, buy-side investors looking to develop a big data strategy must be prepared to warehouse and analyze huge amounts of data, considerably more than they have ever worked with in the past.

Velocity: Not only is the volume of data huge, but most big data initiatives require that investors analyze this data in real-time to identify meaningful signals.  Fortunately, most buy-side investors are used to working with real-time data.

Variety: One of the key characteristics of “big data” initiatives is the variety of data types that buy-side investors can collect, including both structured and unstructured data. A few of the major external data types that we have identified for buy-side clients include data from public sources, social media sites, crowdsourcing efforts, various transaction types, sensors, commercial industry sources, primary research vendors, exchanges and market data vendors.

Veracity: All investors understand the problem of poor data quality when trying to build a reliable research process. Clearly, this becomes an exponentially more difficult issue for buy-side investors as they try to identify and ingest terabytes of data from numerous public and private sources, all of which have different data collection and cleansing processes. Consequently, investors often have to implement sophisticated data quality checks to make sure that the data they warehouse is reasonably accurate.
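To make the idea of a data quality check concrete, here is a minimal sketch in Python. It is only an illustration under our own assumptions: the `Observation` record, the `quality_issues` function and the range limits are hypothetical names and values, not any vendor's actual process. It flags three common problems in an ingested feed: duplicate records, missing values, and values outside a plausible range.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Observation:
    """One data point from an external feed (hypothetical record layout)."""
    source: str
    date: str                # ISO date string, e.g. "2014-03-03"
    value: Optional[float]   # None when the vendor supplied no value


def quality_issues(
    rows: List[Observation],
    min_value: float = 0.0,   # assumed plausible lower bound
    max_value: float = 1e9,   # assumed plausible upper bound
) -> List[Tuple[int, str]]:
    """Return (row index, reason) pairs for rows that fail basic checks."""
    issues = []
    seen = set()
    for i, row in enumerate(rows):
        key = (row.source, row.date)
        if key in seen:
            # Same source and date already ingested once.
            issues.append((i, "duplicate"))
        seen.add(key)
        if row.value is None:
            issues.append((i, "missing"))
        elif not (min_value <= row.value <= max_value):
            issues.append((i, "out_of_range"))
    return issues
```

In practice each source would likely need its own bounds and its own definition of a duplicate; the point is simply that such checks run before the data reaches the warehouse.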

Validity: One important concern for buy-side investors when deciding what data they want to acquire and/or collect is whether this data is actually useful in helping predict the movement of securities or asset prices. Warehousing irrelevant data only increases cost and complexity without contributing value to the research process. Consequently, buy-side investors need to clearly think through the potential validity of a dataset before it is acquired.
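One simple first-pass screen for validity is to ask whether a candidate data series is correlated with subsequent returns. The Python sketch below is our own illustration, not a description of any firm's method; the function names and the one-period lag are assumptions. It correlates a signal at time t with the return at time t + lag.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lagged_correlation(
    signal: Sequence[float],
    returns: Sequence[float],
    lag: int = 1,
) -> float:
    """Correlate signal[t] with returns[t + lag] (lag in periods)."""
    if len(signal) != len(returns):
        raise ValueError("signal and returns must be the same length")
    if lag < 1 or lag >= len(signal):
        raise ValueError("lag must be between 1 and len(signal) - 1")
    # Drop the last `lag` signal values and the first `lag` returns
    # so that each signal value is paired with a later return.
    return pearson(signal[:-lag], returns[lag:])
```

A correlation near zero suggests the dataset may not be worth warehousing for this purpose. A high in-sample correlation is not proof of predictive value either, so a screen like this is a starting point for further out-of-sample testing rather than a conclusion.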

Big Data Benefits for the Buy Side

So why are buy-side investors starting to jump on the big data bandwagon? Is it a fad, or is it a long-term trend for institutional investors? In our view, the adoption of “big data” methods for investing is merely the next logical step for investors looking for a way to generate consistent returns.

One of the most obvious benefits of rolling out a “big data” initiative is that it enables investors to create a systematic, repeatable research process rather than an investment process which is overly reliant on specific individuals. Clearly, this has been the benefit of quantitative investment models used by some asset managers for years. What is really interesting is that a number of traditionally qualitative investors are now looking into “big data” techniques as an overlay to their primary investment strategy.

A related benefit of implementing a “big data” project is the ability for buy-side investors to develop proprietary predictive signals from the data they are warehousing for individual stocks, sectors, ETFs or other market indices which can help generate consistent returns. In fact, an investor’s ability to develop predictive signals is often limited only by their ingenuity in finding existing datasets, their willingness and skill in building new proprietary datasets, and their creativity in analyzing this data.

Adopting a “big data” driven research process should also carry lower research risk for institutional investors than many traditional primary research driven approaches. Clearly, an investor is unlikely to receive and trade on illicit inside information when using “big data” techniques.

Costs of Implementing Buy-Side Big Data Initiatives

Of course, adopting a “big data” research program is not without significant costs for a buy-side firm.  A few of these costs include:

Obviously, firms that have never implemented a significant quantitative investment strategy are unlikely to have the expertise or knowledge to effectively implement a “big data” program. This includes expertise in finding unique data sources, technically integrating these data sources, cleaning and validating this data, warehousing huge volumes of data, and analyzing this data to develop predictive signals. Consequently, buy-side firms looking to build “big data” initiatives will be forced to hire different types of talent than they have ever had to hire before. Some of these professionals will need data integration skills, advanced analytics and predictive analysis skills, complex event processing skills, rule management skills, and experience with business intelligence tools. Unfortunately, the current supply of high-quality data scientists is considerably smaller than the exploding demand for their skills.

Hiring workers with these new skill sets is also likely to create a different issue for buy-side firms: one of management and corporate culture. Clearly, these new employees will often need to be managed differently than their peers given their skills, experiences and personalities. Consequently, finding managers who can effectively manage and motivate these new employees will be critical in recruiting, developing and keeping this talent.

Of course, one of the most significant costs of implementing a “big data” initiative at a buy-side firm is the upfront and ongoing financial investment required to be successful.  Not only does the firm have to hire the right talent (discussed previously), but they also have to acquire and/or build the right technical infrastructure, and they need to identify and acquire the right data.  In some instances, buy-side firms also need to invest in “creating” unique proprietary time series (e.g. by conducting longitudinal surveys or employing other primary research techniques) which will also require specialized know-how and a significant financial investment.

Alternatives to Building Big Data Programs In-House

Does this mean that only the largest buy-side firms have the management, technical or financial resources to successfully implement a “big data” program?  Well, the answer is yes and no.  If a buy-side firm wants to build this type of program in-house, then it will take considerable resources to pull off.  However, if a buy-side firm is willing to outsource some of this initiative to other partners, then it is possible to build a “big data” program more cost effectively.

In fact, there is a growing number of vendors who can provide buy-side investors with various components of a “big data” research program, such as sourcing and aggregating unique data sets, leveraging a data warehouse for both external and proprietary data, and even building custom signals for investors.

One such vendor is DISCERN, a San Francisco-based research firm which collects large amounts of publicly available information.  The firm was founded in 2010 by former Lehman Brothers IT analyst Harry Blount.  Besides producing data-driven research, DISCERN has also leveraged cloud-based machine-learning algorithms and persistent queries to automate data aggregation, enhance data discovery and visualization, signal decision-makers about new business developments, and deliver masses of disparate data in a manageable format.

In addition to DISCERN, a number of new vendors have sprung up in recent years providing asset managers with aggregated sources of structured and unstructured public data, access to proprietary industry or transaction data covering various sectors, or investment-focused signals based on various social media sources.


As we mentioned earlier, adopting a “big data” research strategy is not without its own issues and related costs for any buy-side investor considering this course of action, including deciding whether to “build or buy”, acquiring the relevant know-how, finding appropriate management, and resourcing the project sufficiently to attain success.

Despite these issues, the use of “big data” research techniques is likely to transform a significant part of the investment community in the coming years as the buy-side looks to implement a research process which produces repeatable and consistent returns, and which does so at a lower risk than traditional approaches.

In our minds, one of the most exciting aspects of this trend is discovering what new data vendors, infrastructure providers, and analytic tool suppliers spring up to meet the growing buy-side demand to more easily and cost effectively adopt “big data” as an integral part of their research process.


