Big Data: Beating the News Cycle


The following is the second in a series of guest articles written by Tony Seker, Executive Vice President of Sales and Marketing at Brand Loyalties. Brand Loyalties tracks the loyalty of on-line consumers for certain brand names, and how that loyalty shifts over time. The firm gathers and analyzes the data of on-line consumers for the products of over 1,200 publicly traded corporations.

Last week we wrote that hedge funds are in a “data scientist” hiring frenzy, paying millions of dollars to lure experienced data scientists to their firms. We explored two scenarios in which data scientists have offered their employers profitable investment intelligence: 1) using social media to stay ahead of the news cycle; and 2) capturing consumer loyalty to the brand names owned by publicly traded corporations.

The first of those scenarios involves actively scanning, filtering, verifying and authenticating on-line media postings to capture unfolding market-moving or ticker-moving events before they become known to the market – or more specifically, before they become known to any competitor with deep enough pockets to render your portfolio actions “too little, too late.”

Unfortunately, it is very difficult to stay minutes, seconds or milliseconds ahead of the data and communications resources owned by competing funds that also possess deep pockets. And when milliseconds count, it is critical to understand exactly where (or when) you have tapped into the media stream. Those competing funds may very well have acquired privileged “upstream” access to the same media feeds used by everyone else, including many of the social media research providers.

Perhaps even worse is social media’s own unique set of minefields:

  • Parsing sentiment from social media postings has always been a problem. Postings often carry double negatives or snarky comments that most human readers know they should not take literally. Even the most sophisticated software has serious challenges with sarcasm.
  • Social media postings are generally not carefully considered or written, nor are they extensively proofread and edited. And they are often deleted later.
  • Social media postings also contain more than just words; emoticons, links and hashtags can significantly alter the context of those words. And unfortunately, emoticons are not standardized across social media and hardware platforms.
  • Social media demographics introduce significant biases in the postings. As an example, attempts to project electoral results from social media trends have often landed somewhere between misleading and disastrous.
  • The newest scourge of social media is “fake news.” The importance of vetting postings has never been higher.
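
The sentiment problem in the first bullet above can be illustrated with a minimal sketch. It assumes a deliberately naive word-counting scorer; the word lists, the function name `naive_sentiment` and the two sample postings are hypothetical and are not drawn from any real sentiment product.

```python
# Hypothetical toy lexicon; real sentiment tools use far larger resources.
POSITIVE = {"great", "love", "good"}
NEGATIVE = {"bad", "terrible", "hate"}


def naive_sentiment(text: str) -> int:
    """Count positive words minus negative words, ignoring all context."""
    words = [w.strip(".,!?\"'").lower() for w in text.split()]
    positives = sum(w in POSITIVE for w in words)
    negatives = sum(w in NEGATIVE for w in words)
    return positives - negatives


# Invented sample postings (assumptions for illustration only).
sarcastic = "Oh great, another recall. Love it."
double_negative = "This phone is not bad, not bad at all."

print(naive_sentiment(sarcastic))        # positive score for a sarcastic complaint
print(naive_sentiment(double_negative))  # negative score for mild praise
```

A human reader sees the first posting as a complaint and the second as mild praise, yet the word counter scores them the other way around. More sophisticated models narrow this gap, but as noted above, sarcasm remains a serious challenge.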

Additionally, “Big Data” scientists have a host of other problems that require clever solutions:

  • Big Data is truly big; the amount of on-line data is measured in tens of zettabytes (roughly a terabyte for every person on the planet). And it is growing at an exponential rate – as most of the people on the planet are now making hand-held contributions to social media.
  • As a consequence, there is simply no way to tune into all social media 24/7 for actionable investment intelligence. As a practical matter, your coverage is constrained by bandwidth, the size of your server farm and ultimately the clock.
  • In the short term, effective bandwidth fluctuates wildly. In the long term, average bandwidth may be growing – but not at the exponential rate seen in social media content.
  • Data appears in multiple languages and scripts, and in multiple character encodings, including Kanji, Hangul, Arabic, Pashto and simplified Chinese. In fact, the most interesting “staying ahead of the news cycle” postings are probably not something a typical MIT graduate would understand.
  • Most of the information on the web is copyrighted. In-house compliance will probably require that all data collection processes abide by applicable copyright law, including the fair use provisions of the United States Copyright Act of 1976, 17 U.S.C. § 107.
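
The “terabyte for every person” figure in the first bullet above can be checked with a few lines of arithmetic. This is a rough sketch that assumes SI units and a world population of about 7.5 billion; the population figure and the function name `terabytes_per_person` are assumptions made for illustration.

```python
ZETTABYTE = 10**21  # bytes, SI definition
TERABYTE = 10**12   # bytes, SI definition

# Assumption: a round figure for world population, not taken from the article.
WORLD_POPULATION = 7.5e9


def terabytes_per_person(total_zettabytes: float, population: float) -> float:
    """Convert a global data volume in zettabytes to terabytes per person."""
    return total_zettabytes * ZETTABYTE / population / TERABYTE


for zb in (10, 20, 40):
    per_person = terabytes_per_person(zb, WORLD_POPULATION)
    print(f"{zb} ZB -> {per_person:.1f} TB per person")
```

Under these assumptions, 10 zettabytes works out to a little over one terabyte per person, so the rule of thumb holds at the low end of “tens of zettabytes” and understates the per-person figure as the total grows.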

Each firm seeking to stay ahead of the news cycle needs to ask some simple questions: Are we willing to restructure our energy portfolios based on an unconfirmed Pashto social media posting? Are we willing to liquidate an otherwise favorable portfolio position based on milliseconds of analysis of a snarky Presidential tweet that was subsequently deleted?

In the “staying ahead of the news cycle” scenario, hiring a first-class data scientist may be necessary, but it is not sufficient. You also need data and communications resources that can compete with the very deepest pockets on Wall Street, and the guts to act blindly (and perhaps in milliseconds) on whatever turns up.

On Wednesday: “Big Data”: Updating Peter Lynch for the 21st Century



About Author

Mike Mayhew is one of the leading experts on the investment research industry. In addition to founding Integrity Research, Mike is on the board of directors of Investorside Research Association, the non-profit trade association for the independent research industry, and a frequent speaker on research industry trends and developments. Mike has over thirty years of research industry experience.
