Alternative Data Firm Thinknum Launches Reddit Dataset

By Sanford Bragg, February 2, 2021

In response to Reddit-fueled retail stock trading, Thinknum Alternative Data, a leading source of web data, released a database that allows investment firms to track companies mentioned on Reddit.  The ease with which Thinknum — and others — create new datasets illustrates why web data is the most ubiquitous form of alternative data.

New Reddit Dataset  

Thinknum added a new dataset that tracks the number of times NYSE and NASDAQ tickers are mentioned in the top 100 posts on r/WallStreetBets and r/Stocks in real time.  Data is indexed daily and is accessible via API or Thinknum’s cloud-based analytics.  The dataset went live last Wednesday.
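
To make the idea of counting ticker mentions concrete, here is a minimal Python sketch.  It is not Thinknum's method, which the company has not described in detail.  The ticker set, the regular expression, and the function name count_ticker_mentions are all assumptions made for illustration, and the sample posts are invented.

```python
import re
from collections import Counter

# Hypothetical ticker universe for illustration only. A real pipeline
# would load the full list of NYSE and NASDAQ symbols.
KNOWN_TICKERS = {"GME", "AMC", "BB", "NOK", "TSLA"}

# Assumed matching rule: an optional "$" followed by 1-5 uppercase
# letters standing alone as a token (not part of a longer word).
TICKER_PATTERN = re.compile(r"(?<![A-Za-z0-9$])\$?([A-Z]{1,5})(?![A-Za-z0-9])")


def count_ticker_mentions(posts, tickers=KNOWN_TICKERS):
    """Count how often each known ticker appears across post texts."""
    counts = Counter()
    for text in posts:
        for symbol in TICKER_PATTERN.findall(text):
            # Drop uppercase tokens that are not in the ticker universe
            # (for example "YOLO" or the pronoun "I").
            if symbol in tickers:
                counts[symbol] += 1
    return counts


# Invented sample posts standing in for the top posts of a subreddit.
sample_posts = [
    "GME to the moon, $GME!",
    "Buying AMC and holding TSLA",
    "YOLO on BB? I like NOK too.",
]

mention_counts = count_ticker_mentions(sample_posts)
for symbol, n in mention_counts.most_common():
    print(symbol, n)
```

A simple filter like this would miscount tickers that are also common words or single letters, which is one reason production datasets tend to rely on more careful text processing.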

“Demand has been massive,” said co-founder Justin Zhen in an interview.  “We received over 50 inbound requests from hedge funds in one week alone.”

Zhen said client interest comes from two main use cases.  As with other alternative datasets, fund managers are hoping to generate alpha by finding leading indicators of future stock prices.  The second use case is risk management, as institutional investors look for an insurance policy to protect themselves from Reddit-induced stock volatility.

Thinknum’s Products

Thinknum offers 35 datasets collected through web scraping and covers approximately 450,000 public and private companies.  The platform focuses on organizing public data such as government contracts, product pricing data, and store locations, as well as extensive social media data from sources such as LinkedIn, Twitter, Facebook and Instagram.  For many of the datasets, historical data goes back to 2015, depending on when Thinknum began collecting the data.

After Jumpshot was shut down by its parent Avast because of privacy concerns, Thinknum launched its own web traffic dataset, collected from public data, in September 2020.  It launched a knowledge graph tool in October 2019 and has increased its global coverage.


Thinknum says it has over 300 firms as clients.  It primarily serves hedge funds, but has been expanding sell-side usage.  Eight of the top ten investment banks are reportedly clients, and the company has been expanding within equity research, sales/trading, and investment banking, as well as the asset management units of banks.  The company has also been adding corporates as clients.

The company originated in 2014 as a financial model sharing site, similar in concept to GitHub, which allows programmers to share open source code.  Co-founder Justin Zhen was previously a hedge fund analyst, and co-founder Gregory Ugwi worked as a strategist at Goldman Sachs.  The firm pivoted its product focus in 2015 from model sharing to gathering alternative data because hedge fund clients were asking for alternative data inputs for their models.

Thinknum raised $11.6 million in a Series A funding round in March 2019.  Beginning in 2018, Thinknum hired journalists, including a former Wall Street Journal reporter, to develop media articles based on data collected on its platform.  In September 2018, the company began distributing a subset of its data through Citi’s capital markets portal, Velocity.

The firm is currently at 40 employees, primarily based in its NY office.  LinkedIn registrations have grown 34% over the past year.  The firm is currently recruiting for 5 full-time positions.

Our Take

Boutique research firm Wolfe Research, which wooed Deutsche Bank’s star quant Yin Luo as Vice Chairman in 2016, also released a Reddit dataset highlighting stocks targeted by retail traders.  Luo said his firm used natural language processing to identify stocks mentioned on subreddit forum WallStreetBets.  A portfolio made up of highly shorted stocks touted on Reddit was up 150% in the past two months, according to Wolfe’s data.

Upstart web data provider Quiver Quantitative, founded last year by two college students, has also released a Reddit database focusing on investments mentioned on WallStreetBets.

Web data remains one of the most popular forms of alternative data and Thinknum has been nimble in developing new datasets as opportunities arise. Web-scraped data is also the most common type of alternative data according to analysis performed by Neudata last year.  The ability of Thinknum (and others) to quickly launch new datasets illustrates why.     
