NEW YORK – At Integrity, our database of thousands of research providers and industry experts provides the foundation for our analysis. However, on occasion, people will wonder why we go to all the trouble of collecting, organizing, and analyzing this information when one can easily search for sources of information using Google. At a panel discussion on specialized search tools during the recent Investorside conference, Penny Herscher, CEO of FirstRain, noted that she has also had to overcome similar skepticism.
Given that Google is, in all likelihood, the primary information-finding tool in the rest of our lives, it is perfectly understandable that analysts might turn to it first when doing investment research. But Google is often insufficient to find anything of value for financial research.
The most common problem cited with Google is that its broad coverage universe tends to turn up irrelevant hits. This is perhaps the easiest problem to remedy. One can learn to craft search queries more carefully, using fewer ambiguous words. It is also possible to instruct Google to exclude the sites that are most likely to be sources of noise: for example, adding “-site:wikipedia.org” to your search string will remove Wikipedia pages from your search results (a bare “-wikipedia” only drops pages that mention the word), while the CustomizeGoogle extension for Firefox lets you exclude a range of sites from all of your searches. A related complaint is that Google searches will exclude information contained behind password-protected subscriber walls. But this is unfair: one can hardly blame Google for failing to violate the intellectual property rights of content owners.
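For readers who run the same searches repeatedly, the exclusion trick is easy to script. The following is only a minimal sketch: the helper name build_query and the sample search terms are hypothetical, and it assumes Google's “-site:” operator, which excludes an entire domain from the results.

```python
def build_query(terms, excluded_sites=()):
    """Join search terms and append a -site: exclusion for each noisy domain.

    `build_query` is a hypothetical helper for illustration; the only
    Google feature it relies on is the `-site:` search operator.
    """
    parts = list(terms)
    parts += [f"-site:{site}" for site in excluded_sites]
    return " ".join(parts)


# Hypothetical example: search terms plus one excluded domain.
query = build_query(["semiconductor", "capex", "outlook"], ["wikipedia.org"])
print(query)  # semiconductor capex outlook -site:wikipedia.org
```

The resulting string can be pasted into the search box as-is. Keeping the excluded domains in a list makes it easy to reuse the same noise filter across many searches.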
The more serious reason why Google is insufficient for financial research has to do with its sorting algorithm. As an experiment, I recently created a custom Google search engine, using a list of several thousand web sites from our internal database, sites that we had judged to be sources of valuable research and information. This list included a large number of unique sources of in-depth research and analysis that we have collected over a number of years. As it happens, it also included a much smaller number of sites that are pretty widely known: bulge-bracket investment banks and research sources that have a more retail focus. By restricting the search in this way, we effectively eliminated the complaint that Google’s broad search universe would turn up irrelevant hits. One would have expected that allowing Google to search essentially the same network of sources as our internal database would yield excellent results.
Conducting a Google search on this restricted universe revealed the real problem with Google. The search results were ranked in an order that had nothing whatsoever to do with the depth of analysis or quality of information present on the sites. The ranking was based purely on popularity. Here is Google’s description of how its PageRank algorithm works:
Democracy on the web works.
Google works because it relies on the millions of individuals posting websites to determine which other sites offer content of value. Instead of relying on a group of editors or solely on the frequency with which certain terms appear, Google ranks every web page using a breakthrough technique called PageRank™. PageRank evaluates all of the sites linking to a web page and assigns them a value, based in part on the sites linking to them. By analyzing the full structure of the web, Google is able to determine which sites have been “voted” the best sources of information by those most interested in the information they offer. This technique actually improves as the web gets bigger, as each new site is another point of information and another vote to be counted.
These democratic principles work fine for consumer needs. But, in our test searches, the small number of widely known, retail-oriented research sources completely dominated the search results, while thousands of less well-known sites with deep information were nowhere to be found in the first few results pages. PageRank is an algorithm that assumes that the more widely known and popular something is, the more valuable it must be. This, of course, is the exact opposite of how institutional investors think about information: as something becomes more widely known, it becomes less likely to yield any sort of alpha. In fact, I believe I am on safe ground when I say that there will never be any kind of actionable and profitable intelligence on the front page of Yahoo! Finance.
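The popularity effect is easy to reproduce on a toy example. What follows is a minimal sketch of the textbook power-iteration form of PageRank, not Google's production ranking, which is not public and uses many other signals. The site names, the link graph, and the damping factor of 0.85 are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    This is the textbook formulation, not Google's actual ranking code.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small baseline score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among its outbound links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical link graph: everyone links to the well-known retail portal,
# while the in-depth niche source has a single inbound link.
links = {
    "retail_portal": ["blog_a"],
    "blog_a": ["retail_portal"],
    "blog_b": ["retail_portal"],
    "blog_c": ["retail_portal", "niche_research"],
    "niche_research": ["retail_portal"],
}

ranks = pagerank(links)
ordering = sorted(ranks, key=ranks.get, reverse=True)
for page in ordering:
    print(f"{page:15s} {ranks[page]:.3f}")
```

In this toy graph the retail portal ends up with the highest score simply because the most pages link to it. Nothing in the calculation looks at the depth or quality of what each site publishes, which is the behavior we saw in our test searches.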
Institutional investors need to be elitist and anti-democratic when searching for information. The sources of most value to them are very rarely going to do well on Google’s PageRank algorithm. This is why alternative search engines and proprietary databases are necessary if one is to find any kind of informational advantage.