A Guide to Getting Started with Open Data


The following guest article was written by Mikheil Shengelia, Research Analyst, Data Strategy at Eagle Alpha, an alternative data aggregation platform that also provides supporting advisory services for data buyers and vendors.

One type of alternative data available to all data users, and one of the 16 data categories included in Eagle Alpha’s recently updated taxonomy, is open data.  Open data is data that can be freely accessed, used, modified, and shared by anyone.

Introduction to Open Data

The open data movement can be traced back to the concept of open access to scientific data and the formation of the World Data Center system in 1955. The Human Genome Project is one of the most famous examples of open data in action: it was built upon the Bermuda Principles, which state that all human genomic sequence information “should be freely available and in the public domain in order to encourage research and development and to maximize its benefit to society.”

According to McKinsey, economies could see GDP gains of between 1 and 5 percent by 2030 if they adopt open data for finance. However, the open data movement faces several challenges. When data is made openly available, it can be difficult to ensure its quality and accuracy, or to confirm that it was collected consistently and without bias. Building the infrastructure to store and share open data is a further challenge.

Following the rise of the open data movement, large volumes of open government data were made available through a variety of portals and repositories. Governments and international organizations anticipate that opening access to their data will promote transparency, accountability, and value creation for several groups:

  • Citizens – open data provides immediate access to the information which belongs to them, reinforcing the transparency vision. For example, open data can be used to track the spending of public funds, monitor the performance of government agencies, and discover patterns of corruption.
  • Governmental institutions – the use of open data can help them become more transparent, efficient, and effective, also reinforcing their public service role.
  • Businesses – may reuse open data to create applications, platforms, or other data products. For example, open data has been used to improve weather forecasting, create better transportation apps, and improve the efficiency of energy production.
  • Other sectors – such as journalism, university research, and non-profit organizations. For example, NGOs can use open datasets to better execute development targets.

We can broadly divide open data into two subcategories: 1) public data and 2) open-source data.

Public datasets are collected and maintained by government agencies or international organizations and made publicly available for anyone to access, use, and republish without restrictions. Examples include data on government spending, crime statistics, weather forecasts, transportation schedules, and health statistics.

The EU’s Open Data Directive, adopted in 2019, is an important piece of legislation in this regard as it aims to make data produced by public sector bodies more easily accessible and reusable for commercial and non-commercial purposes. One of the key aspects of the directive is the requirement for public sector bodies to make their data available in machine-readable formats. This means that the data must be structured in a way that can be easily processed by computers, making it easier for businesses and individuals to use and reuse the data.
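To make the idea of a machine-readable format concrete, here is a minimal Python sketch. It assumes a small CSV extract of public spending; the column names (agency, year, spend_eur) and all figures are invented for illustration and do not come from any real portal. Because the data is structured, a few lines of standard-library code are enough to aggregate it.

```python
import csv
import io

# Hypothetical spending records, invented purely for illustration.
RAW_CSV = """agency,year,spend_eur
Transport,2021,1200000
Health,2021,3400000
Transport,2022,1500000
"""


def total_spend_by_agency(csv_text):
    """Sum the spend_eur column per agency from CSV text."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        agency = row["agency"]
        totals[agency] = totals.get(agency, 0) + int(row["spend_eur"])
    return totals


totals = total_spend_by_agency(RAW_CSV)
print(totals)
```

The same figures published only as a scanned PDF table would need manual transcription or error-prone extraction before any such aggregation could run.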

Data.gov is an online portal created by the US government that provides access to a wide range of open datasets from federal, state, and local government agencies. Data.gov.uk is a similar portal created by the UK government and the EU has its own portal with over 1.5 million public sector datasets available. The World Bank’s and the OECD’s open data portals provide access to development indicators such as GDP, population, education, unemployment rates, and health. 

Open-source datasets are made available under a permissive license, such as a Creative Commons license, that allows anyone to use, republish, and modify the data without having to ask for permission. This encourages collaboration, experimentation, and innovation by making data widely accessible. Examples include Common Crawl, a repository of web-crawled data, and ImageNet, a large-scale image dataset for training machine learning models. 

Open Data Used by Other Alternative Data Vendors

It is important to note that vendors from other alternative data categories also actively use open data sources to improve their product offerings. Economic trade data vendors compile monthly import and export figures from customs offices and statistical research organizations while patent data vendors aggregate information from authorities across the world to get a perspective on innovation activities.

Unstructured open-text data is another popular source for vendors developing their data products. The SEC’s EDGAR (Electronic Data Gathering, Analysis, and Retrieval) database, for example, is popular among vendors profiled on Eagle Alpha’s platform. Some of them use natural language processing techniques to parse SEC filings, extract relevant passages that discuss company events, and build sentiment models.
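The filing workflow described above (parse the text, extract passages about company events, score sentiment) can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not any vendor's method: the word lists, the event terms, and the sample text are all hypothetical, and production systems would rely on far richer lexicons or trained models.

```python
import re

# Tiny illustrative word lists (assumptions, not a real sentiment lexicon).
POSITIVE = {"growth", "improved", "strong", "gain"}
NEGATIVE = {"loss", "decline", "impairment", "weak"}
# Hypothetical terms used to flag sentences that discuss company events.
EVENT_TERMS = {"acquisition", "restructuring", "litigation"}


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokens(sentence):
    """Lowercase alphabetic tokens of a sentence."""
    return re.findall(r"[a-z]+", sentence.lower())


def event_passages(text):
    """Keep only sentences that mention at least one event term."""
    return [s for s in split_sentences(text) if EVENT_TERMS & set(tokens(s))]


def sentiment_score(sentence):
    """Score in [-1, 1]: (positive - negative) / (positive + negative) word counts."""
    words = tokens(sentence)
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)


# Invented sample text, not taken from an actual filing.
SAMPLE = (
    "The acquisition contributed to strong revenue growth. "
    "Office leases were renewed during the year. "
    "The restructuring resulted in an impairment loss."
)

for passage in event_passages(SAMPLE):
    print(sentiment_score(passage), passage)
```

The middle sentence is dropped because it mentions no event term, which mirrors the passage-extraction step; only the remaining sentences are scored.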

The use of linked data and the semantic web, which allows data to be linked and shared across different databases and platforms, is becoming more widespread. This makes it easier for organizations to discover, share, and use open data and facilitates the creation of new applications and services. Data.world, for example, has over 125K open datasets contributed by thousands of users and organizations across the world. 

For more information on Eagle Alpha’s new alternative data taxonomy and for access to their new industry reports, get in touch for a demo here.

Impact of COVID-19

COVID-19 became a catalyst for many organizations to promote data sharing and make their data openly available. Similarweb, a web traffic data provider, made its COVID-19 Travel Index openly available to see how businesses in the travel industry were recovering after unprecedented changes due to the pandemic.

Figure 1 below shows the recovery index for international bookings. In January 2022, the share of international bookings reached its highest level of the pandemic for Europe, the UK, Canada, the US, Australia, and New Zealand. This was due to relaxed restrictions and promises of border reopenings.

Figure 1: Global Accommodation Origin Bookings, Share of International Travel (Source: Similarweb)

Open-Source Satellite Imagery

Satellite imagery can be used for agricultural monitoring in a variety of ways. One common use is to assess crop health and yield potential by analyzing the normalized difference vegetation index (NDVI). NDVI is a measure of the greenness of vegetation and can be used to identify areas of healthy vegetation as well as areas that may be suffering from stress due to factors such as drought or disease.
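NDVI is computed per pixel from the near-infrared (NIR) and red bands as (NIR - Red) / (NIR + Red), giving values between -1 and 1, with higher values indicating denser, greener vegetation. The Python sketch below illustrates the calculation. The reflectance values are hypothetical rather than real Sentinel data, and the 0.3 cut-off used to flag stressed vegetation is an illustrative assumption, not an agronomic standard.

```python
def ndvi(nir, red):
    """Return NDVI for one pixel given near-infrared and red reflectance."""
    total = nir + red
    if total == 0:
        return None  # undefined where both bands are zero (e.g. no-data pixels)
    return (nir - red) / total


def classify(value, stressed_below=0.3):
    """Label a pixel; the 0.3 threshold is an illustrative assumption."""
    if value is None:
        return "no data"
    return "stressed or sparse" if value < stressed_below else "healthy"


# Hypothetical reflectance values (0 to 1) for three pixels.
nir_band = [0.50, 0.30, 0.0]
red_band = [0.10, 0.20, 0.0]

ndvi_values = [ndvi(n, r) for n, r in zip(nir_band, red_band)]
labels = [classify(v) for v in ndvi_values]

for value, label in zip(ndvi_values, labels):
    print(value, label)
```

In practice the same arithmetic is applied to whole raster bands at once, and thresholds are calibrated per crop, region, and season.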

Figure 2 below shows images from Sentinel-3 available via Copernicus, the European Union’s Earth Observation Programme. A severe drought is currently being monitored in South America as a result of the La Niña climate phenomenon. Forecasts for Argentina point to a potential harvest decrease of 50% in 2023.

Figure 2: Severe Drought in South America (Source: Copernicus)

Satellite imagery can also be used to detect and monitor changes in land use. For example, it can be used to identify new fields brought into cultivation or monitor the expansion of urban areas into agricultural lands.

Following the release of Eagle Alpha’s updated alternative data taxonomy, the company has begun rolling out updated reports highlighting recent events, intelligence, case studies, and data provider comparisons for the wider alternative data community to review.


About Author

Mike Mayhew is one of the leading experts on the investment research industry. In addition to founding Integrity Research, Mike is on the board of directors of Investorside Research Association, the non-profit trade association for the independent research industry, and a frequent speaker on research industry trends and developments. Mike has over thirty years of research industry experience. Email: Michael.Mayhew@integrity-research.com
