The Decentralized Web
We are well on our way to an open data stack for geospatial data. Organizations like NASA, the European Space Agency (ESA), and the United States Geological Survey (USGS) each contribute extensive geospatial datasets, and those datasets are open for anyone to use through channels such as AWS Open Data, EarthExplorer, the M2M interface, or even direct sharing between labs. A large and growing share of geospatial data meets the 'open' criteria.
However, Taylor Oshan's talk discusses why 'open' is insufficient and the gaps we still need to fill.
The Cost of Open Data
Open data costs. It costs to store. It costs to access. It costs to apply. It costs to aggregate. It costs.
These costs are neither arbitrary nor easy to eliminate: servers are physical infrastructure, and they cost money. Because of these costs, allocating computation power to the most relevant research is essential. So institutions introduce standards for who can work with the data, applications to document who meets those standards, and bureaucracy to enforce them. Standardization carries its own cost, too. Imagine getting your entire lab to use the same file naming standard, then scale that up to an entire field spread across the world. Making 'standard' datasets takes work.
These limitations are where the decentralized web can step in.
Professor Oshan sees the InterPlanetary File System (IPFS), combined with common geospatial data standards like SpatioTemporal Asset Catalogs (STAC) and well-governed data-sharing collaboratives, as a potential resolution to some of the limitations of 'open' data.
Over the past several years, his lab at the University of Maryland has documented and added around 275 terabytes of geospatial data to IPFS. The data structure of IPFS helps reduce costs, creates file redundancy (which supports persistence), and lets users store and share cached copies of data with nearby users. The lab has also annotated this data, associating metadata with each tile using the STAC interface and metadata standard. Its API builds on a tool that geospatial researchers already know and gives them access to the data and metadata of every tile the lab has pinned to IPFS, without needing to go to AWS. In addition, anyone else can pin data to the Filecoin network through the same interface. Finally, using all of this data, the lab created a Geo dashboard where researchers can browse the visual data, view the related metadata, and obtain code snippets that help them retrieve the data through the client described above.
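To make the pairing of content addressing and STAC metadata more concrete, here is a minimal sketch, not the lab's actual client or API. It builds a STAC-style Item whose asset link is derived from the tile's bytes. The function names, the item ID, and the bounding box are hypothetical, and the SHA-256 hex digest is only a stand-in for a real IPFS CID, which uses a different encoding.

```python
import hashlib


def placeholder_cid(data: bytes) -> str:
    """Stand-in for a content identifier: a SHA-256 hex digest of the bytes.

    Assumption: real IPFS CIDs are encoded differently, so this string is
    NOT a valid CID. It only illustrates that the address depends on content.
    """
    return hashlib.sha256(data).hexdigest()


def make_item(item_id: str, bbox: tuple, datetime_utc: str, data: bytes) -> dict:
    """Build a STAC-style Item (a GeoJSON Feature) describing one tile."""
    west, south, east, north = bbox
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "bbox": [west, south, east, north],
        "geometry": {
            "type": "Polygon",
            "coordinates": [[
                [west, south], [east, south], [east, north],
                [west, north], [west, south],
            ]],
        },
        "properties": {"datetime": datetime_utc},
        "links": [],
        "assets": {
            "data": {
                # Hypothetical link: the address is computed from the content,
                # not from the server that happens to host the file.
                "href": f"ipfs://{placeholder_cid(data)}",
                "type": "image/tiff; application=geotiff",
            }
        },
    }


# Hypothetical tile: the ID, bounding box, and bytes are made up for illustration.
tile_bytes = b"fake raster bytes"
item = make_item("example-tile-001", (-77.0, 38.9, -76.9, 39.0),
                 "2023-01-01T00:00:00Z", tile_bytes)
same_item = make_item("example-tile-001", (-77.0, 38.9, -76.9, 39.0),
                      "2023-01-01T00:00:00Z", tile_bytes)
```

Because the link is computed from the bytes, two copies of the same tile resolve to the same address no matter who hosts them, which is what makes redundancy and nearby cached copies possible.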
However, adding this new structure, which aims to resolve the limitations of siloed data and make data more accessible, created a new problem: decentralized identifiers handle spatial data poorly. The identifiers themselves contain no spatial markers, so once you have loaded one tile, a typical identifier gives no hint that the tile next door is much more likely to be relevant, and finding that neighbor means searching the entire database again. To resolve this limitation, the lab used the concept of Directed Acyclic Graphs (DAGs), a core part of the CID system of IPFS. They conceptualized the earth as a series of hierarchical tiles (shown above). Tile B might have another ten tiles inside it, and tile b1 could in turn contain tiles a through n. This method lets you treat tiles within the same tile 'tree' as nearby and thus spatially related, which makes data traversal easier.
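The idea of nested tiles can be illustrated with a short sketch. It assumes a simple quadtree, where every tile splits into four children, which is not necessarily the subdivision scheme the lab uses. The function names and sample coordinates are hypothetical. Each point gets a path of digits, and the length of the prefix two paths share is a rough measure of how close the points sit in the tile tree.

```python
def tile_path(lon: float, lat: float, depth: int) -> str:
    """Return a quadtree path (digits 0-3, one per level) for a point.

    Assumption: a plain longitude/latitude quadtree over the whole globe.
    """
    west, east, south, north = -180.0, 180.0, -90.0, 90.0
    digits = []
    for _ in range(depth):
        mid_lon = (west + east) / 2
        mid_lat = (south + north) / 2
        quadrant = 0
        if lon >= mid_lon:      # eastern half of the current tile
            quadrant += 1
            west = mid_lon
        else:
            east = mid_lon
        if lat >= mid_lat:      # northern half of the current tile
            quadrant += 2
            south = mid_lat
        else:
            north = mid_lat
        digits.append(str(quadrant))
    return "".join(digits)


def shared_depth(path_a: str, path_b: str) -> int:
    """Count how many leading levels two tile paths have in common."""
    count = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        count += 1
    return count


# Approximate, illustrative coordinates.
college_park = tile_path(-76.94, 38.99, depth=6)
baltimore = tile_path(-76.61, 39.29, depth=6)
sydney = tile_path(151.2, -33.87, depth=6)

near = shared_depth(college_park, baltimore)   # same branch of the tree
far = shared_depth(college_park, sydney)       # diverge at the first level
```

A shared prefix is only a heuristic: two points on opposite sides of a tile boundary can be physically close yet share few levels, so a real index would also need to look at adjacent branches.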
Another noteworthy project focuses on "data cooperatives" to improve access to hot data. Hot data is data that can be accessed on demand. Cold or archival data, by contrast, is data whose metadata you might be able to browse but whose contents you must request to have extracted from a database. Hot data is currently much more expensive to store. Drawing inspiration from the 'take a penny, leave a penny' philosophy, this initiative proposes a community-driven model where users contribute and access data collaboratively. For example, if one researcher has already brought data into a 'hot' state, another researcher who wants to use it can contribute some funds to maintain the hot data node instead of independently retrieving the data, which would cost more. The goal is to create a more equitable and cooperative environment for managing, pinning, and storing geospatial data, fostering a sense of shared responsibility and resource utilization.
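A back-of-the-envelope sketch shows why sharing a hot node can be cheaper than retrieving the data independently. The cost model and every number here are hypothetical and do not come from the talk: assume one retrieval from cold storage has a fixed cost, keeping the data hot has a monthly cost, and cooperative members split the total evenly.

```python
def independent_cost_per_user(retrieval_cost: float) -> float:
    """Each researcher pays for their own retrieval from cold storage."""
    return retrieval_cost


def cooperative_cost_per_user(retrieval_cost: float, hot_cost_per_month: float,
                              months: int, users: int) -> float:
    """One retrieval plus hot storage, split evenly among cooperative members.

    Assumption: an even split. A real cooperative could weight shares by usage.
    """
    if users < 1:
        raise ValueError("a cooperative needs at least one user")
    total = retrieval_cost + hot_cost_per_month * months
    return total / users


# Hypothetical numbers, in arbitrary currency units.
alone = independent_cost_per_user(retrieval_cost=100.0)
shared = cooperative_cost_per_user(retrieval_cost=100.0, hot_cost_per_month=20.0,
                                   months=3, users=5)
```

Under these made-up numbers, each of five members pays 32 units instead of 100. With a single user the cooperative is more expensive than a one-off retrieval, so the model only pays off once enough researchers want the same data.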
Oshan's initiatives represent a shift towards decentralized, community-driven geospatial data solutions. From leveraging IPFS and Filecoin for efficient data storage to exploring novel cooperative models, these projects align with his goal of making open geospatial data accessible and cost-effective.