Seminar

GOing FAIR & DOing FAIR

Erik Schultes
GO FAIR Foundation

FAIR: Findable, Accessible, Interoperable, and Reusable. The idea of FAIR took shape at a "Designing a Data Fairport" workshop in Leiden and was published as a set of guiding principles in 2016. The workshop formulated FAIR as a series of principles meant to realize the original ideal of the internet as an *inter*operable network. It came at a time when people were beginning to recognize that the internet had grown even more complex than Robert Kahn predicted when he warned it would exceed human comprehension.

From its conception, FAIR highlighted the need for humans and machines to partner in making data findable, accessible, and reusable. Humans excel at complex cognitive tasks, while machines handle repetitive processes without tiring. FAIR aims to leverage machine readability to offload labor-intensive data tasks, so that machines can efficiently support humans in managing, navigating, and making sense of vast amounts of information.

But what does it take to create this interoperable network? Erik Schultes of the GO FAIR Foundation discussed just this in his seminar.

What is FAIR?

At its core, FAIR is a set of principles that helps humans and communities coordinate to make data machine-actionable. Machine-actionable data is data described so that machines can parse and act on it, helping future humans understand and evaluate it; a minimal sketch of the idea follows the list below. But what is FAIR not?

  • Not a specific standard: FAIR is not a fixed technical standard but a set of flexible principles. This flexibility allows for diverse standards that can achieve machine actionability.
  • Not limited to semantic web or linked open data: While FAIR draws on semantic web elements and regards them as a well-developed path to machine actionability, neither the semantic web nor linked open data is a prerequisite for FAIR.
  • Not synonymous with open or free: FAIR is not solely about open or free access. Instead, FAIR emphasizes machine readability and accessibility even under controlled conditions.
  • Not directly addressing data quality, trustworthiness, responsibility, or ethics: FAIR doesn't prescribe data quality or ethical considerations. Instead, it acknowledges these as context-dependent factors.
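To make machine actionability concrete, here is a minimal sketch in Python contrasting a free-text description, which a machine cannot reliably act on, with a JSON-LD record whose fields point to a shared vocabulary (schema.org). All names, identifiers, and values here are hypothetical.

```python
import json

# The same dataset described two ways. The free-text version is opaque
# to machines; the JSON-LD version links each field to a globally
# resolvable vocabulary term (schema.org). All names, identifiers, and
# values below are hypothetical.

free_text = "Blood pressure readings from a 2016 Leiden cohort. CSV. Ask Erik."

machine_actionable = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "@id": "https://example.org/datasets/leiden-cohort-2016",  # hypothetical PID
    "name": "Leiden cohort blood pressure readings, 2016",
    "encodingFormat": "text/csv",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "variableMeasured": {
        "@type": "PropertyValue",
        "name": "systolic blood pressure",
        "unitText": "mmHg",
    },
}

print("Opaque to machines:", free_text)
# A machine can now read the license, format, and measured variable
# directly, with no human interpretation required.
print(json.dumps(machine_actionable, indent=2))
```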

Given this understanding of FAIR - what does it mean to have 'FAIR' data?

Going FAIR

The FAIR Hourglass represents the community's journey towards machine actionability.

First, consider the structure. The wide top and bottom of the hourglass represent the flexibility granted to communities to choose applications tailored to their needs. The constriction at the center underscores the need for a common, machine-readable standard through which every community communicates.

Second, there is a gradient from blue principles to red principles. At the top of the hourglass sit the blue principles. These involve social decisions by domain experts and communities about identifier types, persistence policies, and other domain-specific elements. This collective decision-making ensures that the adopted standards meet community-specific needs. Moving toward the center, the gradient shifts to purple as the hourglass constricts, representing how each domain-specific standard communicates with a global standard; this maintains interoperability and balances local needs against global accessibility in FAIR implementation.

Finally, the bottom of the FAIR Hourglass widens again into the flexibility of the "red principles." These technical principles draw their expertise from IT and networking domains. Unlike the community-driven blue principles at the top, the red principles focus on the technical elements necessary for FAIR implementation. For instance, engineers, rather than an academic community, decide what a 'globally unique identifier resolution service' looks like. This technical expertise ensures reliable solutions for FAIR implementation, allowing communities to draw upon diverse technical approaches while maintaining the overarching goal of interoperability and machine actionability. The bottom layer thus highlights the adaptability of technological solutions within the FAIR framework.
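As a small illustration of that red-principle infrastructure at work, the sketch below resolves a DOI through the shared resolver at doi.org, using HTTP content negotiation to request machine-readable metadata instead of the human landing page. It assumes a DataCite-registered DOI (DataCite's resolver supports schema.org JSON-LD via content negotiation); the specific DOI and printed fields are only examples.

```python
import requests

# Resolve a persistent identifier through the global doi.org resolver.
# The Accept header asks for schema.org JSON-LD rather than the HTML
# landing page a browser would get; this works for DataCite-registered
# DOIs that support content negotiation. The DOI below is illustrative.
doi = "10.5281/zenodo.3233486"

resp = requests.get(
    f"https://doi.org/{doi}",
    headers={"Accept": "application/ld+json"},
    timeout=30,
)
resp.raise_for_status()

metadata = resp.json()
# The identifier itself never changes, even if the dataset moves;
# the resolver is the stable entry point.
print(metadata.get("name"), "|", metadata.get("license"))
```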

Doing FAIR

As we exit the hourglass, we arrive at FAIR-compliant, machine-actionable data.

FAIR-compliant data is fully annotated using metadata that links the data to global standards. This data is fully persistent, can be accessed through queries, and is communicated in the language of the domain communities.
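As one way to see "accessed through queries" in practice, here is a hedged sketch that searches DataCite's public REST API for dataset records. The search term and printed fields are illustrative, and the parameter names follow DataCite's documented JSON:API conventions as I understand them.

```python
import requests

# Query a global metadata registry (DataCite) for dataset records.
# Parameter names follow DataCite's public REST API; the search term
# is illustrative.
resp = requests.get(
    "https://api.datacite.org/dois",
    params={
        "query": "blood pressure",
        "resource-type-id": "dataset",
        "page[size]": 3,
    },
    timeout=30,
)
resp.raise_for_status()

for record in resp.json()["data"]:
    attrs = record["attributes"]
    title = (attrs.get("titles") or [{}])[0].get("title", "(untitled)")
    print(attrs["doi"], "-", title)
```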

A dataset annotated this way allows us to use data visitation, instead of having to discover, download, understand, and evaluate each dataset individually.

Data visitation is an alternative to centralized approaches to data sharing. First, in data visitation the data remains local, avoiding unnecessary copying: users send queries to the data, and the analysis happens on-site. This reduces the cost of moving data between servers, since only the instructions for analysis travel to the data. Second, fully FAIR-compliant data allows targeted algorithmic analyses even when the querying researcher cannot access the data directly, so private data can remain private while still facilitating research. FAIR principles let researchers assemble robust queries that fluidly aggregate all relevant datasets without copying them for local analysis, represent results without requiring data sharing, and tell a robust data story.
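Here is a minimal, self-contained sketch of the idea: a steward function stands in for the machine that holds the data, the visitor sends a query, and only a vetted aggregate travels back. In a real deployment the query would go over the network to the steward's compute-to-data endpoint; the records and query shape here are hypothetical.

```python
# Data visitation in miniature: the query travels to the data, the
# record-level data never leaves the steward's side, and only an
# aggregate answer returns. The records and query shape are hypothetical.

SENSITIVE_RECORDS = [  # stays on the steward's machine, never copied out
    {"age": 34, "systolic_bp": 118},
    {"age": 61, "systolic_bp": 141},
    {"age": 47, "systolic_bp": 127},
]

def steward_execute(query: dict) -> dict:
    """Run a vetted aggregate query locally; return only the aggregate."""
    field = query["mean_of"]
    values = [record[field] for record in SENSITIVE_RECORDS]
    return {"field": field, "mean": sum(values) / len(values), "n": len(values)}

# Visitor side: no individual record ever crosses this line.
answer = steward_execute({"mean_of": "systolic_bp"})
print(answer)  # {'field': 'systolic_bp', 'mean': 128.66..., 'n': 3}
```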

Where to from here?

FAIR data principles empower researchers by providing a structured and efficient way to manage and interact with datasets. Instead of manually handling each dataset like an individual puppet, FAIR introduces a system where well-defined queries can access and coordinate many datasets at once. These robust queries let different datasets work together seamlessly, enabling us to derive meaningful insights and tell a cohesive story without manual, time-consuming processes. There are many steps to get there: we need fully persistent identifiers for data as well as research papers, like dPID; we need systems for defining how we will store data, like Data Management Plans; and we need tools that build the infrastructure for data visitation, like Bacalhau. But by using these services to follow the FAIR principles, we can streamline and strengthen our control over data, making it more accessible and versatile for analyses and applications. A sketch of such a coordinated query follows below.
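This sketch, under the same assumptions as the visitation example above, shows one query coordinating several data stewards at once: each hypothetical site returns only its aggregate, and the visitor pools the aggregates into a single answer.

```python
# Coordinating several FAIR datasets with one query. The steward sites
# are hypothetical; in practice each would receive the query over the
# network and answer in the shape of steward_execute above.

def pooled_mean(site_results: list[dict]) -> float:
    """Combine per-site means into one overall mean, weighted by n."""
    total = sum(r["mean"] * r["n"] for r in site_results)
    count = sum(r["n"] for r in site_results)
    return total / count

# Illustrative per-site answers, shaped like visitation responses.
site_results = [
    {"field": "systolic_bp", "mean": 128.7, "n": 3},  # hospital A
    {"field": "systolic_bp", "mean": 131.2, "n": 5},  # hospital B
]

n_total = sum(r["n"] for r in site_results)
print(f"pooled mean over {n_total} records:", round(pooled_mean(site_results), 1))
```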

Beyond the missed opportunities for data sharing, not having FAIR data carries a direct cost: a study for the European Commission estimates it at more than €10 billion each year for the European economy alone. Imagine if, instead of watching that money slip through our fingers, we could fund more lasting research discoveries.