Considerations for the Conduction and Interpretation of FAIRness Evaluations

The FAIR principles were received with broad acceptance in several scientific communities. However, there is still some degree of uncertainty about how they should be implemented. Several self-report questionnaires have been proposed to assess the implementation of the FAIR principles. Moreover, the FAIRmetrics group released 14 general-purpose maturity indicators for representing FAIRness. Initially, these indicators were administered as open-answer questionnaires. Recently, they have been implemented in software that can automatically harvest metadata from metadata providers and generate a principle-specific FAIRness evaluation. With so many different approaches to FAIRness evaluation, we believe that their advantages and limitations, as well as their interpretation and interplay, need further clarification.


INTRODUCTION
The FAIR (Findable, Accessible, Interoperable and Reusable) guiding principles for scientific data management were published in 2016 and quickly became popular in the scientific community, being cited more than 600 times in scholarly publications [1]. Moreover, many funders now require the implementation of the FAIR principles [2,3], which further demonstrates their popularity. Nonetheless, their implementation is still not fully understood by the research community [4].
The original FAIR publication does not exemplify the implementation of the principles. Instead, the authors suggested that scientific communities interested in enhancing the reuse of data should engage in defining which implementations are considered to be FAIR [5,6]. For this reason, an objective evaluation of the "FAIRness" of research resources is highly warranted [4].
Recently, several tools were developed for conducting FAIRness evaluations. These tools come in different forms, such as questionnaires [6,7,8,9,10], checklists [11,12] and a semi-automated evaluator [13]. A detailed analysis of each of the existing FAIR assessment tools was recently published by the FAIR data maturity model workgroup of the Research Data Alliance (RDA) [14]. Moreover, the "FAIRsharing" group has released the "FAIRassist.org" website, where these different tools are summarized according to execution type, key features, organization, target objects and whether they include reading materials [15].
Despite these recent efforts, we believe that further clarification can help researchers to conduct and interpret FAIRness evaluations. Therefore, the objectives of this manuscript are to (I) briefly describe the different types of FAIRness evaluations; (II) summarize the advantages and limitations of each type; (III) provide considerations on how to use and interpret FAIRness evaluations.

TYPES OF FAIRNESS EVALUATION
In FAIRassist.org [15], four categories are used to describe how a tool is executed: manual-questionnaire, manual-checklist, semi-manual and automated. For the sake of simplicity, we collapse "manual-questionnaire" and "manual-checklist" into a single category ("discrete-answer questionnaire-based evaluations").

Discrete-answer questionnaire-based evaluations
This approach consists of online self-report questionnaires and checklists where respondents need to indicate their implementation choice (for a given principle) from a predefined set of answers. Questions aim to reflect each of the principles, or some additional related concept that is not part of FAIR (e.g. data quality or openness). Questions are grouped according to each principle (i.e., F, A, I, R). The output of the evaluation is a sum score [8,16], a weighted sum score [12] or a visual score [7] that is automatically generated after the completion of the evaluation.

Discrete-answer questionnaire-based tools have some limitations: (I) the set of answers is not extensible, meaning that it is not possible to include additional standards; (II) items do not provide evidence of a FAIR implementation, but only of intention; (III) score computation is arbitrary. Nonetheless, these approaches have proven particularly useful as educational resources for respondents with little knowledge of FAIR. Moreover, these questionnaires could potentially be adapted for testing knowledge of data stewardship. Studies assessing the psychometric properties of these questionnaires are warranted.
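The arbitrariness of score computation can be illustrated with a minimal sketch. The two resources, their answers and both weighting schemes below are hypothetical and are not taken from any of the cited tools; the point is only that the choice of weights, not the answers, decides which resource appears "more FAIR".

```python
# Hypothetical answers of two resources to four questionnaire items,
# one item per FAIR letter (1 = criterion reported as met, 0 = not met).
answers = {
    "resource_a": {"F": 1, "A": 1, "I": 0, "R": 0},
    "resource_b": {"F": 0, "A": 0, "I": 1, "R": 1},
}

# Two weighting schemes, both assumptions made for this illustration.
equal_weights = {"F": 1.0, "A": 1.0, "I": 1.0, "R": 1.0}
reuse_weights = {"F": 1.0, "A": 1.0, "I": 2.0, "R": 2.0}


def score(item_answers, weights):
    """Weighted sum of item answers; equal weights give the plain sum score."""
    return sum(weights[item] * value for item, value in item_answers.items())


for label, weights in (("sum score", equal_weights), ("weighted sum", reuse_weights)):
    for name, item_answers in answers.items():
        print(label, name, score(item_answers, weights))
```

Under equal weights the two resources tie, while the second scheme places resource_b ahead, although neither resource changed.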

Open-answer questionnaire-based evaluation
Open-answer questionnaires for evaluating FAIRness were first introduced by the FAIRmetrics focus group (a subset of the authors of the original FAIR paper), with evaluations recorded in a spreadsheet [6,17]. The group published a design framework and exemplar metrics for representing the FAIRness of digital resources [6], consisting of 14 "general-purpose" metrics reflecting the FAIR principles and their subparts. Later on, the authors renamed the metrics FAIR "maturity indicators", as the evaluations are meant to indicate specific points that can be improved, rather than to "measure" FAIRness.
Open-answer questionnaires for FAIRness evaluations share limitations with discrete-answer questionnaires: they are time-consuming and subject to respondent bias [13]. Another important limitation is the lack of instructional material to help respondents conduct the evaluations. Respondents need to dig into many different resources before they can attempt to provide answers. Moreover, the psychometric properties of these tools have not been evaluated either, so their validity and reliability are unknown, and any attempt to compute total scores should be regarded as experimental.
The advantages of this approach are that (I) answers need to include a statement that evidences the FAIR implementation (e.g. a URI of a metadata record of the adopted standard), which helps to ensure the FAIRness of the standards used by the resources; (II) it allows scientific communities to create additional maturity indicators; (III) it can be filled in on multiple occasions (spreadsheet version).

Semi-automated evaluation
The "FAIR Maturity Evaluation Service" consists of a web-based metadata harvester [13,18] that parses the metadata of a digital resource assigned to a globally unique identifier (GUID). The respondent needs to enter (I) a brief description of the resource to be evaluated; (II) the GUID of the resource; and (III) their ORCID [19]. Moreover, a "collection" of maturity indicators must be selected prior to conducting the evaluation [13]. A collection is a group of maturity indicator tests, and it should be compiled to address the needs of the resource or scientific community performing the evaluation. For instance, a community could be interested only in "Findability" and "Accessibility" at the initial stage of a funded project. The description must include a minimum of 40 words, preferably covering the title of the resource, the metadata provider, the collection used, a short description and the ORCID of the respondent.
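The inputs listed above can be checked before an evaluation is submitted. The following is a minimal sketch of such a pre-submission check, not the service's own code or API: the function names, messages and example values are hypothetical. The 40-word minimum comes from the text, and the ORCID check assumes the publicly documented iD format with its ISO 7064 MOD 11-2 check character.

```python
import re

MIN_DESCRIPTION_WORDS = 40  # minimum description length stated in the text
ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def orcid_checksum_ok(orcid):
    """Check the format and the ISO 7064 MOD 11-2 check character of an ORCID iD."""
    if not ORCID_PATTERN.match(orcid):
        return False
    digits = orcid.replace("-", "")
    total = 0
    for char in digits[:-1]:
        total = (total + int(char)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return digits[-1] == expected


def submission_problems(guid, orcid, description, collection):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if not guid.strip():
        problems.append("GUID of the resource is missing")
    if not orcid_checksum_ok(orcid):
        problems.append("ORCID iD is malformed or fails its checksum")
    if len(description.split()) < MIN_DESCRIPTION_WORDS:
        problems.append("description has fewer than 40 words")
    if not collection.strip():
        problems.append("no collection of maturity indicators selected")
    return problems


# Placeholder values; the GUID and collection name are invented for illustration.
print(submission_problems(
    guid="https://example.org/dataset/1",
    orcid="0000-0002-1825-0097",
    description="A short description.",
    collection="findability-and-accessibility",
))
```

Returning a list of problems rather than raising on the first one lets a respondent correct all inputs in a single pass.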

The semi-automated approach has a number of advantages: (I) it is automated, which is expected to reduce respondent bias; (II) it is open-source; (III) it ensures transparency, as the evaluations are open and the evaluators identifiable. However, it also has some important limitations: (I) it requires that the resource has some kind of metadata provider available (and is therefore not useful for projects in a developmental phase); (II) it depends on the compatibility between the software and the metadata provider; (III) it yields different results for two different identifiers of the same resource. Nonetheless, potential users of the FAIR Maturity Evaluation Service are welcome to test it and provide feedback to the development team.

What does the outcome of a FAIRness evaluation tell?
FAIRness can be interpreted as "a continuum of behaviors" exhibited by a resource that can lead to machine discoverability and (re)use [5]. Therefore, "absolute FAIRness" is conceptually not achievable. Nonetheless, some tools for FAIRness evaluations attempt to provide overall scores. These calculations have not been validated, and the FAIRmetrics group opposed the idea of computing total scores. Regardless of the methodology for computing overall sum scores, different digital objects would need different score computations. Therefore, overall scores should not be used for evaluative purposes.
In contrast to overall scores, the assessment of specific maturity indicators can provide a starting point for identifying the implementation choices that are missing for a resource. Moreover, the output of each maturity indicator test also reflects whether the standard has been properly stated in the resource's metadata or whether it still needs to be implemented in the code of the FAIR maturity service. These outputs can be useful for data stewards internally appraising the resource they represent.
It should be taken into account that there are many possible reasons for a failed FAIR maturity test. The following questions should be raised: does the metadata about the resource contain a statement that addresses the principle failing a maturity test? Does the metadata scheme cover that principle? Is that principle relevant for that resource type or for the scientific community to which it belongs? Most importantly, however, does the semi-automated evaluation software support that standard? The answers to these fundamental questions should help evaluators to understand why their resource failed a test and to tailor a strategy to address it.
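The four questions can be read as a simple triage of a failed test. The sketch below is one possible reading, under stated assumptions: the order in which the questions are checked and the wording of the suggested next steps are ours, and the function is hypothetical rather than part of any evaluation tool.

```python
def triage_failed_test(relevant_for_community, scheme_covers_principle,
                       metadata_has_statement, evaluator_supports_standard):
    """Map the answers to the four questions to a suggested next step.

    The order of the checks is an assumption made for this illustration.
    """
    if not relevant_for_community:
        return "leave the indicator out of the collection"
    if not scheme_covers_principle:
        return "extend or replace the metadata scheme"
    if not metadata_has_statement:
        return "add the missing statement to the metadata"
    if not evaluator_supports_standard:
        return "report the unsupported standard to the evaluator developers"
    return "no cause identified; inspect the test output in detail"


# Example: the metadata states the standard, but the evaluator does not recognize it.
print(triage_failed_test(
    relevant_for_community=True,
    scheme_covers_principle=True,
    metadata_has_statement=True,
    evaluator_supports_standard=False,
))
```

In this reading a failed test only points to a deficiency of the resource itself in the second and third cases; in the others the collection or the evaluator needs attention.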

What type of FAIRness evaluation should I use?
If the interest is solely in evaluating the FAIRness of a resource for internal appraisal, then the recommendation is to start with a semi-automated evaluation. However, it should be taken into account that the output of the evaluation might not reflect the needs of your scientific community. Although the existing FAIR maturity indicators are expected to meet the needs of many communities, some reasoning is needed prior to using the semi-automated approach for official purposes.

Different approaches should be complementary, rather than competing. The outcome of open-answer questionnaires can be useful to capture information on standards that still need to be implemented in the automated maturity tests. Moreover, as demonstrated by the "FAIR principles explained" and the GO-FAIR matrix, questionnaires can be used for gathering valuable data on the implementation choices of a given scientific community. The existing maturity indicators should work for many researchers; however, if scientific communities want to go further and develop their own maturity tests, then it should be their role to do so.
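One way to use the two approaches together is to compare their outcomes indicator by indicator. The sketch below assumes that both outcomes have been reduced to a pass/fail value per indicator; the indicator labels and results are invented for illustration and do not come from a real evaluation.

```python
# Hypothetical per-indicator outcomes for one resource
# (True = reported as implemented / test passed).
questionnaire = {"F1": True, "F2": True, "A1": True, "I1": True, "R1": False}
automated = {"F1": True, "F2": False, "A1": True, "I1": False, "R1": False}


def discrepancies(manual, machine):
    """Indicators reported as implemented by the respondent but failed by the automated test."""
    return sorted(
        indicator
        for indicator, reported in manual.items()
        if reported and not machine.get(indicator, False)
    )


print(discrepancies(questionnaire, automated))
```

Each indicator in the resulting list is a candidate for follow-up: either the metadata does not state the standard in a harvestable way, or the standard is not yet implemented in the automated tests.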

My resource scored positive on a questionnaire but negative on the automated evaluation; what should I do?
Researchers are generally more interested in assessing their own digital objects than those of others. However, it is important that the community ensures that the standards used are also FAIR. Members of scientific communities should therefore consider identifying whether the standards adopted by their community (i.e., identifiers, formats, protocols, policies, etc.) comply with the FAIR principles. Metadata about the standards is crucial for ensuring their FAIRness.
FAIRsharing (www.fairsharing.org) [20], Identifiers.org (www.identifiers.org) [21] and Bioportal (www.bioportal.org) [22] are examples of metadata registries for standards, including file formats, ontologies and identifier schemas, as well as for maturity indicators. Communities should therefore ensure that the standards they select as best practices for FAIRness also have proper metadata.

CONCLUSION
The present manuscript provided a short description of the different types of FAIRness evaluations, pointing out their advantages and limitations. Moreover, reflections on three fundamental points were provided: (I) evaluations should be assessed at the maturity indicator level, rather than at the overall level of FAIRness; (II) the open-answer questionnaire approach should be used as a complement to the automated approach, rather than as a competing alternative, and data stewards should figure out what leads to inconsistencies between the two approaches; (III) communities should take on the role of implementing standards in the semi-automated evaluator. Standards must also be FAIR (and therefore have an identifier, metadata descriptors, etc.).

AUTHOR CONTRIBUTIONS
R. de Miranda Azevedo (r.demirandaazevedo@maastrichuniversity.nl) took the lead in writing the manuscript and M. Dumontier (michel.dumontier@maastrichuniversity.nl) supervised and reviewed the manuscript.