Data Management Planning: How Requirements and Solutions are Beginning to Converge

Effective stewardship of data is a critical precursor to making data FAIR. The goal of this paper is to provide an overview of the current state of the art of data management and data stewardship planning (DMP) solutions. We begin by arguing why data management is an important vehicle supporting the adoption and implementation of the FAIR principles, and we describe the background, context and historical development, as well as the major driving forces, namely research initiatives and funders. Then we provide an overview of the current leading DMP tools in the form of a table presenting their key characteristics. Next, we elaborate on emerging common standards for DMPs, especially machine-actionable DMPs. As sound DMP is not only a precursor of FAIR data stewardship but also an integral part of it, we discuss its positioning in the emerging FAIR tools ecosystem. Capacity building and training activities are an important ingredient in the whole effort. Although it is not the primary goal of this paper, we also touch on research workforce support, as tools can only be as effective as their users are competent to use them properly. We conclude by discussing the relations of DMP to the FAIR principles, as there are other important connections beyond being a precursor.


INTRODUCTION
Effective stewardship of data is a critical precursor to making data FAIR, which is why researchers should develop Data Management Plans (DMPs) from the early stages of their research. It is desirable to share data wherever possible. This requires the necessary permissions to be obtained (either via consent agreements or for third-party data), the choice of appropriate formats and standards, and rich documentation to ensure data are meaningful to other stakeholders, including machines. This paper covers the convergence in data policy, tools and standards for DMPs, highlighting opportunities to facilitate the planning process and make better use of the information gathered.

Convergence: finding the common ground
DMPs are becoming commonplace across the globe. Several UK and US research funders have had expectations in place for well over a decade, and an increasing number of governments, funding agencies and research organisations are issuing expectations that either require or encourage the development of data plans [3].
Although many funders have individual templates and guidelines for DMPs, there is considerable alignment in the content. The Digital Curation Centre (DCC) analysed the different UK and international requirements to agree on a set of DMP themes which represent the main aspects addressed [4]. These cover topics such as "data formats", "ethics", "data sharing" and "preservation". The themes are used in DMPonline to allow questions and guidance to be associated. In 2018, the California Digital Library and the Digital Curation Centre revisited this exercise to apply it to the many new US funder templates and to consult with the wider Research Data Management (RDM) community on convergence. This resulted in 14 themes which act as a baseline for expectations and were used as an initial input to inspire the RDA Working Group on Common Standards for DMPs [5].
Convergence becomes more important when projects are executed at multiple institutes and/or funded by multiple funders, because it is impractical to create separate data management plans for each institute and each funder. The DMPRoadmap platform used by DMPonline, DMPTool and other national DMP services addresses this by allowing local requirements to be added to existing funder templates, so that researchers do not have to write one DMP for their funder and another for their institution.
See article 20 (p199) in this special issue.
For a history of UK funder policy dating back to 1996, see [1]. In the USA, the National Institutes of Health (NIH) has expected data sharing statements since 2003, see [2].
A comprehensive list of DMP templates is curated by FAIRsharing.org, as explained in paper 15 (p151) in this special issue.

DMPRoadmap [6] is an open source codebase for a Data Management Planning tool, jointly managed by the Digital Curation Centre and the University of California Curation Center (UC3). It represents a large effort to converge on a single solution that marries the best features of earlier versions of DMPonline and DMPTool.
Policy harmonisation can also be a solution to differing requirements, and various groups such as UK Research and Innovation, Science Europe, the Belmont Forum, and the Research Data Alliance funders forum are promoting convergence. Science Europe ran a workshop in 2018 to bring together European stakeholders on RDM requirements and DMPs. There are two strands of resulting activity: promoting policy harmonisation, and agreeing on a common set of DMP requirements which can be extended by domain-specific expectations (domain data protocols, DDPs). Similarly, the Belmont Forum, an international consortium of 29+ science funding agencies, has worked closely with data organizations such as RDA and CODATA and with science publishers to develop and align its data and digital outputs management requirements template with FAIR and open data best practices. The Belmont Forum is integrating its open data requirements with its online proposal and review processes.
Some research funders use the outcomes of DMP reviews to improve their guidelines. The Economic and Social Research Council (ESRC), the Belmont Forum, the Health Research Board (HRB) in Ireland and ZonMw, a medical science funder in the Netherlands, have already released several iterations of their guidelines for researchers, with various adjustments to increase clarity and improve the quality of the responses given. Funders such as ZonMw, HRB and the Wellcome Trust are exploring ways to make DMPs easier to monitor by providing more structured questions which can be automatically assessed and by connecting DMPs to local grants systems [7]. Some designated data centres supported by the Natural Environment Research Council (NERC) in the UK and the National Science Foundation (NSF) in the US, such as Woods Hole, are also discussing with tool providers how DMPs can be aligned with local systems and support processes.

DMP TOOLS
Many DMP tools are available worldwide. The earliest date from 2010-2011, when DMPonline and DMPTool were launched in the UK and USA, respectively [8]. The Digital Curation Centre and California Digital Library, which operate these services, converged on a joint open source, community-led codebase called DMPRoadmap in 2018. This codebase is used in several international services including DMPAssistant in Canada, DMPTuuli in Finland, DMPOPiDOR in France, PGDonline in Spain and DEIC's version of DMPonline in Denmark. In recent years, many other new DMP tools have been released. Most provide functionality to create, share, export and review DMPs; however, different aspects have been emphasised. Several focus on providing closed questions instead of free text, with underlying knowledge bases to help prompt and guide the users. Others take a project or data set focus, and also link to other tools for data documentation and storage allocation to support implementation. Below is a summary of the main services, highlighting the differences in functionality and approaches to deployment and sustainability.

COMMON STANDARDS FOR DMPS
Given the range of tools emerging and the opportunity to connect DMPs with other research systems and processes, greater coordination and adoption of standards to enable data exchange and interoperability is needed. A request to establish the DMP Common Standards working group [29] was articulated during the 9th Research Data Alliance (RDA) plenary meeting in Barcelona. The discussion was framed by a white paper by Simms et al. on machine-actionable data management plans (maDMPs) [30]. The white paper is based on outputs from the IDCC workshop held in Edinburgh in 2017 that gathered almost 50 participants from Africa, America, Australia, and Europe. It describes eight community use cases which articulate consensus about the need for a common standard for maDMPs (where machine actionable is defined as "information that is structured in a consistent way so that machines, or computers, can be programmed against the structure").
The RDA working group on DMP Common Standards has developed a common model [31] for machine-actionable DMPs that enables exchange of information between systems acting on behalf of stakeholders involved in the research life cycle, such as researchers, funders, repository managers, ICT providers and librarians. The group has also implemented prototype workflows [32] to demonstrate how machine-actionable DMPs can be implemented by connecting them to various systems, such as CRIS, repositories, or funder systems.
The model is independent of specific funder requirements and provides a common set of concepts needed to represent DMP-specific information in a majority of settings. Since it is meant as a format for exchanging information between systems, it is also independent of the internal architecture of any software that adopts the common standard. Furthermore, the model can be serialised into different representations, for example JSON, XML or OWL. In 2019, maDMPs were summarised in 10 principles in [33].
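To illustrate what serialising one model into several representations can look like in practice, the following is a minimal sketch only. The record structure and all field names (`title`, `contact`, `dataset`, `format`, `licence`) are assumptions made for illustration and are not taken from the RDA common model [31]; the XML mapping is likewise one possible choice among many.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified DMP record. The field names are
# illustrative assumptions, not the vocabulary of the RDA common model.
dmp = {
    "title": "Example project DMP",
    "contact": {"name": "A. Researcher", "mbox": "a.researcher@example.org"},
    "dataset": [
        {"title": "Survey responses", "format": "text/csv", "licence": "CC-BY-4.0"},
        {"title": "Interview transcripts", "format": "text/plain", "licence": "restricted"},
    ],
}


def to_json(record: dict) -> str:
    """Serialise the record as JSON under a single 'dmp' root key."""
    return json.dumps({"dmp": record}, indent=2, sort_keys=True)


def _append(parent: ET.Element, key: str, value) -> None:
    """Recursively map dicts, lists and scalars onto XML child elements."""
    if isinstance(value, dict):
        node = ET.SubElement(parent, key)
        for child_key, child_value in value.items():
            _append(node, child_key, child_value)
    elif isinstance(value, list):
        for item in value:
            _append(parent, key, item)  # one repeated element per list item
    else:
        ET.SubElement(parent, key).text = str(value)


def to_xml(record: dict) -> str:
    """Serialise the same record as XML with a 'dmp' root element."""
    root = ET.Element("dmp")
    for key, value in record.items():
        _append(root, key, value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_json(dmp))
    print(to_xml(dmp))
```

Both functions read the same in-memory record, which mirrors the point made above: the exchange format is decoupled from whatever internal representation a tool uses.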
As of March 2019, the RDA DMP Common Standards Working Group is documenting the model and creating examples in JSON to facilitate adoption. It is also engaging with pilot users who wish to deploy the model. These include tool providers, such as DMPonline, and universities, such as TU Delft. The next steps for DMP Common Standards are to develop further serialisations of the model, to reach out to new pilot users and to maintain the model by incorporating feedback from its deployments. All DMP tool providers are encouraged to adopt the common data model to promote interoperability across services and reuse of the information held in DMPs.

DMP TOOLS IN THE CONTEXT OF FAIR TOOLS ECOSYSTEM
As data management encompasses the whole life cycle of data, all tools related to achieving FAIRness are relevant, in particular those for the discovery of existing data sets, evaluating FAIRness and publishing. Tight integration between DMP and FAIR tools is not currently in place, at least not at an appropriate level of adoption and maturity. The first efforts have been formulated in the FAIR Funder Pilot Programme [7], which will attempt to use DMPs as an indication of the prospective FAIRness of data by connecting FAIR metrics to perform an automated assessment. DMPonline has also integrated the RDA Metadata Standards Directory to help researchers identify and adopt relevant standards, and forthcoming tools such as OpenDMP are intended to focus on API integrations to connect different tools in the research system. The DMP Common Standards work lays important cornerstones for interoperability in the tooling ecosystem.
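As a rough illustration of how structured (closed-question) DMP answers could feed an automated indication of prospective FAIRness, consider the sketch below. It is not the method of the FAIR Funder Pilot Programme or of any existing tool: the question keys, the expected answers and the scoring rule are all hypothetical and chosen only to show the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    principle: str  # "F", "A", "I" or "R"
    question: str   # key of a closed question in the DMP (hypothetical)
    expected: str   # answer that counts as satisfying the check


# Illustrative checks only; these are not an established set of FAIR metrics.
CHECKS = [
    Check("F", "persistent_identifier", "yes"),
    Check("F", "metadata_in_catalogue", "yes"),
    Check("A", "repository_selected", "yes"),
    Check("I", "community_standard_format", "yes"),
    Check("R", "licence_assigned", "yes"),
]


def assess(answers: dict) -> dict:
    """Return, per principle, the fraction of checks the answers satisfy."""
    passed = {}
    total = {}
    for check in CHECKS:
        total[check.principle] = total.get(check.principle, 0) + 1
        # A missing answer is treated as not satisfying the check.
        if answers.get(check.question, "").lower() == check.expected:
            passed[check.principle] = passed.get(check.principle, 0) + 1
    return {p: passed.get(p, 0) / n for p, n in total.items()}


if __name__ == "__main__":
    answers = {
        "persistent_identifier": "yes",
        "metadata_in_catalogue": "no",
        "repository_selected": "yes",
        "licence_assigned": "Yes",
    }
    print(assess(answers))
```

A check of this kind is only possible because the answers are closed and keyed; free-text responses would need manual review or a much less reliable text analysis.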
Another dimension of such an ecosystem is internationalisation. Individual DMP tools offer translations into various European and world languages; however, FAIR, through its focus on machine-actionability and strong semantics, starts to show a way to perform language-neutral science. This prospect is also very interesting for data management planning, with possibilities not yet fully fathomed.

RESEARCH WORKFORCE SUPPORT
Capacity building activities are an important complement to data management tools and guidance. Training resources exist in a variety of formats to help researchers successfully prepare a DMP and, more importantly, implement these plans. As data needs vary across domains, methods, and objectives of research, flexibility is an important capacity building consideration. The Belmont Forum has developed a Data Skills Curricula Framework to guide research teams and agencies in developing sustainable DMPs and in putting together the components of a comprehensive program for data-enabled research; it can also be used as a resource for discovering the types of training that exist in order to develop a customised plan. The Framework emphasizes full path data management and role-specific approaches to help teams identify who needs which skills, and when to turn to a data professional for assistance [34].
The RDA/CODATA Research Data Science Schools [35], which have been running since 2016 to equip early career researchers with core data handling, visualisation, computational infrastructure and open science skills, are now exploring parallel data stewardship courses supported by the FAIRsFAIR project [36], in which researchers and data stewards will be co-taught. The curriculum will promote collaboration and the range of skills and inputs needed to effectively manage and share FAIR data. Resources such as the Data Management Training Clearinghouse, the Belmont Forum e-Infrastructures and Data Management Toolkit and the FOSTER Open Science portal have compiled globally oriented free online training materials, many of which have been collaboratively developed in line with the principles of FAIR and Open Educational Resources.
In ELIXIR, the "Towards Data Stewardship in ELIXIR: Training & Portal" Implementation Study, running in 2018-2019, has brought advancements both in the Data Stewardship Wizard tool and in the training materials for the data stewardship curriculum.

CONCLUSIONS AND DISCUSSION
The following points summarise the relation between the FAIR principles and data management planning and the respective tools.
1. A sound and elaborate data management plan is a necessary precursor to making data FAIR.
2. DMP tools can help data become more FAIR by giving feedback: FAIR metrics calculations indicate the assumed FAIRness of the data if the DMP is followed (e.g. Data Stewardship Wizard [17]).
3. DMPs can themselves be FAIR. The RDA DMP Common Standards Working Group has delivered a prototype model which should now be implemented by the growing number of DMP tool providers. If we have a standard expression for the content within DMPs, we will be able to exchange information more easily with the wider ecosystem of FAIR services and provide greater value to researchers, funders and service providers using DMPs. This covers "I" and "R". "F" and "A" still need to be achieved by storing DMPs in suitable repositories, which does not generally happen yet. An open question is, of course, what information needs to be FAIR and what can be shared (DMPs tend to be inherently confidential); this is a decision the community has yet to make.
Convergence in research data policy and the adoption of standards for DMP tools are desirable to help clarify the landscape and enable interoperability. We are unlikely to achieve a single DMP template or one tool, and neither is that desirable. DMPs should be tailored to the context in which research is conducted. For some, ethical issues will be prevalent, while for others the challenges will be scale or timely data sharing. Also, even in the same human language, different research disciplines may use different words to express data management topics; using the right terminology is critical to facilitating use of the tools and templates.

AUTHOR CONTRIBUTIONS
Sarah Jones (sarah.jones@glasgow.ac.uk) is the main author of the contents. Robert Pergl (perglr@fit.cvut.cz) coordinated the author team and the authoring process, copy-edited the text and authored the Data Stewardship Wizard details together with Rob Hooft. Tomasz Miksa (tmiksa@sba-research.org) authored the information about the RDA Working Group on Common Standards for DMPs and machine-actionable DMPs. R. Samors (miksa@ifs.tuwien.ac.at), J. Ungvari (jungvari@gmail.com), R. Davis (rowenaidavis@email.arizona.edu) and T. Lee (tinal@email.arizona.edu) collectively wrote and revised information in Section 5 on the Belmont Forum's research workforce support activities and resources and made additional contributions to information in the remainder of the article.