Licensing FAIR Data for Reuse

The last letter of the FAIR acronym stands for Reusability. Data and metadata should be made available with a clear and accessible usage license. But what are the choices? How can researchers share data and allow reusability? Are all the licenses available for sharing content suitable for data? Data can be covered by different layers of copyright protection, making the relationship between data and copyright particularly complex. Some research data can be considered a work and are therefore covered by full copyright, while other data can be in the public domain due to their lack of originality. Moreover, a collection of data can be protected by special rights in Europe to acknowledge the investment of time and money in obtaining, presenting, arranging or verifying the data. The need to use a license when sharing data comes from the fact that, under current copyright laws, when rights exist, the absence of any legal notice must be understood as the default “all rights reserved” regime. Unless an exception applies, the authorisation of right holders is necessary for reuse. Right holders could use any text to state the reusability of data, but it is advisable to use one of the existing licenses, especially those that are suitable for data and databases. We hope that with this paper we can bring some clarity regarding the rights involved when sharing research data.


THE PROTECTION OF DATA, DATA SETS AND DATABASES
European Union (EU) law defines "databases", but not data sets or, at least for copyright purposes, data. Databases that meet the legal definition can be protected by copyright if they are original. Data sets can be protected by copyright on the same terms if they correspond to the definition of a database; otherwise they cannot. Data as such are normally excluded from copyright protection [2,3]. It is important to understand that copyright protects original expressions in the "literary and artistic" domain, an expression that has historically included works such as books, musical works, choreographies, cinematographic works, drawings, etc. [4]. Ideas, procedures, methods of operation or mathematical concepts as such, news of the day and miscellaneous facts are excluded from copyright protection [4,5,6].
Two main elements are important in the analysis above. Copyright only protects original expressions, and these expressions are normally found in the literary and artistic domain. The literary and artistic domain, however, should not be interpreted narrowly, and in recent years new creations have been included, such as computer programs and compilations of data or databases [6]. The latter is particularly important here. International conventions are clear that copyright can protect compilations of data or other material which, by reason of the selection or arrangement of their contents, constitute intellectual creations. But copyright does not extend to the data contained in the database, which may or may not be protected depending on whether they meet the condition identified above: to be original expressions in their own right.
At this point it is important to understand that what the law calls data (but does not define) may in fact be quite different from what other disciplines understand by the same term. Databases are defined as collections of independent works, data or other materials arranged in a systematic or methodical way [1]. This definition means that a protected database can be a systematic or methodical collection of works (e.g. a database of journal articles, movies or songs), other materials (e.g. sound recordings or broadcasts) and data (which are not defined but certainly include elements such as factual information, measurements or other non-original information). This situation is fairly consistent at the international level.
It is important to note that in the EU, since the Database Directive of 1996, when data, including non-copyrightable data, are gathered in a non-original database, the maker can claim some rights preventing the extraction and reuse of substantial parts of otherwise unprotected data. This is the so-called Sui Generis Database Right (SGDR), which is not really copyright but is similar in some respects [1].
"A collection of independent works, data or other materials arranged in a systematic or methodical way and individually accessible by electronic or other means", see Art. 1(2) of the EU Database Directive [1].
"Literary and artistic" are the words employed by the Berne Convention, the oldest and most relevant international agreement in the field of copyright [4].
In this case the "data", i.e., the scientific article or song, are individually protected by copyright because they are works of authorship for copyright purposes, not data.
These are normally protected by rights related to copyright or neighboring rights.
Non-original information is not protected by copyright.

Accordingly, data understood as factual information, for instance historical facts or weather measurements, are not protected by copyright or the SGDR as such. When these data are collected and organised in a systematic or methodical way, they will form part of a database. If the selection or arrangement of the data is original in the sense of the author's own intellectual creation, then such selection and arrangement (i.e., the database structure), but not the collected data, are protected by copyright [1]. Does this mean that factual data, even when collected following an original selection and arrangement, are not protected by copyright? Yes, unless the single datum itself is protected by copyright because for copyright purposes it is not a datum but a work (remember the example of the database of journal articles). Does that mean that there is no form of protection similar to copyright for factual data? No, there is. It is the aforementioned SGDR, which exists when a substantial investment in obtaining, verifying or presenting the data has taken place. When this happens, the maker of this investment has a right to prevent the extraction (i.e., copying) and reuse of a substantial part of the data: not of the single datum, but of a larger amount that will have to be identified on a case-by-case basis.
The Database Directive allows member states to implement some listed limitations to the rights granted to the maker [6], but this has not led to a proper harmonisation at the national level [7].
The last aspect to be considered here is that not all types of investment qualify for the SGDR. Only investments (time, money, etc.) in the obtaining, verification or presentation of data qualify, not investments in their creation. However counter-intuitive this may sound, data that are created are not protected by the SGDR, only data that are obtained. After all, this is the Database Directive and not the data directive. The initial goal was to incentivise the production of databases, not of data [8]. If there were a right limiting the reuse of data, this would halt, instead of incentivise, the production of databases.
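The reasoning of this section can be summarised as a small decision routine. The following Python sketch is only an illustration of the tests described above: the `Collection` fields and the `applicable_rights` function are hypothetical names introduced here, each legal test is reduced to a yes/no answer, and the "substantial part" and "substantial investment" thresholds, which must be assessed case by case, are assumed to have been decided beforehand.

```python
from dataclasses import dataclass


@dataclass
class Collection:
    """Hypothetical description of a collection of data (illustrative fields)."""
    systematic_arrangement: bool       # independent items arranged systematically or methodically
    original_selection: bool           # selection/arrangement is the author's own intellectual creation
    items_are_works: bool              # items are works in their own right (e.g. journal articles)
    substantial_investment: bool       # a substantial investment was made
    investment_in_creation_only: bool  # the investment went only into creating the data
    eu_jurisdiction: bool              # the SGDR stems from the EU Database Directive


def applicable_rights(c: Collection) -> set[str]:
    """Return the layers of protection that may apply, following the text above."""
    rights: set[str] = set()
    # A "datum" that is actually a work is protected in its own right.
    if c.items_are_works:
        rights.add("copyright in individual items")
    # Only systematic or methodical collections count as databases.
    is_database = c.systematic_arrangement
    # Copyright covers the original structure, never the facts themselves.
    if is_database and c.original_selection:
        rights.add("copyright in database structure")
    # The SGDR rewards investment in obtaining, verifying or presenting
    # data, but not investment in creating them.
    if (is_database and c.eu_jurisdiction and c.substantial_investment
            and not c.investment_in_creation_only):
        rights.add("sui generis database right")
    return rights


# Weather measurements gathered at substantial cost into a non-original EU database.
weather = Collection(True, False, False, True, False, True)
print(applicable_rights(weather))
```

Under these assumptions the weather example yields only the SGDR: the measurements themselves remain free of copyright, while extraction of a substantial part of the collection can be restricted.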
Even more important, however, is to understand why factual information is excluded from copyright protection. If facts were protected by copyright, it would mean that no one could reuse those data without the authorisation of the right holder. It would mean that no one other than the first person or company recording them could use the same measurements of a natural phenomenon such as the temperature of the oceans. No one could reuse factual data such as the performance of the economy or the geospatial coordinates needed to identify a specific point on the Earth. Such a scenario should be viewed with suspicion first and foremost by the scientific community, as it would undermine scientific freedoms, transparency and replicability. But it would equally threaten other fundamental rights enshrined in the European Convention on Human Rights and in the EU Charter, such as freedom of information, private property and freedom to conduct a business [2]. This is why factual information is not protected as such. By affording protection to the obtaining of data, a limited reward for the investment is given to the maker. By excluding protection for the creation of data, the law aims to avoid as much as possible so-called "single source databases", due to their anti-competitive and monopolistic nature and to the distortion of scientific freedoms and fundamental rights that they may cause.
Therefore, the legal system has designed a mechanism whereby the basic bricks of knowledge such as ideas or factual information are freely available to all in order to learn and advance the knowledge in a field. But this is balanced by the possibility to protect some of the results obtained using those ideas and facts: original expressions of unprotected ideas and substantial amounts of obtained data when systematically or methodically collected and arranged.
In the next section we will discuss the existing options to license data and databases focusing on reusability.
However, there is another aspect of the FAIR principles that must be taken into account before deciding which license to use: accessibility. Not all data can be shared openly, but metadata can be accessible when data must be kept private for security, privacy or other justified reasons [9]. Metadata are very often a type of factual information and therefore do not qualify as a work covered by copyright. Their compilation, nonetheless, can be protected by existing database rights, as will be seen in one of the cases in Section 3.

SUITABLE OPTIONS FOR LICENSING DATA AND DATABASE RIGHTS
When looking for existing contractual solutions for sharing rights, one can think of the best-known set of licenses for open content: the Creative Commons (CC) licenses [10]. Initially, these licenses were not drafted to specifically cover data, which normally are not protected by copyright. This is why, for example, earlier versions of the licenses did not cover the SGDR. The current version 4.0 includes the SGDR in its scope (which means that it will follow the licence conditions, e.g. BY, SA, etc.). Creative Commons has also drafted a specific legal tool called CC0, aimed at enabling scientists, educators, artists and other creators and owners of copyright or database-protected content to waive those interests, including the SGDR. Besides the tools provided by CC, there are other legal instruments created especially for data, like the ones developed by the Open Data Commons or by some national governments.

The options provided by Creative Commons
When Creative Commons launched the first suite of licenses almost two decades ago, the licences were drafted with the US Copyright Act as the reference model, and national versions of the licenses were developed to better address local legal issues. Quickly, with rising international interest, it became clear that the licenses would need to change in order to cover different features of other copyright frameworks in a more systematic way. Among other issues, Creative Commons had to face the inclusion of the SGDR in the scope of the license. This topic was not approached uniformly in the porting process by all EU affiliated institutions at the time. Some of them did not mention it, others just waived the SGDR and others included it in the license grant.
In 2013, when Creative Commons launched the latest version of its license suite, it was decided to end the porting of the licenses. Instead, the decision was to have a single legal text that could fit all the legal specificities and be suitable for any jurisdiction. Those updated legal texts could be translated into any language, but there was no need to port them.
CC0: https://creativecommons.org/share-your-work/public-domain/cc0/
Open Data Commons: https://opendatacommons.org/about/index.html
Launch of version 4.0 of the CC licenses: https://creativecommons.org/2013/11/25/ccs-next-generation-licenses-welcome-version-4-0/
The current legal texts of the Creative Commons licenses include the SGDR in the scope of the license, meaning that the SGDR is treated exactly as any other licensed right. This means that the conditions of the license (e.g. Attribution-BY, Share Alike-SA, Non Derivative Works-ND, Non Commercial-NC) will also apply to the SGDR. Therefore, when the maker of an SGDR-protected database uses a CC license, they grant to the public the right to extract and reuse the whole or a substantial part of its contents. Those rights can be restricted to non-commercial uses if the database maker chooses a Non Commercial license such as CC BY-NC, CC BY-NC-SA or CC BY-NC-ND. Moreover, the creation of a new database using the whole or a substantial part of the contents of the licensed database can be restricted by using a Non Derivative license such as CC BY-ND or CC BY-NC-ND, or it can be conditioned by a copyleft practice by means of CC BY-SA or CC BY-NC-SA, which carry the Share Alike clause. This latter condition requires that any derivative database be licensed under the same license or a compatible one. A derivative database must be understood as a new database that includes the whole or a substantial part of the contents of the original licensed database.
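As an illustration of how the license elements map onto what a reuser of an SGDR-protected database may do, the following Python sketch reads the conditions off a license name. The function names `cc_elements` and `database_permissions` and the wording of the dictionary keys are assumptions introduced here; the sketch covers only the six BY-based licenses and does not handle CC0.

```python
def cc_elements(license_name: str) -> set[str]:
    """Return the condition codes in a name such as 'CC BY-NC-SA' or 'CC BY 4.0'."""
    parts = license_name.upper().split()
    if len(parts) < 2 or parts[0] != "CC":
        raise ValueError(f"not a CC license name: {license_name!r}")
    # The second token carries the hyphen-separated conditions; a version
    # number, if present, is a later token and is ignored.
    return set(parts[1].split("-"))


def database_permissions(license_name: str) -> dict[str, bool]:
    """Summarise the effect of each condition on extraction and reuse."""
    elements = cc_elements(license_name)
    return {
        "attribution_required": "BY" in elements,
        "commercial_extraction_and_reuse": "NC" not in elements,
        "derivative_databases_allowed": "ND" not in elements,
        "share_alike_on_derivatives": "SA" in elements,
    }


print(database_permissions("CC BY-NC-SA"))
```

For CC BY-NC-SA this reports that attribution is required, commercial extraction and reuse are excluded, and derivative databases are allowed provided they carry the same or a compatible license.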
CC0 is a legal tool that was introduced by Creative Commons in 2008 in response to demand from researchers for a license to waive any database right, and especially the SGDR [11]. Its creation followed the Protocol for Implementing Open Access Data launched by Science Commons at the end of 2007. Science Commons was a Creative Commons project created in 2005 and led by John Wilbanks to develop strategies and tools for opening research.
CC0 has sometimes raised concerns among European scholars because it is understood only as a waiver of all copyright and related rights. Such a full waiver of rights does not fit easily into many continental copyright laws. However, CC0 is more than a waiver: it has three different layers of action. First, the right holder waives any copyright and related rights that can be waived in accordance with the applicable law. Secondly, if there are rights that the right holder cannot waive under applicable law, they are licensed in a way that mirrors as closely as possible the legal effect of a waiver. And finally, if there are any rights that the right holders cannot waive or license, they affirm that they will not exercise them and that they will not assert any claim with respect to the use of the work, once again within the limits of applicable law. Therefore, in the case of moral rights, in countries where they do not exist (mainly the US, with the exception of the categories covered by the Visual Artists Rights Act [12]), CC0 operates smoothly. In countries where moral rights exist but where they can be waived or not asserted, they are waived if asserted (e.g. the UK). In countries where they cannot be waived, they will remain in full effect in accordance with the applicable law (think of France, Spain or Italy, where moral rights cannot be waived).
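The three layers of CC0 behave like a fallback chain, which can be sketched as follows. This is a simplified illustration, not a statement of how any given law treats a particular right: the `Cc0Layer` enumeration and the `cc0_layer` function are hypothetical names, and whether a right can be waived or licensed is assumed to be already known for the applicable jurisdiction.

```python
from enum import Enum


class Cc0Layer(Enum):
    WAIVER = "waived"
    FALLBACK_LICENSE = "licensed so as to mirror a waiver"
    NON_ASSERTION = "not exercised or asserted, within the limits of applicable law"


def cc0_layer(can_be_waived: bool, can_be_licensed: bool) -> Cc0Layer:
    """Pick the first CC0 layer that the applicable law allows for a given right."""
    if can_be_waived:
        return Cc0Layer.WAIVER
    if can_be_licensed:
        return Cc0Layer.FALLBACK_LICENSE
    # Last resort: the right holder affirms they will not exercise the right,
    # but only to the extent the applicable law permits such an affirmation.
    return Cc0Layer.NON_ASSERTION


# A right that the applicable law does not allow to be waived but allows to be licensed.
print(cc0_layer(can_be_waived=False, can_be_licensed=True).value)
```

The last layer is bounded by the applicable law, which is why moral rights that cannot be waived remain in full effect in the countries mentioned above.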

Other legal tools
The Public Domain Dedication and License (PDDL) is a tool aimed at placing databases in the public domain by waiving all rights. Like CC0, it has several layers of action: a dedication to the public domain, a waiver of any copyright over the work, a licence of any non-waivable right, and an assertion that no remaining rights will be claimed for the use of the work, to the extent allowed by applicable law. ODC-By and ODC-ODbL are licenses designed only for databases, meaning that only copyright in the database (the structure of the database) and the SGDR are licensed, whereas the content of the database (in the example of a database of journals, the copyright in the single journal articles) is not. A specific second licence for the database content should be applied if needed.
By means of ODC-By or ODC-ODbL, the rights holder of a database authorises the extraction and reuse of the whole or a substantial part of the contents of the licensed database, along with the creation of derivative and collective databases and the exploitation of the original licensed database. Those rights are granted subject to proper attribution of the rights holder and the original source of the licensed database and, in the case of an ODC-ODbL licensed database, to the use of the same license or a compatible license for any derivative database created. This latter condition is equivalent to the Share Alike requirement in Creative Commons licenses, and both are an application of the copyleft practice.
There are also some legal instruments created by national governments under their open data projects aimed at sharing governmental data without any restriction. Among them we can mention the French Open License [13] and the British Open Government Licence [14]. Both licenses grant licensees all rights to exploit works subject to an acknowledgement of the authorship of the content licensed. They both mention compatibility with licenses such as CC BY and ODC-By. It should be noted that these licences are intended for use only by the public bodies that have developed them, so whereas researchers should feel absolutely free to reuse content under, and in compliance with, those licences, they should not choose those licences to license their own contributions.

SOME CASES OF LICENSING DATA RIGHTS
In 2011, Europeana, the European digital platform for cultural heritage, decided to adopt a new data exchange agreement requiring its data providers to release all their metadata under CC0. Metadata sent to Europeana were seen as not copyrightable individually, but the complete set of metadata could fall within the scope of the SGDR. In order to avoid managing all the permissions required to extract the metadata for inclusion in the portal or in any project, Europeana adopted the CC0 solution. This approach has been used by other cultural and academic institutions such as libraries. National libraries have shared catalogue records, bibliographic files or even metadata from repositories under CC0. Europeana has employed an interesting approach to request attribution: they ask kindly, or rather they treat attribution as an accepted community norm in the scientific field. There is no legal requirement to attribute, just an ethical or community one [15].
OpenStreetMap (OSM) is a collaborative project to create a free editable map of the world that started as a consequence of the restrictions on using some of the public maps available. All OSM data have been licensed under the ODbL since 2012, when the project changed from its initial CC BY-SA license. The change was made because at the time CC licenses did not include the SGDR (the current version 4.0 does). The use of this license allows a broad reuse of the data, and there are many adopters of OSM data, with Apple being probably the best known among them.

CONCLUSION
Researchers have a broad set of options when sharing data. However, it should be clarified that in many instances there will be no copyright or related rights on data. In these cases, CC0 or the PDDL, possibly with a request to give credit, could be the best option, balancing unrestricted access with the promotion of a fair community practice that acknowledges the provenance of data because it is ethically, not legally, required. If particular needs are present, CC BY 4.0, ODC-By or ODC-ODbL will also work well, but researchers opting for these solutions should carefully assess why they do so. In most cases, Open Science is easier to achieve if fewer restrictions are placed on the reuse of data. This does not mean that you should not ask for attribution for your data. It means that you should carefully weigh the pros and cons of requiring attribution. This will allow you to make the best choice in most cases. Finally, clauses such as non-commercial or non-derivatives should be avoided, as they are not Open Access compliant and severely restrict the reuse of knowledge.
These may be difficult choices for researchers, and a number of resources have been made available for guidance and support [16].
Europeana: https://www.europeana.eu
OpenStreetMap: https://www.openstreetmap.org/
http://opensourcemalaria.org/
https://github.com/OpenSourceMycetoma