09 Publishing Research Data

Benefits of and Barriers to Data Publication

To allow data to be re-used for research beyond the initial research question, it should be published. It is essential that these data are findable, accessible, interoperable and reusable. There are certain objections to the publication of data, but there are also good reasons to do so:

  • Researchers increasingly face competition when applying for public funding and publishing their results. Data can be regarded as a public investment. Publishing them lets researchers gain further recognition for their data as a scientific asset that stands on its own and that can be cited in future funding applications.

  • The publication of data contributes to academic integrity. It makes research replicable and transparent. Replication of the results by third parties verifies one's own work and has a positive effect on the reputation. There are indications that articles whose data have been published are cited more often.

  • Some researchers are concerned about their data being misinterpreted, edited or misused. These concerns carry less weight when the data constitute a significant part of the study in which they are reused and the reuse leads to new citations, collaborations or co-authorships for the data originators.

  • Sharing data within one's own discipline helps to advance the current state of knowledge. Researchers want to maintain their interest in publishing the findings from their data comprehensively and as the primary source. They fear other researchers might publish results based on their research data, which could overlap with their own planned publications and thus pre-empt and complicate their publication. However, it is the authors of the data who decide on an embargo period, i.e., they decide whether and when they publish their data and from what time on the data can be used by others.

  • By reusing the data, duplicate surveys and thus unnecessary costs are avoided, which enables a more efficient allocation of resources. In addition, published data are excellent resources for training and teaching.

  • The preparation of data for publication as well as the processing of requests for data can be very time-consuming for the data producers. Dealing with the publication of data at an early stage in the research process will lead to a better and more consistent documentation and quality of the data, which in turn may facilitate the publication of the research results and the long-term archiving of the data.

  • Last but not least, the publication of data is increasingly required by publishers, institutions and funding agencies.

CC-By https://aukeherrema.nl

Key Questions for the Selection of Data

For each publication, it must be decided under which terms it will be published, for example whether it will be made freely accessible (open access) or archived in an access-controlled manner. Competitive pressure within the scientific community can make a limited or time-delayed publication worthwhile: if the data collected is to be used for further publications, the time of publication of the data and the choice of publication model are crucial aspects.

Key questions to be answered anew before each publication of data [3]

  • Is it a completed data collection or a cumulative data set still growing?
  • At what point in the research process are the data to be published?
  • What is the motivation for publishing the data?
  • Are raw data or edited data going to be published?
  • Should the data be subject to a peer review process?
  • Is it sufficient to publish a single data set to meet the various requirements of: own publication, long-term archiving, requirements of funding agencies, own institution, ...?

It is not always clear in advance which data will be particularly valuable for subsequent use. Later studies may examine data sets or evaluate metadata [1] under completely unanticipated aspects [2]. It is therefore recommended that research data be published and well documented, even if their value or tangible benefit is not clearly evident at this stage.

Methods for Publishing Data

Research data can be published in different ways. The choice of publication channel depends on the type of research and the content of the data. The most common option (I) is to publish aggregated data via the publisher as a supplement to the academic article. More recently, two further options have become established: (II) publishing the data in a repository as independent information objects, and (III) publishing a description of the data in so-called data journals, which specialize in reporting on published or accessible data.

  • (I) Data may be published by publishers as supplements to publications of research results in scientific articles. These data support and clarify the research results presented in the article. In most cases, they are aggregated data, such as smaller tables or images.
  • (II) Data may be published as an independent information object in a repository. As described below, there are different types of repositories. In discipline-specific repositories, it is easier for the community to find the data. In addition, they can be better contextualized or linked to other data sets. Discipline-specific repositories also offer corresponding features such as search, analysis and visualization. In cross-disciplinary and especially in institutional repositories, the data are less easily found.
  • (III) Data journals are dedicated to the publication of information about data published in open or restricted repositories. The information is a detailed documentation of published data, their properties and details of potential subsequent use. The data in the repository and its documentation in the journal are linked to each other by means of a persistent identifier (see below) and can thus be clearly located. Some of these journals offer a peer-review procedure in which the data set and its documentation are reviewed. For example, it is checked whether the data and its documentation match, whether the documentation sufficiently explains the data, what value the data has and whether the file formats are standardized. Examples of such data journals are the Open Access journal Earth System Science Data in the geosciences or the interdisciplinary Data in Brief.

Repositories

Publish your data

CC-By: Roche DG, Lanfear R, Binning SA, Haff TM, Schwanz LE, et al.

Repositories are used to archive, document and publish digital objects. They are storage locations for data, which enable the publication and archiving of data independent of the scholarly article itself in terms of time and space.

Depending on the repository, data, data sets, descriptions of experiments and evaluations, audiovisual objects such as image and video files, models of simulations and also software can be published. In some repositories, entire research data sets can be stored in their complex form as a single unit (e.g. "collection" in PANGAEA).

Types of Repositories

Repositories can be classified according to various aspects. Most often, they are distinguished by whether they are discipline-specific, cross-disciplinary (generic) or institutional. Discipline-specific (disciplinary) repositories offer the advantage of visibility in the research community and are already established institutions in some disciplines. However, not every subject area has a discipline-specific repository yet, and many repositories are still in the development or project stage. Disciplinary repositories usually offer subject-specific expertise in research data management, e.g. in the curation of data, as well as special services such as tools for searching, analyzing and visualizing data.

Examples for Discipline-Specific Repositories

For interdisciplinary research, the assignment of the resulting data to a subject area may be difficult. Cross-disciplinary repositories offer a solution here. They generally accept very different types of data and provide a good search function. In most cases, however, they do not curate the data or offer other forms of quality control.

Examples for Cross-Disciplinary, Generic Repositories

  • ZENODO: Digital data from all research areas, EU OpenAIRE project
  • DRYAD: Focus on life sciences, not free of charge
  • OSF: Data infrastructure of the Center for Open Science (US)
  • B2Share: Collaborative data infrastructure by the EU
  • Figshare: Digital data from all research areas, commercial data service.

Institutional repositories are also emerging more and more. Currently, they offer an alternative if no suitable discipline-specific repository is available. Researchers are happy to take up this offer. For example, if the legal framework for handling data at an external repository location is ambiguous for researchers, they may prefer to publish in their own institution's repository. Institutional repositories are generally available and can be used free of charge for all the institution's own subject areas.

Examples of Institutional Repositories in Helmholtz

Selection of a Repository

To find an appropriate repository, the cross-disciplinary directory re3data can be used (e.g. the re3data entry of UFZ DRP). This DFG-funded project lists German and international repositories for research data. You can select the discipline, type of data or country. It is also possible to filter by detailed criteria, for example for repositories that charge a fee for data upload or where data use is restricted.

Recommendations for selecting a repository for data publication [4]

  1. choose an external discipline-specific repository that is recognized in the discipline
  2. find a suitable repository via re3data.org
  3. select an institutional repository, or
  4. use a cost-free multidisciplinary repository

Specific criteria for selecting a suitable repository [5]

  1. certification, e.g. Core Trust Seal
  2. (automated) assignment of persistent identifiers, e.g. DOI, handle
  3. access to data: open, restricted or inaccessible
  4. clear terms of use for data authors and users, e.g. fees, embargo periods
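
The criteria above can be applied as a simple checklist when comparing candidates. The following Python sketch is only an illustration: the `Repository` record, the `shortlist` function and the entries in `EXAMPLES` are hypothetical and do not describe any real repository.

```python
from dataclasses import dataclass


@dataclass
class Repository:
    """Hypothetical record holding the selection criteria listed above."""
    name: str
    certified: bool             # e.g. holds a Core Trust Seal certificate
    assigns_pid: bool           # assigns persistent identifiers (DOI, handle)
    access: str                 # "open", "restricted" or "inaccessible"
    charges_fee: bool = False   # part of the terms of use


def shortlist(candidates: list[Repository], need_open: bool = True) -> list[str]:
    """Return names of certified, PID-assigning repositories, free ones first."""
    keep = []
    for repo in candidates:
        if not (repo.certified and repo.assigns_pid):
            continue
        if need_open and repo.access != "open":
            continue
        keep.append(repo)
    # sorted() is stable, so repositories with equal fee status keep input order.
    return [repo.name for repo in sorted(keep, key=lambda r: r.charges_fee)]


# Invented entries, for illustration only.
EXAMPLES = [
    Repository("Example Repo A", True, True, "open", charges_fee=True),
    Repository("Example Repo B", True, False, "open"),
    Repository("Example Repo C", True, True, "restricted"),
    Repository("Example Repo D", True, True, "open"),
]

print(shortlist(EXAMPLES))
```

Treating certification and identifier assignment as hard requirements, and fees only as a ranking factor, is a design choice of this sketch; other projects may weigh the criteria differently.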

The options for choosing a license may also influence the choice of a suitable repository.

Licenses

As part of the publication process, it is decided under which license the data is released. This decision regulates their use by third parties. Widely used are the free licenses of Creative Commons (CC).

Examples for Creative Commons Licenses

  • CC0 (Public Domain)
  • CC BY (Attribution)
  • CC BY-ND (Attribution – No derivative works)
  • CC BY-NC (Attribution – Non-commercial)
  • CC BY-SA (Attribution – Share-alike: Distribution under Equal Terms)
  • CC BY-NC-SA (Attribution – Non-commercial - Share-alike: Distribution under Equal Terms)
  • CC BY-NC-ND (Attribution – Non-commercial - No derivative works)
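
The license codes listed above are composed of a few recurring elements (BY, NC, ND, SA). As a rough sketch, the following Python snippet derives the terms from such a code; the `CCTerms` record and the `terms_for` function are hypothetical names chosen for illustration, and the mapping is a simplified reading of the list, not a substitute for the license texts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CCTerms:
    """Simplified view of what a Creative Commons code implies."""
    attribution: bool      # BY: the originators must be credited
    commercial_use: bool   # False when NC (non-commercial) applies
    derivatives: bool      # False when ND (no derivative works) applies
    share_alike: bool      # SA: adaptations must use the same terms


def terms_for(code: str) -> CCTerms:
    """Derive the terms from a code such as 'CC BY-NC-SA' or 'CC0'."""
    normalized = code.strip().upper()
    if normalized == "CC0":
        # Public domain dedication: no conditions attached.
        return CCTerms(False, True, True, False)
    parts = normalized.replace("CC", "", 1).strip().split("-")
    if "BY" not in parts or not set(parts) <= {"BY", "NC", "ND", "SA"}:
        raise ValueError(f"unknown license code: {code!r}")
    return CCTerms(
        attribution=True,
        commercial_use="NC" not in parts,
        derivatives="ND" not in parts,
        share_alike="SA" in parts,
    )


print(terms_for("CC BY-NC-SA"))
```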

The granting of a Creative Commons license has no influence on the copyright. However, it gives you the opportunity to grant rights of use for research data in a simple and standardized way. When granting CC0, the author(s) renounce all copyright and related rights, whereas all other CC licenses may grant or restrict the rights of use to different extents.

Software as research data requires a separate license to meet the special requirements of this format (permission to install, modify and execute, purpose or location of use, number of users, etc.). The Creative Commons licenses do not cover this. It is recommended to use one of the common software licenses, such as the MIT license, the GNU General Public License (GPL), the GNU Lesser General Public License (LGPL) or the Apache license. For further information about the handling of research software, please refer to the RSE knowledge base (work in progress).

It is recommended to restrict the use of the research data as little as possible. This facilitates the subsequent use by third parties. If no license is granted, subsequent use is not permitted without the consent of the copyright holder. Further advice on how to license your research data.

Persistent Identifiers

Persistent identifiers (PIDs) are used to make digital publications findable in the long term and to solve the problem of "dead" links, as well as to improve the documentation of research data, especially their machine readability. Every object is identified by a unique name. This name is then included wherever reference is made to this object, i.e. the resource is linked to the identifier, not to a specific location. The persistence of PIDs is not guaranteed by technical means, but by contractual regulations.

There are different types of persistent identifiers, because potentially everything that is distinguishable and nameable can be provided with a persistent identifier. Two persistent identifiers are presented below: Digital Object Identifier (DOI) for data identification and Open Researcher and Contributor ID (ORCID) for unique identification of researchers. Other persistent identifiers used for scholarly work include the Uniform Resource Name (URN), which is used only regionally rather than worldwide; the International Geo Sample Number for geological samples, which is assigned via the System for Earth Sample Registration (SESAR); and the identifiers of the Research Organization Registry (ROR) for research institutions (e.g. the ROR of UFZ). The allocation of persistent identifiers may involve costs that should be budgeted for in data management.

Digital Object Identifier (DOI)

The Digital Object Identifier (DOI) is very common because it is citable, and its allocation has been free of charge in Germany since 2013. The International DOI Foundation (IDF) ensures uniform standards and workflows for the use of DOIs and there has been an ISO standard for this purpose since May 2012. DOIs are unique sequences of alphanumeric characters. Permitted characters are: a-z, A-Z, 0-9, . (dot), - (hyphen), _ (underscore), : (colon) and / (slash). Each DOI consists of two parts, a prefix that identifies the issuing organization and a suffix that identifies the object.

Examples for the suffix structure

  • Original DOI: 10.1234/abc123
  • DOI of a new version: 10.1234/abc123.1
  • DOI of a part: 10.1234/abc123/2
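
The prefix/suffix structure and the permitted characters described above can be checked mechanically. The following is a minimal Python sketch, not an official validator: the function names `split_doi` and `doi_url` are hypothetical, the character set is the one listed in this section (DOIs found in practice may contain further characters), and the resolver address follows the `https://doi.org/` form used in the reference list.

```python
import re

# Characters permitted according to this section:
# a-z, A-Z, 0-9, dot, hyphen, underscore, colon and slash.
_ALLOWED = re.compile(r"[A-Za-z0-9._:/-]+")


def split_doi(doi: str) -> tuple[str, str]:
    """Split a DOI into (prefix, suffix) at the first slash.

    Raises ValueError if the string does not follow the structure
    described in this section.
    """
    if not _ALLOWED.fullmatch(doi):
        raise ValueError(f"unexpected characters in DOI: {doi!r}")
    prefix, sep, suffix = doi.partition("/")
    # DOI prefixes start with "10.", as in the examples above.
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not of the form prefix/suffix: {doi!r}")
    return prefix, suffix


def doi_url(doi: str) -> str:
    """Build a resolver link for a DOI after validating its structure."""
    split_doi(doi)
    return f"https://doi.org/{doi}"


print(split_doi("10.1234/abc123/2"))
print(doi_url("10.1234/abc123.1"))
```

Splitting at the first slash keeps the part number of "10.1234/abc123/2" inside the suffix, which matches the example list: only the prefix identifies the issuing organization.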

The DOI allows a distinct linking between the data and the resulting publications. Thus, the data remains permanently citable. DOIs are often assigned by repositories or institutions such as libraries. DataCite distributes so-called number ranges to these institutions (registrar), which then assign them individually (registrant).

Handles are a global reference system for large amounts of data and the underlying technology for DOIs. In contrast to DOIs, they are less persistent and not secured by a standard. They are therefore particularly suitable for referencing data before publication.

Open Researcher and Contributor ID (ORCID)

To ensure all scientific papers are clearly assigned to the author, the so-called ORCID is used. Since researchers usually work at different institutions in the course of their academic career, their contact details may change. Sometimes the names of researchers also change in the course of their professional life. In order to be able to easily allocate all publications over time and with the change of institutions and names, researchers have the option of registering with ORCID. Furthermore, by registering with ORCID, they can avoid having to enter the same personal data over and over again, for example when submitting data or articles for publication. Above all, this prevents confusion when identical names occur within the same discipline.

What you should know about ORCID [6]

  1. stands for Open Researcher and Contributor ID
  2. 16-digit (alpha) numeric code
  3. protects your unique scholarly identity (also across name changes, typing errors or identical names)
  4. is used by journals, funding agencies and institutions
  5. is maintained by researchers themselves
  6. lasts longer than an e-mail address
  7. ORCID creation takes about 30 seconds
  8. is run by a non-profit initiative
  9. continuous growth (February 22, 2022: 13,451,994 ORCIDs)
  10. links to Web of Science, Scopus, Zenodo, DataCite, etc.
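
The 16-character code mentioned in the list above ends in a check character, which explains the "(alpha)": the last position may be the letter X. This detail is not covered on this page; as far as ORCID documents it, the check character is computed with the ISO 7064 MOD 11-2 algorithm. The Python sketch below illustrates that calculation; the function names are hypothetical, and the iD used is the fictitious sample researcher from ORCID's own documentation.

```python
import re

# Four hyphen-separated groups of four; the last character may be X.
_ORCID_FORMAT = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_character(base_digits: str) -> str:
    """Compute the check character for the first 15 digits (MOD 11-2)."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check both the written format and the check character."""
    if not _ORCID_FORMAT.fullmatch(orcid):
        return False
    compact = orcid.replace("-", "")
    return orcid_check_character(compact[:15]) == compact[15]


print(is_valid_orcid("0000-0002-1825-0097"))
```

A check like this catches most typing errors before an iD is stored in metadata, but it cannot tell whether the iD is actually registered.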

Recommendations for a jump start

  • Publish your data when it's ready
  • Publish your data in a repository (institutional or discipline-specific)
  • Register a DOI for your data and use it in your publications
  • Get yourself an ORCID


  1. McKiernan, Erin C., Philip E. Bourne, C. Titus Brown, Stuart Buck, Amye Kenall, Jenifer Lin, Damon McDougall et al.: Point of view: How open science helps researchers succeed. eLife 5 (2016), p. e16800. https://doi.org/10.7554/eLife.16800

  2. Steiner, Daniel, Heinz J. Zumbühl and Andreas Bauder: "Two Alpine Glaciers over the Past Two Centuries." In Darkening Peaks: Glacier Retreat, Science, and Society. Ed. by Ben Orlove, pp. 83–99. Berkeley, CA: University of California Press, 2008.

  3. Based on Martin, Elaine R. (ed.). New England Collaborative Data Management Curriculum. Module 6. Data Sharing & Reuse Politics. Last accessed 23.02.2022. https://library.umassmed.edu/docs/necdmc_module6.docx

  4. openaire.eu/opendatapilot-repository-guide. Last accessed 17.05.2022. 

  5. re3data.org/faq. Last accessed 23.02.2022. 

  6. orcid.org/blog/2014/04/29/tenthingsyouneedtoknow. Last accessed 23.02.2022.