
01 Digital Research Data

(Digital) Research Data

In 2015, the German Research Foundation (DFG) adopted the "Guidelines for Handling Research Data", in which research data were described as follows: "Research data might include measurement data, laboratory values, audiovisual information, texts, survey data, objects from collections, or samples that were created, developed or evaluated during scientific work. Methodical forms of testing such as questionnaires, software and simulations may also produce important results for scientific research and should therefore also be categorized as research data." This documentation considers only digital research data. Depending on the subject area and context, research data can vary widely in how they were generated, the methods used, and the perspective taken. Because of this heterogeneity, we refer to them simply as "research data".

Research Data Lifecycle

The research data lifecycle maps the course of a research project in terms of its research data. According to forschungsdaten.info, the lifecycle includes planning, collecting, processing and analyzing, sharing and publishing, archiving, and subsequent use of data.


CC-BY-SA 4.0: Gaelen Pinnock


Research Data Workflow

Based on the research data lifecycle, a research data workflow describes the individual processing steps applied to the research data, together with the software, infrastructure, and services each step requires. This process-oriented perspective allows data creators to map the data transfers and conversions that are necessary between data processing and analysis. A workflow also defines roles and actors.

Research Data Management (RDM)

What is RDM?

Research data are highly important resources in science, so handling them systematically and responsibly is essential. Through research data management (RDM), an institution organizes and controls its own processes for generating and handling research data in an efficient and goal-oriented way. RDM accompanies research from the initial planning stages through to the archiving, subsequent use, or deletion of the data.

As part of research data management, researchers develop methods and guidelines that they apply to their research activities involving research data. The resulting strategy helps to manage the data in the subsequent research process, and guides and unifies the way they are handled. Together, the strategy, planned methods, and guidelines form the data management plan. It covers the technical, organizational, structural, legal, and ethical aspects of handling data for the duration of a project. More far-reaching aspects, such as the sustainability of the data, can also be taken into account right from the start.


CC-BY 4.0: https://aukeherrema.nl

Why do we manage Research Data?

A good research data management strategy simplifies working with the data during the project and afterward. It serves as a compass that helps all participants steer the research processes and manage their results. Developing the necessary guidelines and methods does take time during the planning phase, but this effort pays off later on several levels. Retrieving the data and retracing the processing become much easier, and analyses and results can be reproduced. The chances of the data being reused increase. In particular, research becomes more comprehensible and reproducible, and validating results in line with good scientific practice becomes easier. For researchers, this can contribute to additional scientific recognition and reputation.

Research funders and publishers are making research data management increasingly relevant in practice for researchers. They demand proactive research data management: systematic, planned handling of the generated data during the lifetime of the project, as well as access to the research data after the project has been completed.

Advantages of doing research data management

  • Faster retrieval of data, e.g. through meaningful naming
  • Clarity, e.g. no scattered storage of data in different versions on different computers
  • Knowledge preservation - data is accessible independently of individual people, projects, or institutions
  • Transfer of data to future projects
  • Facilitation of collaboration
  • Long-term traceability of results, so that they do not have to be created again (preservation of primary and secondary data)
  • Prevention of data loss, e.g. due to defective hardware or software, or the loss of original versions of files
  • (Semi-)automatic processing is enabled by metadata
  • Sharing and reuse of data through the use of appropriately formulated consent forms, e.g., no requirement that data be deleted at the end of the project
  • Optimized use of resources, e.g. cost savings through re-use instead of new data collection
  • Fulfillment of conditions imposed by third-party funding sources
  • Research data citation
  • Referenceability
  • Increasing the relevance of one's own work through better visibility

RDM Tasks

Research data management is involved in all steps of the research process. The central tasks of research data management are:

  • Planning the handling of research data at the beginning of a research project and, if necessary, presenting the planned measures in funding applications (Chapter 03)
  • Determination of a folder structure and file naming conventions (Chapter 04)
  • Documentation of research data and labeling with metadata (Chapter 05)
  • Backup of research data (Chapter 06)
  • Long-term archiving of research data (Chapter 07)
  • IT security and access rights for research data (Chapter 08)
  • Publication of research data (Chapter 09)
  • Finding and reusing existing research data (Chapter 10)
  • Consideration of data protection and copyright law when dealing with research data (Chapter 11)
  • Handling of research software (RSE Guidelines)
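To make the file naming task in the list above more concrete, the following is a minimal sketch of one possible convention, not a prescribed standard. The function name `build_filename`, the chosen components (project, date, description, version), and their order are assumptions made for illustration only; each project should agree on its own rules.

```python
import re
from datetime import date


def build_filename(project: str, description: str, created: date,
                   version: int, extension: str) -> str:
    """Build a file name of the form project_YYYYMMDD_description_vNN.ext.

    The pattern is a hypothetical convention chosen for this sketch.
    """
    def slug(text: str) -> str:
        # Lowercase, turn every run of non-alphanumeric characters into
        # a single hyphen, and trim hyphens at both ends.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    ext = extension.lstrip(".").lower()
    return f"{slug(project)}_{created:%Y%m%d}_{slug(description)}_v{version:02d}.{ext}"


name = build_filename("RDM Demo", "Soil samples (raw)", date(2024, 3, 15), 2, "CSV")
print(name)  # rdm-demo_20240315_soil-samples-raw_v02.csv
```

Putting the date in `YYYYMMDD` form and zero-padding the version number keeps files in chronological and version order when sorted alphabetically, which supports the faster retrieval mentioned among the advantages.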

FAIR Principles and Open Data

In 2016, FORCE11, a group of researchers, librarians, archivists, publishers, and research funders, designed principles for the preparation of research data. These FAIR principles include four goals: Findable, Accessible, Interoperable, and Reusable data.

By applying the FAIR principles, data and metadata become human- and machine-readable, making them more effectively discoverable and reusable. In addition, data and their metadata should be archived in such a way that they can be easily retrieved, downloaded, or used locally by humans and machines over the long term using standard communication protocols. Data should be in a form in which they can be shared, interpreted, and combined with other data sets in a (partially) automated manner. In this context, a good description of the data and their metadata, ideally in a standardized form, ensures that the data can be reused for future research and compared with other compatible data sources.

Proper citation of the data as well as an unambiguous representation of the subsequent use conditions for both humans and machines must be made possible.


[CC-BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/): Sangya Pundir
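As an illustration of what a machine-readable description of a dataset might look like, the sketch below builds a small metadata record and serializes it as JSON. The field names, the `REQUIRED_FIELDS` list, and the `missing_fields` helper are assumptions made for this example and do not follow any particular metadata standard; the identifier is a placeholder on the reserved `example.org` domain.

```python
import json

# Hypothetical minimal set of fields for this sketch; a real project
# would follow a community metadata standard instead.
REQUIRED_FIELDS = ("identifier", "title", "creators", "license", "access")


def missing_fields(record: dict) -> list[str]:
    """Return the required fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


record = {
    "identifier": "https://example.org/dataset/42",  # placeholder, not a real identifier
    "title": "Soil samples, raw measurements",
    "creators": ["Jane Doe"],
    "license": "CC-BY-4.0",   # reuse conditions stated for humans and machines
    "access": "restricted",   # the data may be restricted while the metadata stay open
}

serialized = json.dumps(record, indent=2, sort_keys=True)
print(serialized)
print(missing_fields(record))  # []
```

A record like this states the citation target and the reuse conditions in a form that both people and software can read, even when access to the data themselves is restricted.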

Providing open data means making research data publicly available, accessible and reusable with minimal restrictions. This includes publication:

  • with an open license
  • machine-readable
  • in a non-proprietary format
  • using open standards
  • linked to other data.
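The criteria listed above can be turned into a simple checklist. The following sketch is a hypothetical self-check, not one of the established assessment tools; the criterion labels mirror the list above, and the function name `unmet_criteria` is an assumption for illustration.

```python
# Criterion labels taken from the list above.
OPEN_DATA_CRITERIA = (
    "open license",
    "machine-readable",
    "non-proprietary format",
    "open standards",
    "linked to other data",
)


def unmet_criteria(self_assessment: dict[str, bool]) -> list[str]:
    """Return the criteria not marked as fulfilled, in list order.

    Criteria missing from the self-assessment count as unmet.
    """
    return [c for c in OPEN_DATA_CRITERIA if not self_assessment.get(c, False)]


assessment = {
    "open license": True,
    "machine-readable": True,
    "non-proprietary format": False,  # e.g. data still in a vendor-specific format
    "open standards": True,
}
print(unmet_criteria(assessment))  # ['non-proprietary format', 'linked to other data']
```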

The main goal of the FAIR principles is to make research data reusable. This does not mean that all data must be open and accessible without restriction. Restricted accessibility, for example due to data protection, does not contradict the FAIR principles as long as the metadata are available and accessible. Careful research data management right from the start is one of the foundations of both FAIR and Open Data, because many of the decisions that determine how fully these two principles can be implemented are made very early in the research process. Various tools offer online questionnaires for self-assessment of FAIRness. The higher the FAIRness and the more open the data, the more likely they are to be reused, which in turn can increase the reputation of the researchers who produced them.

In addition, research data management also supports aspects that are not necessarily covered by FAIRness and openness - such as long-term archiving and good data quality.

CARE principles

To complement the FAIR principles, the Global Indigenous Data Alliance (GIDA) developed the CARE principles: collective benefit, authority to control, responsibility, ethics. The FAIR principles focus primarily on characteristics of data that enable reuse, while ignoring power differentials and historical contexts. The CARE principles are an extension to ensure that the rights and interests of Indigenous peoples are more adequately addressed. They explicitly state that Indigenous data may be lawfully used if the use is based on an Indigenous worldview and promotes Indigenous innovation and self-determination.

Although developed and formulated in relation to Indigenous peoples, the CARE principles address how scientific data can be used in a way that is purposeful and focused on improving the well-being of all people. FAIR and CARE are complementary perspectives, focused on the appropriate and ethical reuse of data. The FAIR assessment of a dataset is typically a self-directed technical review by the researcher. The CARE principles, in contrast, require the involvement of people to address the cultural, ethical, legal, and social dimensions associated with the data.

Recommendations for a Jump Start

Jump start

Don't panic, keep reading these guidelines 😃