
10 Reuse Research Data

Why Reuse Data?

The benefits of data reuse emerge at several levels:

for researchers who reuse data
  • less effort and lower costs, since no data collection of their own is needed
  • secondary analyses on new research questions and/or with new methods
  • comparisons over time
  • comparisons of different samples
  • links to other sources
  • new collaborations
for researchers who publish data
  • citations
  • transparency
  • enhancing the scholarly reputation
  • new collaborations
for the discipline
  • reproducibility of the research
  • more efficient research
  • enabling new research
  • preservation and safeguarding of data for the future, especially relevant for non-repeatable data collections (historically unique phenomena) and for data from vulnerable groups that are difficult to access
  • applicability in teaching
for the public benefit
  • transparency of research
  • trust in science
  • protection of population and environment through less frequent primary data gathering
  • economic use of the data, including by the private sector

Researchers can save the effort of collecting their own data by using existing data sets. Reusing data also expands their research base and may lead to new collaborations. However, reuse requires effort to read and understand the data.

CC-BY https://aukeherrema.nl


Searching for Data

Research data is still difficult to find. Many directories and (meta) search engines are under construction, and they vary greatly in volume, quality and reputation. Finding suitable data for reuse usually requires searching several sources.

Further portals are listed on the information website forschungsdaten.info.

Access and Terms of Use

Access to research data is open (unrestricted), conditional or restricted (for example, only for specific purposes or groups of people, or only after users provide information or meet other formal requirements), or not available at all. Sometimes fees are charged for providing data (e.g. for sending a DVD with the data). The terms of use are determined partly by the repository's own terms and partly by conditions set by the data authors, such as the license used. If this information is not evident from the metadata, it should be clarified or negotiated as part of the request for reuse.

Key Questions for the Evaluation of Reusability

Once the legal reusability has been checked, the content should be examined. As a rule, the metadata provides only basic information about whether a data set could be suitable for the intended purpose. If the data set appears suitable at first glance, its suitability must be examined more closely. This examination resembles an in-depth study of a scholarly article, in which the details of collection, evaluation and interpretation are checked and assessed very carefully.

Key Questions

  • Is the specific research question well-documented?
  • How was the data collected?
  • Are the collection and processing methods used appropriate to the research question, and do they correspond to the current state-of-the-art in my field of expertise?
  • Is the procedure of data collection accurately recorded and comprehensibly documented?
  • Which instruments were used to collect the data, and with what settings or parameters?
  • Are reports and protocols of the data gathering as well as their specifics included in the data set?
  • Is the description of the data set available and sufficient to understand the data and its context of origin?
  • Which criteria for data selection were applied?
  • Has the data been processed since the data collection? If so, how, e.g. handling of missing values? Weighting?
  • Are precise descriptions of the variables available, e.g. what variables are there, how are they coded, etc.?
  • Is all the information understandable and consistent?
  • Is the source trustworthy?

Only if these questions can be answered sufficiently is it possible to determine whether the data set is suitable for subsequent use.

Citation

The citation of data sets serves several purposes. First of all, it acknowledges the author's work in producing the data and creating the data set. Furthermore, it ensures transparency in academic research and corresponds to good scientific practice. At the same time, citing data provides the basis for further reuse: other researchers learn which data have been used and where they can be found. Citations allow authors to find out what influence their work has and for what purposes the data is reused. Reuse without citation would be plagiarism.

In 2014, FORCE11 defined data citation principles that cover the purpose, function and attributes of citations. These principles recognize the dual need to create citation practices that are both machine-readable and comprehensible to humans.

Data Citation Principles

  1. importance
  2. credit and attribution
  3. evidence
  4. unique identification
  5. access
  6. persistence
  7. specificity and verifiability
  8. interoperability and flexibility

Some subject areas already have their own recommendations (e.g. the APA for psychology). In general, citing data should be similar to citing a research article. The common standard information includes:

  • Author
  • Publication date
  • Title
  • Publisher (name of the data center/institution that published the source)
  • Resource type (for example, data set)
  • Persistent identifier
  • Version number, if several versions have been published

Examples of Data Citation

  • Markowski, Radoslaw; Gebethner, Stanislaw; Grabowska, Mirosława; Grzelak, Paweł; Jasiewicz, Krzysztof et al. (2006): Polish National Election Study 2000 (PGSW). Version: 1.0.0. GESIS Data Archive. Data set. doi.org/10.4232/1.4334
  • U.S. Department of Health and Human Services, Substance Abuse and Mental Health Services Administration, Office of Applied Studies. (2013). Treatment episode data set -- discharges (TEDS-D) -- concatenated, 2006 to 2009 [Data set]. doi:10.3886/ICPSR30122.v2
  • [Tool] DOI Citation Formatter
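The standard elements listed above can be assembled into a citation string programmatically. The following is a minimal sketch in Python that follows the element order of the first example; the names DataCitation and format_citation are hypothetical and chosen only for illustration, and the exact punctuation depends on the citation style required in your field.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class DataCitation:
    """Hypothetical container for the standard citation elements."""
    authors: Sequence[str]
    year: int
    title: str
    publisher: str                    # data center or institution
    identifier: str                   # persistent identifier, e.g. a DOI
    resource_type: str = "Data set"
    version: Optional[str] = None     # only needed if several versions exist


def format_citation(c: DataCitation) -> str:
    """Join the elements in the order used by the first example above."""
    parts = [f"{'; '.join(c.authors)} ({c.year}): {c.title}."]
    if c.version:
        parts.append(f"Version: {c.version}.")
    parts.append(f"{c.publisher}.")
    parts.append(f"{c.resource_type}.")
    parts.append(c.identifier)
    return " ".join(parts)


# Values taken from the first example; the author list is shortened for brevity.
example = DataCitation(
    authors=["Markowski, Radoslaw", "Gebethner, Stanislaw"],
    year=2006,
    title="Polish National Election Study 2000 (PGSW)",
    publisher="GESIS Data Archive",
    identifier="doi.org/10.4232/1.4334",
    version="1.0.0",
)
print(format_citation(example))
```

Keeping the elements in separate fields rather than in one string makes it easier to switch to another citation style later, since only the formatting function has to change.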

For the reuse of works with a Creative Commons license, you should also make sure the license is correctly attributed. This particularly means the source of the work should include the following information:

  • Source
  • Name of the license, including the version and link to the license description
  • If applicable, processing information (since version 4.0)
  • If applicable, title of the work (since version 4.0)
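The attribution elements above can likewise be combined into a single attribution line. This is a minimal sketch; the function name cc_attribution, its parameters, the wording of the output and the example.org source are assumptions made for illustration, not a format prescribed by Creative Commons.

```python
from typing import Optional


def cc_attribution(source: str, license_name: str, license_url: str,
                   title: Optional[str] = None,
                   changes: Optional[str] = None) -> str:
    """Build a one-line attribution from the elements listed above."""
    # Fall back to a generic label when no title is supplied.
    work = f'"{title}"' if title else "Work"
    text = f"{work}, source: {source}, licensed under {license_name} ({license_url})"
    # Processing information is added only if the work was modified.
    if changes:
        text += f"; changes: {changes}"
    return text


# Placeholder values for illustration only.
print(cc_attribution(
    source="https://example.org/dataset",
    license_name="CC BY 4.0",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    title="Example data set",
    changes="missing values recoded",
))
```

The license name includes the version and is paired with the link to the license description, as the list above requires.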

Recommendations for a jump start


Where applicable, reuse existing data
Make your data citable and cite the data you use
Attribute sources and comply with the license agreements