04 Structure and Sort

Motivation

A structured approach has many advantages:

even after years, it remains clear what was done, how, and why
the naming conventions are known to you and other researchers, which simplifies collaboration
other researchers can also work with the data
data can be searched more easily and found faster
duplicate work is avoided
data loss due to overwriting or accidental deletion is prevented
the current status of your project is easy to identify
machine readability is ensured

In the end, it leads to more efficient work.

[CC-BY](https://creativecommons.org/licenses/by/4.0/): Data Stewards, Ghent University

Folder Structure

A directory tree is the hierarchical arrangement in which folders are created. Hierarchical structures make it easier to find data. The directory structure should be clearly visible and thus understandable to other researchers. The more carefully you plan it, the easier it will be to find your way around later. Ideally, directory structures follow the workflow in the respective project and thus support the usually step-by-step creation, analysis and publication of data. For a good overview, directory structures on servers should be identical to those on local computers.

Example of a directory structure with subfolders

.
└─ project/
   ├─ data/
   │  ├─ raw_data/
   │  └─ aggregated_data/
   ├─ code/
   ├─ output/
   └─ paper/
      ├─ poster/
      ├─ slides/
      └─ submission/
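
A layout like the one above can also be created by a short script, so that the same structure is reproduced on a server and on a local computer. The following is a minimal sketch in Python. The function name `create_project_tree` and the constant `SUBFOLDERS` are illustrative assumptions, not part of any existing tool.

```python
from pathlib import Path

# Subfolders from the example tree above, relative to the project root.
# The list is an assumption for illustration; adapt it to your own workflow.
SUBFOLDERS = [
    "data/raw_data",
    "data/aggregated_data",
    "code",
    "output",
    "paper/poster",
    "paper/slides",
    "paper/submission",
]


def create_project_tree(root: Path) -> list[Path]:
    """Create the example folder layout under `root` and return the folders."""
    created = []
    for sub in SUBFOLDERS:
        folder = root / sub
        # parents=True also creates intermediate folders such as data/ and paper/;
        # exist_ok=True makes the function safe to run more than once.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Calling `create_project_tree(Path("project"))` in an empty working directory would produce the tree shown above. Running it again leaves existing folders untouched.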

For further advice about folder structure, Generation R provides more insight.

Naming

The file name should be objective and intuitive, so that anyone can understand it, not only the person who created it. Naming and labeling can be done according to the following three criteria:

The system - consider the system under which the file is stored, because it determines how the data can be accessed and retrieved later.
The context - the file name includes content-specific or descriptive information so that regardless of where it is stored, it remains clear to which context the file belongs, e.g., "Schedule.pdf" or "ScheduleFDMentor.pdf".
Consistency - the naming convention should be chosen in advance to ensure that it can be followed systematically and contains the same information (such as date and time) in the same order (e.g. YYYYMMDD).

File names should be as long as necessary and as short as possible to remain concise and readable under any operating system. Name components that are already stored in the folder names do not have to be repeated in the file names.

Naming components

Content
Creator
Creation date
Editing date
Name of the working group
Publication date
Project number
Version number

Spaces, dots and special characters (like { } [ ] < > ( ) * % # ' ; " , : ? ! & @ $ ~) should be avoided, because they are interpreted differently under different systems and this can lead to errors. With most operating systems, you can replace spaces with underscores or capitalize the first letter of words. To allow chronological sorting, it is recommended to start the name with a date, for example YYYYMMDD_Name or YYYYMMDDName etc.

Naming examples

20160512_Climate_measurement_1_original.jpg
20160522_Climate_measurement_1_MHU_cutout.jpg
20160523_Climate_measurement_1_MHU_cutout_edited_color.jpg
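
Names of this form can be generated rather than typed by hand, which helps keep the convention consistent. The sketch below assumes one possible convention (date prefix, underscores instead of spaces, only ASCII letters, digits, underscores and hyphens). The function `build_file_name` is a hypothetical helper, and note that it also drops non-ASCII letters such as umlauts.

```python
import re
from datetime import date


def build_file_name(day: date, description: str, extension: str) -> str:
    """Return a name of the form YYYYMMDD_Description.ext without special characters."""
    # Replace each run of whitespace with a single underscore.
    cleaned = re.sub(r"\s+", "_", description.strip())
    # Drop everything except ASCII letters, digits, underscores and hyphens.
    cleaned = re.sub(r"[^A-Za-z0-9_-]", "", cleaned)
    # Collapse repeated underscores that removed characters may leave behind.
    cleaned = re.sub(r"_+", "_", cleaned).strip("_")
    # Normalize the extension: no leading dot, lower case.
    return f"{day:%Y%m%d}_{cleaned}.{extension.lstrip('.').lower()}"


print(build_file_name(date(2016, 5, 12), "Climate measurement 1 original", "jpg"))
# prints 20160512_Climate_measurement_1_original.jpg
```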

Automatically generated names (e.g., from the digital camera) should be avoided as they may cause conflicts due to repetition. When deciding on the naming convention, scalability should be considered: for example, choosing a two-digit file number limits the number of files to 99.
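
The scalability point can be made concrete with zero-padded running numbers. The following sketch uses a hypothetical helper `numbered_name`; the default width of three digits is an assumption and should be chosen to fit the expected number of files.

```python
def numbered_name(prefix: str, index: int, width: int = 3) -> str:
    """Return `prefix` followed by a zero-padded running number."""
    # With `width` digits, the numbers 1 .. 10**width - 1 are available.
    limit = 10 ** width - 1
    if not 1 <= index <= limit:
        raise ValueError(f"index must be between 1 and {limit} for width {width}")
    return f"{prefix}_{index:0{width}d}"


print(numbered_name("sample", 7))           # prints sample_007
print(numbered_name("sample", 7, width=2))  # prints sample_07
```

Zero-padded numbers also sort correctly as plain text: `sample_002` comes before `sample_010`, whereas without padding `sample_10` would come before `sample_2`.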

For large and small projects alike, it is worthwhile to record the chosen naming conventions. In particular, any abbreviations should be explained in a data management plan or README file so that the conventions can still be understood and applied in the future.

Bulk Renaming

Renaming multiple files at once is useful in many situations, e.g.:

to change automatically generated names from the digital camera or other software in one step;
to remove or replace spaces or other special characters in multiple file names in one step.

Software for renaming multiple files:

Windows

Ant Renamer
Rename-IT
Bulk Rename Utility

Mac

Renamer 6
Name Changer

Linux

GNOME Commander
GPRename
rename: Ubuntu, Arch
mmv: Ubuntu, Arch
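
As an alternative to the tools listed above, a few lines of Python are often enough for simple cases. The sketch below replaces spaces with underscores for all files in one folder. The function `bulk_replace_spaces` is a hypothetical helper, and the dry-run default is a design assumption that lets you review the planned changes before any file is touched.

```python
from pathlib import Path


def bulk_replace_spaces(folder: Path, dry_run: bool = True) -> list[tuple[str, str]]:
    """Replace spaces in the names of files in `folder` with underscores.

    Returns (old_name, new_name) pairs. With dry_run=True nothing is renamed.
    """
    planned = []
    for path in sorted(folder.iterdir()):
        # Skip subfolders and files whose names contain no spaces.
        if not path.is_file() or " " not in path.name:
            continue
        target = path.with_name(path.name.replace(" ", "_"))
        # Refuse to overwrite an existing file to prevent data loss.
        if target.exists():
            raise FileExistsError(f"{target.name} already exists in {folder}")
        planned.append((path.name, target.name))
        if not dry_run:
            path.rename(target)
    return planned
```

A typical use would be to call `bulk_replace_spaces(Path("raw_data"))` first, check the returned pairs, and then call it again with `dry_run=False`.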

Version Control

Versioning can be used for various purposes:

to keep an overview of the steps performed and to make them traceable
to go back one step in the document/data history
to support troubleshooting by making versions available to the public
to keep track of new versions of the same file, or even different results, that arise when new data is included and/or a file structure changes - especially for software as research data

The most common way to designate versions is to assign whole numbers for major version changes and to append a second number, joined with an underscore, for minor changes (e.g., v1, v2, v1_01, v2_03, etc.). Designations such as final, final2, revision, or definitive_final are discouraged because they do not show the order of the versions.

Examples of file labeling with version control:

[document name][version number]
Doe_interview_July2010_V1
Lipid_analysis_rate_V2
2017_01_28_MR_CS3_V6_03
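
A consistent version suffix also makes file names machine readable. The sketch below parses labels of the form shown above (`_V<major>` with an optional `_<minor>`) from a file name without its extension. The pattern and the function `parse_version` are illustrative assumptions based on these examples, not a fixed standard.

```python
import re

# Matches a trailing "_V6" or "_v6_03"; the minor part is optional.
VERSION_PATTERN = re.compile(r"_[vV](\d+)(?:_(\d+))?$")


def parse_version(stem: str) -> tuple[int, int]:
    """Return (major, minor) from a file name stem; minor defaults to 0."""
    match = VERSION_PATTERN.search(stem)
    if match is None:
        raise ValueError(f"no version suffix found in {stem!r}")
    major = int(match.group(1))
    minor = int(match.group(2) or 0)
    return major, minor


print(parse_version("2017_01_28_MR_CS3_V6_03"))  # prints (6, 3)

# Sorting by the parsed version puts files in their actual order of revision.
names = ["report_v1_02", "report_v2", "report_v1", "report_v1_10"]
print(sorted(names, key=parse_version))
# prints ['report_v1', 'report_v1_02', 'report_v1_10', 'report_v2']
```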

Version control software (e.g., Git) is very helpful in managing versions. Versioning and change tracking are also available in collaborative documents and platforms, such as a wiki, Nextcloud, or other cloud services (HedgeDoc, Etherpad).

Versioning tools @ UFZ/Helmholtz

HedgeDoc: DESY
GitLab: UFZ, HIFIS (HZDR)
NextCloud: UFZ, Nubes (HZB)
Wiki: UFZ

Recommendations for a Jump Start

Jump start

create conventions for folder structure and naming
use your institution's GitLab to host your data (and version it at the same time)