Data Management

In each research project, efficient management of research data is an implicit part of good and integer research. Best practices for research data management:

Data Management Plan Guide

A data management plan (DMP) is a dynamic document in which you write down your intentions concerning data management during and after your research project. At Hasselt University, a researcher must write an initial data management plan within six months of the official start of the research project (art. 6, General policy plan on research data management at Hasselt University). Click here for a full overview of the funder requirements for Research Data Management.

What & Why

A data management plan (DMP) is a dynamic (i.e. 'living') document in which you write down your intentions concerning data management during your research. Creating a DMP at the beginning of your project helps you to reflect on how you will collect and process data, where you will store them, which security measures you should provide, which costs you should consider, etc. That way, you are well prepared from the very start, eliminating the risk of data loss, confusion (which version was the last one?), avoiding legal issues and other unhappy outcomes while making your data findable and reusable by others (win-win).

The plan is intended to be used actively as a guide and keep the plan up-to-date regularly as your research project evolves. The final version of your DMP should truly represent how you have handled your data during the project and how you will handle them afterward. Remember that the DMP itself is not a place to store data.

How: Data Management Plan Flow

Applying for a grant (application DMP) 


At the start of the project

(within the first six months; initial full DMP)

1 |  Write your DMP

In DMPonline (more information below) or in MS Word using a funder template: FWO/BOF/VLAIO-cSBO | BELSPO | Horizon Europe | H2020  | ERC

Optional - Make your life easier:


2 |  Request feedback

3 |  Revise your DMP 

If necessary, adjust your DMP based on the feedback.


4 |  Export your DMP

  • Via DMPonline, download your DMP in pdf format.
  • Via MS Word: convert your DMP to PDF format.

5 |  Submit your DMP (initial full DMP)

  • For FWO, EOS, VLAIO-cSBO and BOF-IOF: send final version to rdm@uhasselt.be.
  • For BELSPO: to BELSPO programme administrator.
  • For Horizon Europe and H2020 to the EC portal.

During the project

Update the DMP regularly based on how the data management is implemented in your research project. Contact us if you have any questions throughout your research project. You do not need to send updates of your plan to RDM.


At the end of the project (final full DMP)

For FWO, consult this Cheat sheet: submitting a final DMP https://doi.org/10.5281/zenodo.10985170

Complete your DMP (either the Word version or in DMPonline). Send the PDF version of your final DMP to rdm@uhasselt.be; and submit it to your funder with the final report if necessary.

DMPonline

A user-friendly tool for creating a data management plan (DMP) is DMPonline. Below you will find a basic step-by-step plan for this tool, and an extensive manual can be found via this link. A benefit of using DMPonline is that you will have access to example answers and Hasselt University guidance.

  • Select your institution (e.g. Hasselt University) and log in with your user account.
  • Click on the tab 'Create plans' at the top to generate a new plan. Give your plan a title and select your funder to automatically create a new DMP according to the DMP template of your funder (FWO, BELSPO, Horizon, ERC). For BOF, IOF and VLAIO tick "No funder associated with this plan or my funder is not listed" to generate the standard Flemish DMP template.
  • You fill in the project details. Make sure the box 'Hasselt University' guidance is checked to have access to our recommendations.
  • Select the tab 'Application DMP' or 'Full DMP' and answer all the questions.
  • In the tab 'Share' you can give other collaborators read and/or write rights in order to collaborate on the DMP.
  • In the 'Request feedback' tab you will find a button to request feedback from the RDM team. The RDM stewards will receive your request, review your plan, and notify you about the follow-up. You can also contact the RDM steward of your discipline directly for questions/guidance.
  • The 'Download' tab allows you to export your plan in the format of your choice (e.g. pdf).

You are not required to use DMPonline. You may also use another tool, but make sure the questions from your funder's DMP template are answered.

Data Description

Sound data management – and, hence, a well-thought Data Management Plan (DMP) – starts with identifying a complete and detailed list of all data you will collect, generate, and (re)use.

Research data are all data generated, collected or used in the context of any research project.

As a result, this broad definition includes a wide array of types and formats of data, ranging from raw data to processed and even published data. Examples may include, but are not limited to: notes, surveys, figures, objects, audio-visual files, spreadsheets, databases, statistical data, geographical data, research software, simulations, samples (including biological material, personal data, patient data, etc.).

Data types

Origin of the data

Generate new data  - Primary data
These data are created by the researchers themselves.

Reuse existing data - Secondary data 
These data are generated by third parties and/or within the scope of another research project. When reusing such existing data, one has to keep in mind that certain agreements or restrictions may apply. To find out, one should consult the terms of agreement (e.g. when mining an existing database), the data or material transfer agreement, or other miscellaneous contracts. More information can be found on the ethical and legal webpage.


Stage in the research project

Raw data
These data are the original data that you have collected or generated but not yet processed or analyzed. Raw data is typically in its most basic and unorganized form, representing the original observations, measurements, or responses.
Examples: audio or video files, sensor measurements, archives, observations, field notes, data from experiments, etc.

Processed data
These data are the raw data that you have somehow transformed, for example, digitized, translated, standardized, aggregated, summarized, transcribed, cleaned, formatted, validated, checked and/or anonymized.

Analyzed data
These data are intended to present your conclusions in a scientific publication and represent the final step of data processing.
Examples: models, graphs, diagrams, tables, texts, etc.


Materiality of the data

Digital data

Non-digital, analogue or physical data
Physical data are equally considered research data. Obviously, these data require a completely different approach regarding, for example, storage and preservation.
Examples: paper-based questionnaires and notes, archaeological findings, art works (e.g. paintings, sculptures, photographs), protein and blood samples, nucleic acids, building plans, recordings on tapes or discs


Type of digital data

Observations
These data are captured in real-time, either by human observation and surveys, or instruments or sensors. For this reason, they are usually irreplaceable and most important to store safely.
Examples: sensor readings, survey results, audio and/or video recordings of interviews

Experiments 
These data are typically generated in the laboratory under controlled conditions. They often are reproducible, but this procedure can be expensive or time-consuming.
Examples: gene sequences, chromatograms, magnetic field readings

Derivation or compilation
These data are generated by combining multiple existing datasets.
Examples: text and data mining, compiled database

Computations, models or simulations
These data are machine-generated from test models. The output files are likely to be reproducible as long as the model and inputs are preserved. Therefore, the large-volume output files can often be discarded when wrapping up your research project and selecting data for long-term preservation.
Examples: climate models, economic models

References (canonical data)
These data are a static or organic conglomeration or collection of smaller (peer-reviewed) datasets, most probably published and curated. 
Examples: gene sequence databases, chemical structures, or spatial data portals


Format of digital data 

When focussing upon the format of research data, one sees that (digital) research data occur in many different technical file formats, depending on the software used for collection, analysis, and processing.

However, once data analysis is completed and the data are being prepared for long-term archiving, you should consider converting your research data to a more limited set of open or standard formats to ensure long-term accessibility and usability (more information in the section On sustainable formats for data archiving in Preservation)

“Help – I don’t have any research data!”

In some disciplines – such as theoretical mathematics or law – you may ask yourself whether you actually have research data. In that case, research data can be defined as all information, generated as part of the scientific process, on which scientific conclusions are based. Just imagine: your computer crashes and your (home) office is destroyed by fire – what information would have gone lost in order to write a scientific publication? Well, that’s research data!

Also keep in mind that physical items, such as books, codes, maps, and artefacts are also research data, as well as (handwritten) notes, proofs, annotations etc. that support the conclusions in your published work.

Reuse existing data

Secondary data or existing data are generated by third parties and/or within the scope of another research project.

Why would you reuse data?

At the start of your research, look up whether there are datasets that you can reuse. In doing so …

  • You can avoid spending time and money on new, unnecessary experiments.
  • You can run analyses to verify the existing data, providing a strong basis for your follow-up research.
  • You can gain new insights for your research.
  • You can embed your research in the existing knowledge network.

How can you reuse data?

But where can you find existing datasets that are relevant for your research project? Those places where you can share your dataset(s) yourself (See section Where should you (not) preserve your data? in Preservation) are a good starting point: from the creator of the dataset themself, in a data repository, in a data paper, … There are also specific discovery services to help you find datasets, e.g. EOSC Research Hub for Data, Datasearch and Datacite.

When you have found the relevant data for your research, it is important to consider the following:

  • What is the quality of the data? Do they come from a trusted source (e.g. CoreTrustSeal)?
  • What are the terms and conditions for access and use: how can you access the data? Is it free of charge?
  • Do you have the appropriate software available to import/read/analyze the data?
  • Are there sufficient metadata provided to understand and reuse the dataset?
  • How can you reuse the data? What are the policies and regulations for sensitive data? Which license is attached to the dataset? More information.

Once you have established the accessibility, interoperability and (re)usability of the dataset, you can start processing the data.

Make sure that you cite the dataset properly, using a data citation and a persistent identifier (e.g. DOI). When there are multiple versions of the dataset, make it explicit which version you have used in your research.

Research software

What is research software? 

Research software is newly created software during a research project to support research activities such as data analysis, simulation, visualization, and machine learning. Research software includes:

  • Source code files
  • Algorithms
  • Scripts
  • Computational workflows
  • Other executables created during the research process or for research purposes

(Re)using existing software in research? 

For any software components (e.g., operating systems, libraries, dependencies, packages, scripts, etc.) used for research but not created during or with a clear research intent, it is important to consider the restrictions or licenses that apply to the software. Examples:

  • Proprietary software is owned by a private company and is not available for free. Proprietary software may have requirements for its use, such as requiring users to sign a license agreement or pay a licensing fee.
  • Open-source software is freely available for anyone to use, modify, and redistribute. Open-source software may have requirements for its use, such as requiring users to give credit to the original authors or to make their own modifications available to the public.

Reusing existing open-source software in your newly generated research software? 

Re-using (parts of) open-source software in your newly generated research software requires some additional attention.

It is important to review if the open-source software has a license:

YES - Adhere to the conditions of reuse:

- Who can use the software?

- How can the software be used?

- Can the software be modified, redistributed, and used for commercial purposes?

NO - We advise you not to reuse the software since the author did not provide permission to copy or reuse it. If you do want to use it, contact the author to verify which license applies.

Need support in determining the license conditions? Contact RDM helpdesk.

Software Management

Good Practices

Software Management Plan: A guide to implement best practices for research software development facilitating accessibility and reproducibility.

  • For who? Recommended for any researcher creating research software. However, at the moment, this is not a funder requirement or deliverable.
  • More information? Browse through the Practical Guide to Software Management Plans section 6, including examples of SMPs.
  • Interested? Contact RDM helpdesk.

GitHub

License: free (Github Enterprise access Github Campus Program)

GitHub is a cloud-based platform to store and maintain Git repositories facilitating collaboration. Available via the UHasselt Software Center or via:

Link your ORCID profile to GitHub

If you already have a GitHub account and an ORCID iD, simply sign into GitHub and go to https://github.com/settings/profile to authenticate your ORCID iD. From the settings page, scroll down to the button that says Connect your ORCID iD. Clicking that will take you to the ORCID sign-in page to authorize access. Check out this helpful instruction video to walk through the process.

Data Documentation & Metadata

Clear and detailed documentation of research data is essential to improve the data quality as well as to make your data understandable and (re)usable for yourself and others.

Documentation

Documentation is needed at two levels: documentation about the entire study or project on the one hand, and documentation about individual records, observations or data points on the other.

Study-level documentationStudy-level documentation provides high-level information about the research context and design, for example, the project title and summary, data collection methods, authors and institutions involved, sources of secondary data, license and identifier for each dataset, folders structure, file naming conventions, versioning system, the relation between files or resulting publications, and other general information.
Data-level documentation

Data-level or object-level documentation provides in-depth information about individual variables or records, for example, variable names, labels and descriptions (numeric, string, regular expression, date, etc.), units of measurement (cm, kg, etc.), calibration of instruments, controlled vocabulary or ontology terms accepted as values for each variable, missing values code, etc.

Data documentation can take many different forms. Depending on your discipline, examples may include, but are not limited to:

Best practice: create at least one readme.txt-file per dataset

A more general approach to data documentation is a so-called readme.txt-file. It is basically a plain text file in which you bring together all information that might be necessary for peers or for your future self to be able to understand and (re)use the research data. Such readme.txt-file typically contains more information on:

  • Context: e.g. research design, protocols and methods
  • Content: e.g. definition of variables and parameterization
  • Structure: e.g. relation of data, figures and tables

Like all other forms of data documentation, the file should be created simultaneously with the dataset itself, and updated if needed. Inspiring templates and examples can be found on the websites of Harvard and Cornell University.

Metadata

What is metadata?

As it stands, metadata actually serve the same purpose as data documentation, as described above: they provide all information needed to understand and reuse the data. However, while documentation can only be interpreted by humans, metadata are automated “translations” of this information and can consequently also be read by machines and computers. They are typically formatted as a .xml or .json file, either embedded in the data file itself or captured separately.

As these metadata are machine-readable, it implies that metadata are highly structured and comprise a fixed set of elements, as defined by an established metadata schema. Therefore, it is advisable not to create your own schema but to use an existing and community-endorsed standard.

By doing so, you can score on the FAIR principles and your funder's requirements.

Domain-specific metadata schema

Depending on your discipline, various domain-specific standards have already been established. You can browse for them using the following websites:


Generic metadata schema

If no specific standard for your type of research exists to date, you can always resort to a generic schema, such as Dublin Core. In its most simple form, it comprises 15 elements that can be applied to virtually every discipline. A handy tool to create your own metadata according to this schema can be found here.

Metadata repository @ UHasselt

As a UHasselt researcher, you are expected to upload the metadata of the datasets underlying your peer-reviewed publications. This can be done via the UHasselt metadata repository, which uses the metadata standard DataCite. This repository has been integrated into the Document Server (the database to deposit your publications).

Storage (during project)

Solutions for organizing research data (folder structures, file naming, versioning, etc.), security measures, back-ups, and collaboration. Consider the implications of any legal or ethical issues that might apply to your data.

Best practices for organizing data

  • Keep a golden copy of your raw data as soon as possible after data collection. The golden copy is the original version of the raw (source) data, and a duplicate (working copy) should be used for processing or analyzing data. [Based on OpenAire]
  • Use folders
    • Adhere to the existing standards within your research group
    • Use consistent, logical, and meaningful names for folders and files
      • Avoid spaces, special characters
      • Add leading zeros if necessary (e.g., 001, 002, ..., 010, 011) and standardized format for dates etc.
    • Examples:
  • Use versioning:
    • Version History
      • Look back at previous versions. Automatically available for Google Drive files (sheets, docs, etc.), Electronic Research Notebooks and Electronic Data Capture platforms.
    • Manual Version Control
      • Manually add version numbers, for example in the format of X.Y.Z (indicating X major changes, Y minor changes and Z bug fixes/typo's). Use a version control table, for example:

        Date 

        YYYY-MM-DD

        Version

        Changes

        Author (who?)

        Motivation & remarks

        2024-05-28

        V1.0

        Initial version

        Jane Doe

        Draft

        2024-05-29

        V1.1

        Initial version

        John Doe

        Review

         

    • Advanced Version Control
      • Automatically track and manage changes (e.g., Git) accompanied by a changelog.

Physical Datasets

  • Follow the standardized practices in your research group and choose a suitable storage location according to the needs for your datatype.
  • Physical access control to the offices and storage facilities (cabinet, fridge, freezer etc.) by using keys and badges.

Digital Datasets

  • Use a cloud-based storage solution (e.g., Google Shared Drive) that allows you to share your data with your hierarchical responsible (e.g., promotor, supervisor) from the start of the project.
  • As the RDM policy plan describes, sharing your research data (and related documents) with your hierarchical responsible (e.g., promotor, supervisor) is mandatory.
  • Depending on your type of research, use the applicable electronic research/lab notebook (ERN/ELNs) and/or electronic data capture platform for clinical data (EDC) to store and document your data.

More information on secure storage.

Institutional storage solution: Google Shared Drive

Google Drive, accessible by your UHasselt account, is an institutional cloud-based storage solution. The drives can mapped on your device (G-drive): My Drive and Shared drives.

Google Shared Drives is the recommended storage solution by UHasselt for research data and related documentation. A Google Shared Drive is owned by a team of contributors with at least two contributors:

  • yourself
  • your hierarchical responsible (e.g., promotor, supervisor) (= mandatory)

Request a Shared drive via the UHasselt IT Service Desk.

Good to know:

  • Storage limitations: Google Shared Drive can store a maximum of 400,000 files and folders; Individual users can only upload 750 GB daily between My Drive and all shared drives. Maximum file size is 5 TB.
  • RDM best practices provided by Google Drive: version history, access control, and 30-day file recovery.
  • A Shared Drive is owned by the team of contributors, with at least two contributors: yourself and your hierarchical responsible (e.g., promotor, supervisor).
  • Mydrive is owned by an individual.
  • Be aware when a researcher leaves the institution, UHasselt is legally not permitted to retrieve the data from MyDrive (risk of data loss).
  • The full list of differences between MyDrive - Shared Drive can be found here.
  • UHasselt has a license agreement whereby the storage of your data is well protected (i.e., conform to GDPR).
  • Be aware that your private personal Google, OneDrive or other cloud-storage solution is not part of the same agreement, and this is not a safe storage solution.

Vlaams Supercomputer Centrum (VSC) 

Storage solution for research projects with large datasets and high-performing computing needs. More information is available through the VSC website, such as:

If you conduct research in collaboration with private companies, please get in touch with Geert Jan Bex for more information.


Electronic Research Notebooks (ERN/ELNs)

What is an Electronic Research Notebook (ERN)?
An Electronic Research Notebook (ERN) or electronic lab notebook (ELN) is an efficient tool for documenting and storing data in a secure environment with access control and for collaborating with colleagues. It is a digital equivalent of the paper laboratory notebook in which data documentation (protocols, parameters, experiment set-up, descriptions/results of experiments, etc.) are stored for verification, future use, valorization purposes, etc.

Endorsed ELNs
Based on the advice of an internal multidisciplinary expert panel, Hasselt University recommends and partly supports (financially):

  • Signals Notebook: For discipline-specific use (Chemistry, Biology and Physics). Molecular drawings are possible.
  • ELabFTW: For general use. Very user-friendly, and intuitive.
  • Open Science Framework (OSF): For international use. This ELN requires additional training.

Support by ELN champions
An ELN champion is a researcher from a (cluster of) department(s) who has already been trained in ELNs and who can offer colleagues support in a very accessible way and, if necessary, can give short information sessions.

These are the UHasselt ELN champions for ELabFTW (Discipline: BIOMED):

  • Neurology: Lieve Van Veggel, Elisabeth Piccart
  • Immunology: Igna Rutten

If you would like more information about ELNs, please get in touch with our RDM helpdesk.


Electronic Data Capture (EDC) platform

What is an EDC?
Electronic data capture (EDC) software provides an efficient and safe platform in compliance with Good Clinical Practice (GCP) and GDPR, which can be used to build and manage your electronic case report form, online surveys, and databases. EDC software is mandatory for clinical studies but is not restricted to this type of research.

Endorsed EDC
At Hasselt University, the following EDC is recommended and partly supported (financially):

  • Castor. Castor complies with ICH-GCP (Good Clinical Practice) which is a requirement of the KCE (Belgian Health Care Knowledge Centre) to compete for funding.

Support
Castor EDC is coordinated by LCRC. If you have any further questions, please contact lcrc@uhasselt.be.


Github

License: free (Github Enterprise access Github Campus Program)

GitHub is a cloud-based platform to store and maintain Git repositories for software code and text-based data files, facilitating collaboration. Available via the UHasselt Software Center.

Data transfer, sharing & collaboration

  • Consider any contractual, legal, or ethical concerns before sharing data (GDPR, Intellectual property rights, ...).
  • Recommended data transferring or sharing solutions at UHasselt: 
One-time data transfer

Belnet filesender allows you to securely share large (up to 5TB) datafiles.

Avoid using external devices (e.g., USB drive, external Hard Drives) to avoid data loss and security breaches. However, if you do need to use an external device, make sure to encrypt the device. Use Bitlocker To Go (available via your Windows device).


Collaboration

Google (Shared) Drive provides access control, allowing to share files and folders throughout the research project. It is recommended to add collaborators at the level of the drive instead of at file/folder level to keep an overview of the controlled access ('no access', 'read only', 'read and write', 'admin').

Backup

  • A backup is a duplicate of a dataset created during the active use of data to prevent data loss and overwriting.
  • It is recommended to create a "golden copy" of your raw data as soon as possible after data collection and use a "working copy" for processing or analysis (even if an automatic back-up is provided).

Google shared drive 

Automatic backups are created if you use the institutionally recommended storage solution, Google Shared Drive.

Automated backup

Do you use another storage solution (e.g., C-drive of your device)?

Do you want to backup, synchronize, or mirror data to various locations?

  • Automated backups guarantee that backups will be made regularly and reduce the risk of errors.
  • Syncbackfree is a free application available via the website and the UH Software Center.

Security

Data security is relevant for any form or type of data, protecting data from unauthorized access, avoiding data loss, and ensuring research integrity.

To ensure that your data is safe, it is best to consider the 5 safes framework, providing the maximum possible security at 5 levels:

5 safesSource: https://www.stats.govt.nz/privacy-impact-assessments/integrated-data-infrastructure-overarching-privacy-impact-assessment/

Safe People

Researchers can be trusted to use data properly by

  • following protocols
  • training
  • authorization
  • additional agreements if needed

Safe Projects

The use of the data is appropriate, lawful, ethical, and sensible:

  • Ethical clearance
  • GDPR compliant
  • Data minimization
  • Storage limitation

Safe Settings

Digital and physical access control to restrict access to authorized researchers only.


Safe Data

  • Relevant for personal and confidential data

Limit the risk of a data breach or disclosure.


Safe Output

  • Relevant for personal and confidential data

Statistical results (published in articles) do not contain confidential information.

Preservation (after project)

Once your research project is wrapped up, it is important that your data are suitably stored or archived for future reuse in new research and for verification purposes.

Which data should you (not) preserve?

Hasselt University recommends in its RDM policy plan to keep relevant research data generated during research projects for a minimum of 5 years for reproducibility, verification and potential reuse:

  • Data underlying a publication or patent application;
  • Data that is necessary for validation and/or verification;
  • Unique data or data that is not easily reproducible (e.g. observational data, raw data, analysis workflow);
  • Data that will probably be reused in the future;
  • Data of great value for society (scientifically, historically or culturally).

Valid reasons not to keep certain research data include:

  • Ethical and/or legal restrictions on keeping personal data beyond the research project;
  • Contractual restrictions when (re)using third party data;
  • Easy or low-cost reproducibility of the data (e.g. output files of models);
  • Temporary or mutable nature of the data during certain stages of processing.

Do not forget to consider the preservation plans for your physical data:

  • In case of analogue data on paper, you may create a digital version by making a transcript, scan or image and subsequently destroy the original paper version (e.g. completed survey forms, handwritten notes during interviews). However, in case of documents that contain a signature and have legal value (e.g. signed consent forms), you must keep the original.
  • In case of physical samples and chemicals, you should consider the long-term stability of these materials, and the available storage facilities (that are, fridges and freezers) in terms of physical capacity as well as cost efficiency. In addition, all human bodily materials should be submitted to University Biobank Limburg.

Where should you (not) preserve your data?

It is recommended to archive your data in a so-called data repository, that is an online database where you can deposit your dataset(s). It provides many benefits, such as unique and persistent identification of datasets (e.g. DOI), the provision of rich metadata, curation through automatic back-ups and check sums, access control possibilities (e.g. authentication procedures), licensing options, etc. In order to find a trustworthy and appropriate repository for your dataset(s):

You can search for a suitable domain-specific repository using Re3Data and/or Fairsharing.

If you cannot find a domain-specific repository, you can turn to a general-purpose repository, such as Figshare, Dryad, Harvard Dataverse or Zenodo. For an overview, see the Generalist Repository Comparison Chart (3.0) by Stall, S. et al. (2023).

It is not advised to use local devices (e.g., USB or external hard drives) for archiving purposes, as you risk losing your data in case of damage or loss as well as unauthorized access.

For how long should you preserve your data?

All relevant research data generated during research projects at Hasselt University should be kept for minimal 5 years. In this regard, the university is in line with the requirements of some major Flemish funders (including FWO, EOS and VLAIO cSBO), also mandating a data retention period of at least 5 years.

For clinical trials with medicinal products for human use, the clinical trial master file must be kept for 25 years (Regulation (EU) No 536/2014 of 16 April 2014).

When working with personal data, the General Data Protection Regulation requires that these data cannot be kept longer than necessary for your current research or for possible further analyses of the data (storage limitation principle). Nonetheless, personal data can be kept longer if they are needed, for example, in order to follow-up in longitudinal studies, to verify published results, to comply with contractual obligations, or to protect Intellectual Property Rights. In addition, when your data subjects have given explicit consent to the processing of their personal data, you can also ask for their permission to keep the personal data for a fixed period of time (e.g. 5 years).

On sustainable formats for data archiving

Given the enormous variety of data types, it comes as no surprise that (digital) research data likewise occur in many different technical file formats, depending on the software used for analysis and processing. However, once data analysis is completed and the data are being prepared for long-term archiving, you should consider converting your research data to a more limited set of standard, interchangeable and longer-lasting formats. Using or converting to such sustainable formats ensures the long-term usability, accessibility and sustainability of your data, and consequently is one of the key elements for FAIR data.

This typically means using open or standard formats instead of proprietary ones. Common examples of open formats include: OpenDocument Format (ODF), ASCII, tab-delimited format (.tsv), comma-separated values (.csv) and XML. For more information on recommended file formats, check out these websites: UK Data Service and DANS.

Data Sharing

Once your research data are safely archived on a trusted platform, you can choose to share these data with others.

Why would you share your data?

By sharing your data, they can be replicated and verified, enhancing the quality and integrity of your research. In addition, you will help to accelerate innovation, because other researchers can build on your findings. If that does not convince you, take a look at what the benefits are for you personally:

  • You can make your research results known immediately to the scientific community, increasing the visibility and impact of your research.
  • Your dataset, when it is uploaded in a repository or published in a data paper, will receive a persistent identifier (e.g. DOI), making it findable and citable by other researchers.
  • By sharing your data with your peers, collaboration will become easier.
  • Your dataset will serve as official documentation accompanying your research paper.
  • You meet the requirements of institutions, funders and publishers.

Which data should you (not) share?

You do not have to share all your data, but only the data that are scientifically relevant and crucial for follow-up research.

Think twice about opening up the following types of data:

  • Sensitive data – including personal, confidential and biological data – should be protected and/or destroyed after the end of the project. If you want to share those data, you should consider the provisions as set out in the informed consent or contractual agreement, and take the necessary security measures, e.g. anonymization or pseudonymization, and encryption.
  • You may not be allowed to share the data that you have reused or the data that you have generated based on 3rd party data. Read the 3rd party data agreement carefully.
  • You may not be allowed to share data because of Intellectual Property Restrictions (IPR) or because they have a potential for tech transfer and valorisation.

For more information on sensitive data, 3rd party data, IPR and valorisation, see Ethical and legal webpage.

How can you share your (meta)data?

Do you want to hop on the data sharing train, but have no idea where you want to go to? Here are the possible recommended destinations:

  • You can deposit your dataset in a data repository: whereas the description of your dataset (that is, the metadata) will always be openly available, you can choose to share the data files themselves in open, embargoed, restricted or closed access.
  • You can publish your dataset in a data paper with a traditional journal or a specific data journal. A data paper will allow you to describe your dataset in more detail, increasing its visibility and chances of being reused.
  • Sometimes publishers require a specific repository or journal to publish the dataset underlying the paper in.
  • Researchers affiliated with UHasselt are expected to upload the metadata of the datasets underlying their peer-reviewed publications in the UHasselt metadata repository. You may not be allowed to preserve and/or open up your dataset due to ethical or legal issues, but you can (and should) always make the underlying metadata freely accessible.

Wherever you deposit or publish your (meta)data, make sure that they adhere to the FAIR principles. For this reason, we do not recommend to use one of the following alternative routes to data sharing, as they do not allow for any sort of version control or licensing, and they don’t make your data findable and accessible for a wider audience:

  • Google shared drive: only suitable for sharing research data within Hasselt University during your research project.
  • E-mail is a risky exchange tool; the Belnet filesender is a safe alternative.
  • Sometimes publishers ask you to add underlying data sets as mere ‘supplementary materials’ to the article itself.

Licenses

When sharing your research data, selecting a suitable reuse license for your data is crucial so that other researchers clearly know under which conditions they can or cannot reuse your data. For example, do you want attribution for your work, or do you want to allow others to use your work commercially, or do you want to allow others to remix, adapt, or build upon your work?

  • If the dataset does not involve software source code, you can choose from the Creative Commons licenses. The Creative Commons License Chooser conveniently helps you to select a suitable license, but keep in mind that Hasselt University requires that appropriate credit should be given to you as (co-)author and, hence, to your affiliation to Hasselt University. To put it more concretely, the university does not accept sharing research data in the public domain (CC0), but minimally requires Creative Commons Attribution 4.0 International (CC-BY-4.0).
  • If you want to share software source code, you can choose from a wide variety of open-source software licenses. A license selector can help you in deciding which license fits best to your needs for your source code.
  • More information on copyright