About

Digitalisation is changing the way work is done in virtually all scientific disciplines. This has far-reaching consequences for teaching, research and interdisciplinary collaboration. Bioinformatics exemplifies how new ways of working and of gaining knowledge emerge within a branch of science that faces fast-growing data volumes. Fast sequencers and high-resolution microscopes generate ever more data, which in turn allow more detailed and complex analyses. This has turned the life sciences into data sciences. Bioinformatics is a fast-growing discipline that requires not only workflows for analysis but also standardised documentation of data. A first step towards meeting the specific needs of the bioinformatics community was the creation of de.NBI (German Network for Bioinformatics Infrastructure). Data must be stored for analysis, but they can also be used to reproduce and validate research and can serve as input for new research questions. Data can only be reused if sufficient descriptive information, e.g. standardised metadata, is available. Handling huge amounts of data requires workflows between storage and analysis systems as well as collaboration in national and international networks such as ELIXIR and Galaxy. The necessary infrastructure can no longer be provided by single institutions.

Existing data management in bioinformatics ranges from storage infrastructure for “hot” data to first concepts of long-term preservation. However, the community lacks both coordination and the infrastructure needed to use cloud-based solutions. Complexity and technological change on the one hand and shared requirements on the other call for a holistic approach.

BioDATEN Lifecycle

The approaches to data management in the bioinformatics community will be transferred into professional structures with clear-cut responsibilities. While researchers remain in charge of the actual research, these structures will cover the complete data lifecycle, from planning to the sustainable referencing and (re)use of data after the original research project has been completed.

In this context, the project BioDATEN (Bioinformatics DATa ENvironment) was funded by the Ministry of Science, Research and the Arts of the State of Baden-Württemberg for four years as part of digital@bw. The aim of the project is to lay the foundation for a Science Data Centre in close cooperation with life science communities, such as research groups, and with infrastructure providers, such as libraries and computing centres.

BioDATEN uses existing infrastructure such as bwSFS (Storage for Science), BinAC (Bioinformatics and Astrophysics Cluster) and the de.NBI cloud (German Network for Bioinformatics Infrastructure), as well as repositories operated by the university libraries of Konstanz and Tübingen. BioDATEN will also coordinate with other players in research data management.

During the project, rules for the preservation of and access to research data will be developed and established. Infrastructure and scientific methods for data analysis will be advanced. A central question of the project is: How can subject-specific and organisational metadata be annotated, curated and marked up in a unified way that also takes legal and technical aspects, such as sensitive data, into account?

Data repositories will provide research infrastructure built on unified standards and workflows. In turn, this will make both infrastructure and data easy to access, which will contribute to equal opportunities in research, especially for young academics. At the same time, research groups will gain easier access to national and international networks, which are necessary for processing huge amounts of data. A central part of the project is the education of young researchers, teaching them methods of digitised research, state-of-the-art procedures in information technology and research data management. BioDATEN aims to provide generic access to infrastructure and to develop methods for the long-term accessibility of data. In principle, BioDATEN is open to additional communities.

There are five work packages: package 1 (AP 1) covers bioinformatics, package 2 addresses education and package 3 deals with data science. Services and infrastructure are covered in package 4. Package 5 (not pictured) is responsible for project management. The white boxes indicate that BioDATEN is open to additional communities such as the humanities and hydrology.

BioDATEN work packages

Institutions

Time Span

01.07.2019-30.06.2023