Scaling up PerMed data analysis tools for HPC
Identifying best ways to scale up personalised medicine data analysis tools for use on modern HPC platforms


Supercomputer Puhti (BullSequana X400, Atos). Photo: Mikael Kanerva, CSC

PerMedCoE faces an exciting challenge: to open up frontiers in personalised medicine by enabling the efficient use of cellular simulation software on current and upcoming high-performance computing (HPC) platforms, including LUMI. This challenge calls for identifying the best ways to make different software available. The development work ranges from arranging the software into easy-to-use workflows and user interfaces to ensuring cross-platform compatibility, while accounting for the requirements for handling sensitive data.

PerMedCoE’s work on optimisation and usability, led by CSC – IT Center for Science, aims to optimise the implementation of personalised medicine software in diverse HPC environments by grouping software tools into building blocks. Within PerMedCoE, the term ‘building block’ refers to a specific analytical functionality provided by the software tools in use. Another objective in this area of work is to combine individual building blocks into workflows that meet the needs of different scientific use cases, such as COVID-19 modelling.

With Singularity and other solutions for software containerisation rapidly gaining popularity in the HPC field, we have focused on producing unified guidelines for using containers to construct building blocks. As part of this work, we have identified critical questions that must be answered to improve the usability of workflows on computing platforms hosted by PerMedCoE partners across Europe. Specific topics have included design choices concerning software configuration, options for harnessing the computational power of HPC systems, options for cross-platform user management, and ways to process data in compliance with regulations on the handling of sensitive personal information.

Another central task has been to construct a pilot workflow to explore how personalised medicine software tools can be scaled up for use on HPC platforms. Development work in this area has been carried out by the Barcelona Supercomputing Center, initially focusing on the cellular simulation tool MaBoSS and on PyCOMPSs, a framework for parallel computing in the Python programming language.
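
To give a rough sense of what combining building blocks into a parallel workflow can look like, here is a minimal sketch in Python. It is not the PerMedCoE pilot workflow and does not use MaBoSS or PyCOMPSs: the standard library's `concurrent.futures` stands in for a task-based framework, and the names `simulate_block`, `summarise_block` and `run_workflow` are hypothetical, as is the toy two-state model.

```python
from concurrent.futures import ThreadPoolExecutor
import random


def simulate_block(seed, steps=100):
    """Hypothetical 'simulation' building block.

    A toy two-state stochastic model used only as a placeholder for a
    real cellular simulation tool. Returns the fraction of steps in
    which the state was 'on'.
    """
    rng = random.Random(seed)  # independent, reproducible stream per task
    state = False
    on_steps = 0
    for _ in range(steps):
        if rng.random() < 0.3:  # assumed switching probability
            state = not state
        on_steps += state
    return on_steps / steps


def summarise_block(results):
    """Hypothetical 'analysis' building block: mean over all runs."""
    return sum(results) / len(results)


def run_workflow(seeds, max_workers=4):
    """Chain the two blocks: run independent simulations concurrently,
    then aggregate their outputs in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(simulate_block, seeds))
    return summarise_block(results)


if __name__ == "__main__":
    print(run_workflow(range(8)))
```

Each simulation is independent of the others, which is what makes this kind of workload a natural candidate for scaling up. A thread pool is used here only to keep the sketch self-contained. On an HPC system the same structure would be expressed with a framework that can distribute tasks across nodes.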

Author: Jesse Harrison (CSC – IT Center for Science)