High Performance Bioinformatics

Course Overview

We live in a big-data era, and simple serial bioinformatics pipelines cannot efficiently handle huge datasets. High Performance Computing (HPC) represents an effective solution for researchers who need to analyze and address new biological questions using large-scale data.
This course is both theoretical and practical, aimed at bioinformaticians who want to scale up their analyses on cluster machines. It focuses on the development and execution of automated, reproducible pipelines.
Ad-hoc hands-on sessions will be held every day.

Day 1 – Introduction to HPC and Cluster Basics

Topics:
Cluster architecture: hardware, storage, and software environment
Module system and software navigation
Submitting jobs via the SLURM scheduler
Running single-step batch scripts
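
As a rough illustration of what a single-step batch script looks like, the Python sketch below renders one as text. The `#SBATCH` directives are standard SLURM options; the helper name `render_batch_script`, the sample file name, and the resource values are hypothetical, and many clusters also require site-specific options such as an account or partition that are not shown here.

```python
def render_batch_script(job_name, command, cpus=1, mem="4G", time_limit="01:00:00"):
    """Return the text of a minimal single-step SLURM batch script."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        "#SBATCH --ntasks=1",
        f"#SBATCH --cpus-per-task={cpus}",
        f"#SBATCH --mem={mem}",
        f"#SBATCH --time={time_limit}",
        # %j is replaced by SLURM with the job ID at run time
        f"#SBATCH --output={job_name}_%j.out",
        "",
        command,
    ]
    return "\n".join(lines) + "\n"


# Hypothetical example: quality control on one FASTQ file
script = render_batch_script("fastqc_sample1", "fastqc sample1.fastq.gz", cpus=2)
print(script)
```

The printed text could be saved to a file and submitted with `sbatch`; the resource requests would need to be adapted to the actual cluster.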

Day 2 – NGS Pipelines, Singularity, and GPU

Topics:
Introduction to Next Generation Sequencing (RNA-seq focus)
Building automated pipelines for large datasets
Job concatenation and HPC resource optimization
Singularity containers: packaging bioinformatics tools for reproducibility
GPU usage: running accelerated bioinformatics tasks
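
Job concatenation can be sketched in Python as follows, assuming each pipeline step is its own batch script and each step should start only after the previous one succeeds. The sketch relies on two real `sbatch` options, `--parsable` (print only the job ID) and `--dependency=afterok:<jobid>`; the function names and script names are hypothetical, and this is not necessarily the approach taught in the course.

```python
import subprocess


def build_sbatch_command(script, depends_on=None):
    """Build the sbatch argument list, optionally waiting for a previous job."""
    cmd = ["sbatch", "--parsable"]
    if depends_on is not None:
        # afterok: start only if the previous job finished with exit code 0
        cmd.append(f"--dependency=afterok:{depends_on}")
    cmd.append(script)
    return cmd


def run_sbatch(cmd):
    """Submit a job and return its ID as a string."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    # With --parsable the output is "jobid" or "jobid;cluster"
    return result.stdout.strip().split(";")[0]


def submit_chain(scripts, submit=run_sbatch):
    """Submit scripts in order, each one depending on the one before it."""
    job_ids = []
    previous = None
    for script in scripts:
        previous = submit(build_sbatch_command(script, depends_on=previous))
        job_ids.append(previous)
    return job_ids
```

On a cluster, a call such as `submit_chain(["trim.sh", "align.sh", "count.sh"])` would queue the three steps at once and leave the ordering to the scheduler. The `submit` parameter is there so the chaining logic can be tried out without access to SLURM.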

Day 3 – Workflow Management with Snakemake & Cloud Scaling

Topics:
Introduction to workflow management concepts
Snakemake basics: rules, dependencies, and reproducibility
Scaling pipelines on cluster and cloud environments without modifying workflow definitions
Tips for portable and scalable pipeline design
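
The idea behind rules and dependencies can be illustrated with a small Python sketch: each rule declares its input and output files, and the execution order follows from matching one rule's inputs to another rule's outputs. This is a conceptual illustration only, not Snakemake's actual implementation or syntax; the rule names and file names are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical RNA-seq steps, described only by the files they read and write
rules = {
    "trim": {"input": ["sample.fastq.gz"], "output": ["sample.trimmed.fastq.gz"]},
    "align": {"input": ["sample.trimmed.fastq.gz"], "output": ["sample.bam"]},
    "count": {"input": ["sample.bam"], "output": ["sample.counts.tsv"]},
}


def execution_order(rules):
    """Order rules so that every rule runs after the rules producing its inputs."""
    # Map each output file to the rule that creates it
    producer = {out: name for name, rule in rules.items() for out in rule["output"]}
    # A rule depends on the producers of its inputs; raw data files have no producer
    graph = {
        name: {producer[f] for f in rule["input"] if f in producer}
        for name, rule in rules.items()
    }
    return list(TopologicalSorter(graph).static_order())


order = execution_order(rules)
print(order)
```

Because the order is derived from file dependencies rather than written out by hand, the same workflow description can be handed to different execution backends, which is the property the scaling topic above relies on.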

Skills Acquired

By the end of the course, students should be able to:
Navigate HPC resources and the software environment
Submit and monitor jobs using SLURM
Use Singularity containers for reproducible pipelines
Run GPU-accelerated jobs
Build an automated pipeline handling large datasets
Apply Snakemake to create scalable, portable, and reproducible workflows

Target Audience

Biologists, bioinformaticians, and computer scientists interested in large-scale NGS data analysis.

Prerequisites

Good knowledge of Python and the shell command line
Basic knowledge of R and biology is recommended but not required

Intended for: Research Institutions, Schools, Universities
Area: Science
Length: 3 days
Provided as: Ordinary Course

Next courses

No editions of this course are currently scheduled.

Any questions?

For HPC and computer graphics courses, write to corsi.hpc@cineca.it

About CINECA

Cineca is a non-profit consortium made up of 102 Italian national institutions: universities, Italian research institutions, and the Italian Ministries of Universities and Education.

Today it is the largest Italian computing centre and one of the most important worldwide. With more than seven hundred employees, it operates in the technology transfer sector through high performance scientific computing, the management and development of networks and web-based services, and the development of complex information systems for processing large amounts of data.

It develops advanced Information Technology applications and services, acting as a bridge between the academic world, the sphere of pure research, and the world of industry and Public Administration.
