Workflow management tools like Snakemake or Nextflow have proven incredibly valuable in large-scale bioinformatics analysis. While I don’t have a particular preference, I have had extensive experience with Snakemake as a Student Assistant at the Environmental Sciences Department of Aarhus University.
Snakemake files are just Python files extended with a declarative sugar syntax like MakeFiles. Unlike traditional Makefiles, it is designed explicitly for scientific workflows.
In my role, I have focused on automating pipelines for the High-Throughput Sequencing Center of Aarhus University (Roskilde). This has allowed me to delve into working with HPC clusters, employing Singularity containers, and gaining in-depth knowledge of the bioinformatics pipelines.
To date, I have developed in Snakemake a Whole-Genome Sequencing (WGS) pipeline for prokaryotes, a TotalRNA metatranscriptomics pipeline, and one pipeline for long-amplicon sequencing for Oxford Nanopore. Please check the recently published papers of Jaarsma et al. (2023) and Scheel et al. (2023) to see papers that used them.
Using Snakemake wrappers, small reusable scripts that facilitate using popular bioinformatics programs, has made my life much easier. When I had the time and saw the opportunity, I contributed with PRs to the snakemake-wrappers repository, improving or adding new features :)
I have made all of my code publicly available on GitHub. I am very grateful to be in a work environment encouraging Free software. I always continue to be impressed by the open contributions of the scientific community to bioinformatics.
If you are more interested, I invite you to look at the organization’s repository (https://github.com/AU-ENVS-Bioinformatics/).