Summary and Setup
A supplementary lesson for users of the Snakemake workflow system that covers how to share your workflows. This is not just a matter of uploading code, but a whole way of thinking about the workflow as a re-usable entity. Tools and approaches that lead to re-usable code are introduced with practical examples, and the WorkflowHub.eu site is presented as a publicly-funded and standards-compliant repository for shared workflows.
This lesson is designed to be taught in around one day, and follows on from the material in the Snakemake for Bioinformatics episodes.
Learner Prerequisites
If learners have not worked through the introductory Snakemake for Bioinformatics course, then they should already be familiar with writing pipelines in Snakemake, for example:
- Writing rules and linking them via input/output filenames
- Visualising the DAG generated by Snakemake (--dag option)
- Using --configfile and config items with Snakemake workflows
- Specifying Conda environments for Snakemake rules
Learners should also follow the set-up instructions for the introductory course, in order to have the full software environment.
Notes
WorkflowHub.eu is a FAIR registry for describing, sharing and publishing scientific computational workflows. The registry is sponsored by the European RI Cluster EOSC-Life, the European Research Infrastructure ELIXIR, and multiple EU-wide projects.
This lesson was built with The Carpentries Workbench.
Data Sets
We’ll need to start with a simple example Snakefile, which in this case is the sample answer given to the sequence assembly challenge.
Download the Snakefile and save it.
If you do not have the yeast files from the intro course, unpack the sample dataset tarball from https://figshare.com/ndownloader/files/42467370
You may download the tarball from the shell. The tar file then needs to be unpacked to yield the directory of files used in the course, which can also be done in the shell.
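A minimal sketch of both steps, assuming curl and tar are available. The helper name fetch_and_unpack and the local filename yeast_data.tar are hypothetical, chosen only for illustration; the exact commands used in the original lesson may differ.

```shell
# Hypothetical helper (not the lesson's own command): download a tarball
# from a URL, then unpack it into the current directory.
fetch_and_unpack() {
    url="$1"       # where to download from
    tarball="$2"   # local filename to save as (an assumed name)
    # -L follows redirects; -o names the output file
    curl -L -o "$tarball" "$url" || return 1
    # -x extracts, -f names the archive; GNU tar and bsdtar detect
    # compression automatically when extracting
    tar -xf "$tarball"
}

# Example call (uncomment to run; needs network access):
# fetch_and_unpack "https://figshare.com/ndownloader/files/42467370" "yeast_data.tar"
```

The -L option is included because download links of this kind typically redirect to the real file location.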
You will also need to rename the files as is normally done at the start of episode 06.
See this link for details about this dataset and the redistribution license.
Software Setup
Details
You will need the Snakemake software installed and working with conda. One way to do this is to follow the same setup instructions as for the Snakemake for Bioinformatics lesson.
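One quick way to confirm that both tools are on your PATH is sketched below. The check_tool helper is a hypothetical name used only for this illustration; it relies on command -v, the standard POSIX way to look up a command.

```shell
# Hypothetical helper: report whether a command can be found on the PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: NOT found"
    fi
}

# Check the two tools this lesson depends on.
for tool in conda snakemake; do
    check_tool "$tool"
done
```

If either tool is reported as not found, activating the Conda environment created during the introductory course setup is a reasonable first thing to try.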
- Windows: This lesson has not currently been tested on Windows. You may use a WSL Linux environment, or else connect to a Linux system.
- macOS: Use Terminal.app.
- Linux: You are good.