Summary and Schedule
A supplementary lesson for users of the Snakemake workflow system that covers how to share your workflows. This is not just a matter of uploading code, but a whole way of thinking about the workflow as a re-usable entity. Tools and approaches that lead to re-usable code are introduced with practical examples, and the WorkflowHub.eu site is presented as a publicly-funded and standards-compliant repository for shared workflows.
This lesson is designed to be taught in around one day, and follows on from the material in the Snakemake for Bioinformatics episodes.
Learner Prerequisites
Learners who have not worked through the introductory Snakemake for Bioinformatics course should already be familiar with writing pipelines in Snakemake, for example:
- Writing rules and linking them via input/output filenames
- Visualising the DAG generated by Snakemake (--dag option)
- Using --configfile and config items with Snakemake workflows
- Specifying Conda environments for Snakemake rules
Learners should also follow the set-up instructions for the introductory course, in order to have the full software environment.
Notes
WorkflowHub.eu is a FAIR registry for describing, sharing and publishing scientific computational workflows. The registry is sponsored by the European RI Cluster EOSC-Life, the European Research Infrastructure ELIXIR, and multiple EU-wide projects.
This lesson was built with The Carpentries Workbench.
Setup Instructions | Download files required for the lesson
Duration: 00h 00m | 1. Making a toy dataset | What do we mean by testing in the context of workflows? What makes a suitable dataset for testing? How do we go about designing tests?
Duration: 01h 00m | 2. Separating code and configuration | What needs to be configurable in our workflow? How do we express these settings using the Snakemake config mechanism? How does this fit in with testing?
Duration: 02h 00m | 3. Using standard file locations | What tools exist to check that the workflow files are all present and correct?
Duration: 03h 00m | 4. Source code control | What are the key ideas of source control? What are the essential Git commands we need to manage our simple code? How can Git tags help to manage our code updates?
Duration: 04h 00m | 5. Adding a license | What OSI licenses are available? How do I record ownership and authorship of my code?
Duration: 05h 00m | 6. Upload to WorkflowHub | Why should I use WorkflowHub? How does WorkflowHub compare with the Snakemake Workflow Catalogue?
Duration: 06h 00m | 7. RO-Crates | What are RO-Crates in general? What are the specific types of RO-Crate? What can I do with an RO-Crate?
Duration: 07h 00m | 8. Releasing a new version | What should I do to update the workflow?
Duration: 08h 00m | Finish
The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.
Data Sets
We’ll need to start with a simple example Snakefile, which in this case is the sample answer given to the sequence assembly challenge.
Download the Snakefile and save it.
If you do not have the yeast files from the intro course, download the sample dataset tarball from https://figshare.com/ndownloader/files/42467370
You may do this in the shell with a command-line download tool.
The tar file then needs to be unpacked to yield the directory of files used in the course. You may also do this in the shell, using the tar command.
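The download and unpack steps might look like the following sketch. The URL is the one given above. The local file name `snakemake_data.tar` and the two function names are assumptions made for illustration, and curl is used only as one common download tool.

```shell
#!/bin/sh
# URL taken from the lesson text.
DATA_URL="https://figshare.com/ndownloader/files/42467370"
# Hypothetical local file name; any name will do.
TARBALL="snakemake_data.tar"

# Download the tarball. The -L option makes curl follow any redirects.
fetch_data() {
    curl -L -o "$TARBALL" "$DATA_URL"
}

# Unpack the named tarball into the current directory.
# GNU tar and bsdtar detect the compression type automatically when reading.
unpack_data() {
    tar -xvf "$1"
}
```

With these definitions loaded, running `fetch_data && unpack_data "$TARBALL"` should leave the dataset directory in the current working directory.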
You will also need to rename the files as is normally done at the start of episode 06.
See this link for details about this dataset and the redistribution license.
Software Setup
You will need the Snakemake software installed and working with conda. One way to do this is to follow the same setup instructions as for the Snakemake for Bioinformatics lesson.
Windows: This lesson is currently not tested to work on Windows. You may use a WSL Linux environment, or else connect to a Linux system.
macOS: Use Terminal.app.
Linux: You are good.