Galaxy workflows

This tutorial assumes you have used Galaxy before.

You can analyse data in Galaxy by running tools one at a time, which builds up a history. You can also capture the input files, tools, settings and outputs as a workflow, and then run that workflow on different input datasets.

This tutorial covers building a workflow to analyse a bacterial genome, from input Fastq sequencing reads to assembly, annotation, and visualization.

Start

Go to your Galaxy instance and register or log in.

Import a history of data files:

Click on the History cog
Select Import from File
In the box called Archived History URL, paste in this link address to the Galaxy history of input files:

https://swift.rc.nectar.org.au:8888/v1/AUTH_377/public/Microbial_tutorials/Galaxy_history_input_files.tar.gz

Click Submit
Wait a few seconds.
Click on the “View all histories” button
Check that the Galaxy history has been imported: it will be called “imported from archive: Data”.
Above that pane, click on the Switch to button.
Then click Done (in the top left corner).
You should now have a list of five files in your current history.
Rename this history “Workflows”.
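
The same import can also be scripted. The sketch below assumes the BioBlend Python client and its `histories.import_history`, `histories.get_histories` and `histories.update_history` methods; the helper names are made up for this example, and the server address and API key in the usage comment are placeholders. The client object is passed in as an argument, so the helpers can be tried without a live server.

```python
# URL of the history archive given in the tutorial steps above.
HISTORY_ARCHIVE_URL = (
    "https://swift.rc.nectar.org.au:8888/v1/AUTH_377/public/"
    "Microbial_tutorials/Galaxy_history_input_files.tar.gz"
)
IMPORTED_NAME = "imported from archive: Data"


def submit_history_import(gi, url=HISTORY_ARCHIVE_URL):
    """Ask Galaxy to import a history archive from a URL.

    `gi` is assumed to behave like bioblend.galaxy.GalaxyInstance.
    The import runs in the background on the server.
    """
    return gi.histories.import_history(url=url)


def rename_imported_history(gi, old_name=IMPORTED_NAME, new_name="Workflows"):
    """Rename the imported history once it exists; return its id, or None."""
    matches = gi.histories.get_histories(name=old_name)
    if not matches:
        return None  # the import has not finished yet; try again shortly
    history_id = matches[0]["id"]
    gi.histories.update_history(history_id, name=new_name)
    return history_id


# Usage (assumed; needs `pip install bioblend`, a real server and an API key):
#   from bioblend.galaxy import GalaxyInstance
#   gi = GalaxyInstance(url="https://your.galaxy.server", key="YOUR_API_KEY")
#   submit_history_import(gi)
#   rename_imported_history(gi)
```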

Build a workflow

We will first write a workflow for genome assembly.

In the top menu bar in Galaxy, click on “Workflow”.

Click on Create new workflow

Under Workflow Name, enter “Reads to Annotation”.
Click Create
This will bring up the “Workflow Canvas”, a grid where you can arrange the workflow.

Add inputs

In the Tools panel, click Inputs: Input dataset twice (at the very top of the list).
A box will appear: drag it to the left and there will be another box underneath it. Drag this one to the left as well. Your workflow canvas should now show two Input dataset boxes on the left, one above the other.

Click on the first box. Look in the right hand panel (now called “Details”) and change the name of the Input dataset to R1.fastq. Press Enter for the change to be saved.

Repeat for the second input dataset box, naming that one R2.fastq.

Add the tool “spades”

In the tools panel, click on NGS Analysis: NGS Assembly: spades. This puts the spades box onto the workflow canvas.

Click on the spades box and look in the Details pane on the right. This shows all the options in spades. Choose:
Run only Assembly: Yes [the Yes button should be darker grey]
Kmers to use separated by commas: 33,55,91 [note: no spaces]
Coverage cutoff: auto
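
For reference, these settings roughly correspond to a command-line SPAdes run. The sketch below is an assumption about that mapping, not a description of what the Galaxy tool executes: it checks the k-mer list against the rules in the SPAdes manual (odd values, less than 128, ascending order) and builds an argument list using the `--only-assembler`, `-k`, `--cov-cutoff`, `-1`, `-2` and `-o` options. The function names and the output directory are made up for this example.

```python
def parse_kmers(text):
    """Parse a comma-separated k-mer list such as "33,55,91"."""
    if " " in text:
        raise ValueError("k-mer list must not contain spaces")
    kmers = [int(k) for k in text.split(",")]
    for k in kmers:
        if k % 2 == 0 or k >= 128:
            raise ValueError(f"k-mer {k} must be odd and less than 128")
    if kmers != sorted(set(kmers)):
        raise ValueError("k-mers must be distinct and in ascending order")
    return kmers


def spades_command(forward, reverse, outdir, kmers="33,55,91"):
    """Build (but do not run) a spades.py command for one paired-end library."""
    parse_kmers(kmers)  # fail early on a malformed k-mer list
    return [
        "spades.py",
        "--only-assembler",       # Run only Assembly: Yes
        "-k", kmers,              # Kmers to use
        "--cov-cutoff", "auto",   # Coverage cutoff: auto
        "-1", forward,            # forward reads
        "-2", reverse,            # reverse reads
        "-o", outdir,             # hypothetical output directory
    ]


print(" ".join(spades_command("mutant_R1.fastq", "mutant_R2.fastq", "spades_out")))
```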

Join inputs to the tool

Now tell spades which input files to use.

Look at the input dataset box called R1.fastq and find the small arrow: >
Click on this and drag the arrow over to the spades box input arrow > next to “Libraries 1 > Files 1 > Forward reads”.

Repeat for the dataset box R2.fastq, joining to the spades box next to “Libraries 1 > Files 1 > Reverse reads”.

Save it and run

Click on the cog at the top right of the workflow canvas and choose “Save”.

Click the cog again and choose “Run”.
This brings up a window where you specify the input datasets to use in the workflow.
- Under Step1: Input dataset choose mutant_R1.fastq.
- Under Step2: Input dataset choose mutant_R2.fastq.
Click Run workflow.

This will run the workflow (spades) and save the output to the top of your current history in the right hand panel.

View some of the output files with the eye icon to check that the workflow (in this case, just spades) ran correctly.
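
Choosing the input datasets can also be expressed as data. The sketch below assumes the input format used by BioBlend's `workflows.invoke_workflow`, where each workflow input step is mapped to a dataset id with `"src": "hda"` (a history dataset). It also assumes the two input steps have indices 0 and 1, which should be checked against the actual workflow. The function name and the dataset ids are made up for illustration.

```python
def build_workflow_inputs(history_contents, forward_name, reverse_name):
    """Map workflow input steps 0 and 1 to the ids of two named datasets.

    `history_contents` is a list of dicts with "name" and "id" keys, such as
    the list BioBlend returns from histories.show_history(id, contents=True).
    """
    ids_by_name = {item["name"]: item["id"] for item in history_contents}
    missing = [n for n in (forward_name, reverse_name) if n not in ids_by_name]
    if missing:
        raise KeyError(f"not found in history: {missing}")
    return {
        "0": {"id": ids_by_name[forward_name], "src": "hda"},
        "1": {"id": ids_by_name[reverse_name], "src": "hda"},
    }


# Made-up history listing; real ids come from the Galaxy server.
contents = [
    {"name": "mutant_R1.fastq", "id": "id-r1"},
    {"name": "mutant_R2.fastq", "id": "id-r2"},
]
inputs = build_workflow_inputs(contents, "mutant_R1.fastq", "mutant_R2.fastq")

# Usage with BioBlend (assumed):
#   gi.workflows.invoke_workflow(workflow_id, inputs=inputs, history_id=history_id)
```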

Add to the workflow

We will add another tool to the workflow.

Go to the top Galaxy panel and click “Workflow”.
Your workflow Reads to Annotation should be in the list. Click on the drop-down arrow next to this workflow and choose Edit.
This will bring up the Workflow Canvas where we can add more inputs and tools.
In the Tools panel, click on NGS Annotation: Prokka. This will add a Prokka box to the workflow canvas.
We need to tell Prokka which genome assembly to annotate. Join the spades output called out_contigs(fasta) to the Prokka input called Contigs to annotate.

Click on the Prokka box and set the following parameters in the right-hand Details panel (leave everything else unchanged):
- Locus tag prefix (--locustag): P
- Force GenBank/ENA/DDJB compliance (--compliant): No
- Sequencing Centre ID (--centre): V
- Use genus-specific BLAST database: No
Click on the cog to the top right of the workflow canvas to save.
Click on the cog again to run.
- Again, choose the input files: mutant_R1.fastq and mutant_R2.fastq, and then click Run workflow.
The output from the workflow (files from spades and prokka) will appear at the top of the History panel.
Click on the eye icon for some files to verify the workflow ran correctly.
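
The Prokka settings chosen above map onto Prokka's command-line options. The following sketch builds an equivalent argument list under that assumption; it is not what the Galaxy tool necessarily runs. The function name, the output directory and the contigs file name are made up. The two options set to No are simply left out, because `--compliant` and `--usegenus` are on/off switches.

```python
def prokka_command(contigs, outdir, locustag="P", centre="V",
                   compliant=False, usegenus=False):
    """Build (but do not run) a prokka command line."""
    cmd = [
        "prokka",
        "--outdir", outdir,        # hypothetical output directory
        "--locustag", locustag,    # Locus tag prefix
        "--centre", centre,        # Sequencing Centre ID
    ]
    if compliant:
        cmd.append("--compliant")  # Force GenBank/ENA/DDJB compliance
    if usegenus:
        cmd.append("--usegenus")   # Use genus-specific BLAST database
    cmd.append(contigs)            # the assembly to annotate goes last
    return cmd


print(" ".join(prokka_command("contigs.fasta", "prokka_out")))
```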

Add more to the workflow

We will add a visualization tool to view the genome annotation.

Go to the top Galaxy panel and click “Workflow”.
Your workflow Reads to Annotation should be in the list. Click on the drop-down arrow next to this workflow and choose Edit.
This will bring up the Workflow Canvas where we can add more inputs and tools.
In the Tools panel, click on Statistics and Visualisation: Graph/Display Data: JBrowse. This will add a JBrowse box to the workflow canvas.
Click on the JBrowse box. In the Details pane:
- Under JBrowse-in-Galaxy Action choose New JBrowse Instance.
- Under Reference genome to display choose Use a genome from history.
- For Produce a Standalone Instance select Yes.
- For Genetic Code choose 11: The Bacterial, Archaeal and Plant Plastid Code.
- Click Insert Track Group
- Under Track Category type in gene annotations.
- Click Insert Annotation Track
- For Track Type choose GFF/GFF3/BED/GBK Features
- Under JBrowse Track Type [Advanced] select Canvas Features.
- Click on JBrowse Styling Options [Advanced]
- Under JBrowse style.label correct the word “prodcut” to “product”.
- Under Track Visibility choose On for new users.
Now we need to tell JBrowse the input files to use.
- Join the Prokka output out_fna (fasta) to the JBrowse input Fasta sequences
- Join the Prokka output out_gff (gff) to the JBrowse input Track Group 1…

Click on the cog to save, then click it again to run. Choose the input files, click Run workflow, and examine the output files in your current history.
The workflow will now assemble and annotate the genome, and create a JBrowse view of the annotations.
JBrowse will produce one output file.
- Click on the eye icon to view.
- In the centre drop down box, choose contig 6.
- Under “Available Tracks” on the left, tick the boxes.
- Zoom in and out with the plus and minus icons.
- The blue blocks are the genome annotations.
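
Each block in the track comes from a feature row in the Prokka .gff file, so the file can also be summarised directly. The sketch below counts features per contig and feature type using only the Python standard library. It assumes the usual nine tab-separated GFF3 columns and stops at the `##FASTA` marker, after which Prokka's .gff file holds the contig sequences. The example records, including the contig names, are made up and are not taken from the tutorial data.

```python
from collections import Counter


def count_features(gff_text):
    """Count GFF3 features per (contig, feature type)."""
    counts = Counter()
    for line in gff_text.splitlines():
        if line.startswith("##FASTA"):
            break  # sequence section starts here; no more features
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/directive lines
        fields = line.split("\t")
        if len(fields) == 9:
            counts[(fields[0], fields[2])] += 1
    return counts


# Made-up records for illustration only.
EXAMPLE_GFF = "\n".join([
    "##gff-version 3",
    "\t".join(["contig_6", "Prodigal:2.6", "CDS", "100", "400", ".", "+", "0",
               "ID=P_00001;product=hypothetical protein"]),
    "\t".join(["contig_6", "Prodigal:2.6", "CDS", "500", "900", ".", "-", "0",
               "ID=P_00002;product=hypothetical protein"]),
    "\t".join(["contig_7", "Aragorn:1.2", "tRNA", "10", "85", ".", "+", ".",
               "ID=P_00003;product=tRNA-Ala"]),
    "##FASTA",
    ">contig_6",
    "ACGT",
])

for (contig, kind), n in sorted(count_features(EXAMPLE_GFF).items()):
    print(contig, kind, n)
```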

Summary

Our workflow is now:
- Fastq sequence reads to Spades for assembly
- Spades contigs fasta file to Prokka for annotation
- Prokka fasta file and .gff file to JBrowse for visualisation.
We can re-run this workflow with different input Fastq files.
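
One way to think about this workflow is as a small dependency graph: each tool can only run once the steps feeding it have finished. The sketch below writes the graph down as a plain dictionary (each step mapped to the steps it depends on) and uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later) to produce a valid run order. The step names are labels chosen for this example; this illustrates the idea only and is not how Galaxy schedules jobs.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps whose output it needs.
WORKFLOW = {
    "R1.fastq": set(),
    "R2.fastq": set(),
    "spades": {"R1.fastq", "R2.fastq"},
    "prokka": {"spades"},
    "jbrowse": {"prokka"},
}


def run_order(workflow):
    """Return the steps in an order that respects every dependency."""
    return list(TopologicalSorter(workflow).static_order())


order = run_order(WORKFLOW)
print(" -> ".join(order))
```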

Other workflow options

Saving outputs

To save only some output files:

Go to the workflow canvas.
Find the star next to the outputs.
Click on the star for any outputs you want to save.

To save these starred files from the workflow output as a new history:

Before you click Run workflow, tick the box above to Send results to a new history.

Import a workflow

To import an existing Galaxy Workflow:

Go to the Workflow tab in the top panel.
At the top right, click on Upload or import workflow.
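
Workflows are typically exchanged as JSON files (commonly with a .ga extension). Before importing one, it can help to list its steps. The sketch below assumes the exported format has a top-level `name` and a `steps` mapping in which each step carries `id`, `type`, `name` and `input_connections`. The example document is a hand-written, heavily trimmed stand-in, and its connection keys (`fwd`, `rev`, `input`) are hypothetical.

```python
import json


def summarize_workflow(ga_text):
    """Return the workflow name and (id, type, name, upstream ids) per step."""
    workflow = json.loads(ga_text)
    summary = []
    for step in sorted(workflow["steps"].values(), key=lambda s: s["id"]):
        upstream = set()
        for conn in step.get("input_connections", {}).values():
            # A connection is one dict, or a list of dicts for multi-input ports.
            for c in conn if isinstance(conn, list) else [conn]:
                upstream.add(c["id"])
        summary.append((step["id"], step["type"], step["name"], sorted(upstream)))
    return workflow["name"], summary


# Hand-written, trimmed example; a real export contains many more fields.
EXAMPLE_GA = json.dumps({
    "a_galaxy_workflow": "true",
    "name": "Reads to Annotation",
    "steps": {
        "0": {"id": 0, "type": "data_input", "name": "Input dataset",
              "input_connections": {}},
        "1": {"id": 1, "type": "data_input", "name": "Input dataset",
              "input_connections": {}},
        "2": {"id": 2, "type": "tool", "name": "spades",
              "input_connections": {
                  "fwd": {"id": 0, "output_name": "output"},
                  "rev": {"id": 1, "output_name": "output"}}},
        "3": {"id": 3, "type": "tool", "name": "Prokka",
              "input_connections": {
                  "input": {"id": 2, "output_name": "out_contigs"}}},
    },
})

name, steps = summarize_workflow(EXAMPLE_GA)
print(name)
for step in steps:
    print(step)
```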

Extract a workflow

You can extract a workflow from an existing Galaxy history.

Go to your Galaxy history
Click on the History cog icon and choose “Extract Workflow”.
Give it a name and click Create Workflow.
To edit, go to the Workflow tab, select the workflow, and choose “Edit” from the drop down menu. You can then edit the steps on the Workflow Canvas.

A note on workflow tabs

We have been using the top Workflow tab. There is another tab at the bottom of the tool panel called Workflows. Click on Workflows: All Workflows. This gives a similar view with a list of workflows, and you can also click on the top right tab “switch to workflow management view”.

To return to the main Galaxy window click on the Analyze Data tab in the top panel.

Links

Introduction to workflows: https://wiki.galaxyproject.org/Learn/AdvancedWorkflow

Another tutorial on workflows: http://vlsci.github.io/lscc_docs/tutorials/galaxy-workflows/galaxy-workflows/

Galaxy published workflows: https://usegalaxy.org/workflow/list_published