Galaxy workflows
This tutorial assumes you have used Galaxy before.
Although we can use tools in Galaxy to analyse data and create a history, there is also a way to create a workflow of files, tools, settings and outputs. You can then input different datasets and run the workflow.
This tutorial covers building a workflow to analyse a bacterial genome, from input Fastq sequencing reads to assembly, annotation, and visualization.
Start
Go to your Galaxy instance and Register/Login.
Import a history of data files:
- Click on the
History cog - Select
Import from File - In the box called
Archived History URL , paste in this link address to the Galaxy history of input files:
https://swift.rc.nectar.org.au:8888/v1/AUTH_377/public/Microbial_tutorials/Galaxy_history_input_files.tar.gz
- Click
Submit - Wait a few seconds.
- Click on the “View all histories” button
- See if the Galaxy history has been imported: it will be called
imported from archive: Data - Above that pane, click on the
Switch to button. - Then click
Done (in the top left corner). -
You should now have a list of five files in your current history.
-
Re-name this history “Workflows”.
Build a workflow
We will first write a workflow for genome assembly.
- In the top menu bar in Galaxy, click on “Workflow”.
- Click on
Create new workflow
-
Under
Workflow Name: put in “Reads to Annotation”. -
Click
Create -
This will bring up the “Workflow Canvas”, a grid where you can arrange the workflow.
Add inputs
-
In the Tools panel, click
Inputs: Input datset twice (at the very top of the list). -
A box will appear: drag it to the left and there will be another box underneath it. Drag this also to the left. Your workflow canvas should look like this:
- Click on the first box. Look in the right hand panel (now called “Details”) and change the name of the Input dataset to
R1.fastq . Press Enter for the change to be saved.
- Repeat for the second input dataset box, naming that one
R2.fastq .
Add the tool “spades”
- In the tools panel, click on
NGS Analysis: NGS Assembly: spades . This puts the spades box onto the workflow canvas.
-
Click on the spades box and look in the Details pane on the right. This shows all the options in spades. Choose:
-
Run only Assembly : Yes [the Yes button should be darker grey] Kmers to use separated by commas: 33,55,91 [note: no spaces]Coverage cutoff: auto
Join inputs to the tool
Now tell spades which input files to use.
-
Look at the input dataset box called
R1.fastq and find the small arrow: > -
Click on this and drag the arrow over to the spades box input arrow > next to “Libraries 1 > Files 1 > Forward reads”.
- Repeat for the dataset box
R2.fastq , joining to the spades box next to “Libraries 1 > Files 1 > Reverse reads”.
Save it and run
- Click on the cog at the top right of the workflow canvas and “Save”.
-
Click the cog again and choose “Run”.
-
This brings up a window where you specify the input datasets to use in the workflow.
- Under
Step1: Input dataset choosemutant_R1.fastq . - Under
Step2: Input dataset choosemutant_R2.fastq .
- Under
-
Click
Run workflow .
This will run the workflow (spades) and save the output to the top of your current history in the right hand panel.
- View some of the output files with the eye icon to check that the workflow (in this case, just spades) ran correctly.
Add to the worfklow
We will add another tool to the workflow.
-
Go to the top Galaxy panel and click “Workflow”.
-
Your workflow
Reads to Annotation should be in the list. Click on the drop-down arrow next to this workflow and chooseEdit . -
This will bring up the Workflow Canvas where we can add more inputs and tools.
-
In the Tools panel, click on
NGS Annotation: Prokka . This will add a Prokka box to the workflow canvas. -
We need to tell Prokka which genome assembly) to annotate. Join the spades output called
out_contigs(fasta) to the Prokka input calledContigs to annotate .
-
Click on the Prokka box and change some of the settings in the right hand Details panel:
- Set the following parameters (leave everything else unchanged):
Locus tag prefix (–locustag) : PForce GenBank/ENA/DDJB compliance (–compliant) : NoSequencing Centre ID (–centre) : VUse genus-specific BLAST database No
-
Click on the cog to the top right of the workflow canvas to save.
-
Click on the cog again to run.
- Again, choose the input files:
mutant_R1.fastq andmutant_R2.fastq , and then clickRun workflow .
- Again, choose the input files:
-
The output from the workflow (files from spades and prokka) will appear at the top of the History panel.
-
Click on the eye icon for some files to verify the workflow ran correctly.
Add more to the workflow
We will add a visualization tool to view the genome annotation.
-
Go to the top Galaxy panel and click “Workflow”.
-
Your workflow
Reads to Annotation should be in the list. Click on the drop-down arrow next to this workflow and chooseEdit . -
This will bring up the Workflow Canvas where we can add more inputs and tools.
-
In the Tools panel, click on
Statistics and Visualisation: Graph/Display Data: JBrowse . This will add a JBrowse box to the workflow canvas. -
Click on the JBrowse box. In the Details pane:
-
Under
JBrowse-in-Galaxy Action choose New JBrowse Instance. -
Under
Reference genome to display choose Use a genome from history. -
For
Produce a Standalone Instance select Yes. -
For
Genetic Code choose 11: The Bacterial, Archaeal and Plant Plastid Code. -
Click
Insert Track Group -
Under
Track Category type in gene annotations. -
Click
Insert Annotation Track -
For
Track Type choose GFF/GFF3/BED/GBK Features -
Under
JBrowse Track Type[Advanced] select Canvas Features. -
Click on
JBrowse Styling Options -
Under
JBrowse style.label correct the word “prodcut” to “product”. -
Under
Track Visibility choose On for new users.
-
-
Now we need to tell JBrowse the input files to use.
-
Join the Prokka output
out_fna (fasta) to the JBrowse inputFasta sequences -
Join the Prokka output
out_gff (gff) to the JBrowse inputTrack Group 1…
-
-
Click on the cog to save; again to run; choose input files;
Run workflow ; examine output files in current history. -
The workflow will now assemble and annotate the genome, and create a JBrowse view of the annotations.
-
JBrowse will produce one output file.
- Click on the eye icon to view.
- In the centre drop down box, choose contig 6.
- Under “Available Tracks” on the left, tick the boxes.
- Zoom in and out with the plus and minus icons.
- The blue blocks are the genome annotations.
Summary
-
Our workflow is now:
Fastq sequence reads to Spades for assembly- Spades
contigs fasta file to Prokka for annotation - Prokka
fasta file and.gff file to JBrowse for visualisation.
-
We can re-run this workflow with different input Fastq files.
Other workflow options
Saving outputs
To save only some output files:
- Go to the workflow canvas.
- Find the star next to the outputs.
- Click on the star for any outputs you want to save.
To save these starred files from the workflow output as a new history:
- Before you click
Run workflow , tick the box above toSend results to a new history .
Import a workflow
To import an existing Galaxy Workflow:
- Go to the Workflow tab in the top panel.
- At the top right, click on
Upload or import workflow .
Extract a workflow
You can extract a workflow from an existing Galaxy history.
- Go to your Galaxy history
- Click on the History cog icon and choose “Extract Workflow”.
- Give it a name and click
Create Workflow . - To edit, go to the Workflow tab, select the workflow, and choose “Edit” from the drop down menu. You can then edit the steps on the Workflow Canvas.
A note on workflow tabs
We have been using the top Workflow tab. There is another tab at the bottom of the tool panel called Workflows. Click on
To return to the main Galaxy window click on the Analyze Data tab in the top panel.
Links
Introduction to workflows: https://wiki.galaxyproject.org/Learn/AdvancedWorkflow
Another tutorial on workflows: http://vlsci.github.io/lscc_docs/tutorials/galaxy-workflows/galaxy-workflows/
Galaxy published workflows: https://usegalaxy.org/workflow/list_published