Assembly using Spades

Keywords: de novo assembly, Spades, Galaxy, Microbial Genomics Virtual Lab

Background

Spades is one of several de novo assemblers that take short reads (e.g. Illumina reads) as input; its assembly method is based on de Bruijn graphs. For more information, see the Spades documentation.
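The following minimal Python sketch illustrates what a de Bruijn graph is. It does not reproduce Spades' actual implementation. Each k-mer in a read becomes an edge between its two overlapping (k-1)-mers, and walking a chain of nodes that each have a single successor spells out a contig. The function names and toy reads are hypothetical. Real assemblers such as Spades also handle sequencing errors, reverse complements and repeats, and they combine several k values.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a simplified de Bruijn graph: nodes are (k-1)-mers, edges are k-mers.

    Each k-mer in a read adds an edge from its (k-1)-prefix to its
    (k-1)-suffix. Edges are stored as sets, so repeated k-mers collapse
    into one edge (real assemblers also track k-mer counts).
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unambiguous(graph, start):
    """Spell out a contig by following nodes with exactly one successor.

    Stops at a branch, a dead end or a node already visited. This toy walk
    only checks out-degree; real assemblers also stop where several edges
    come in.
    """
    contig = start
    node = start
    seen = {start}
    while len(graph.get(node, ())) == 1:
        node = next(iter(graph[node]))
        if node in seen:
            break
        seen.add(node)
        contig += node[-1]  # each step adds one new base
    return contig


# Three overlapping toy reads from one short sequence.
reads = ["ATGGCGT", "GCGTGCA", "GTGCAAT"]
graph = build_de_bruijn(reads, k=4)
print(walk_unambiguous(graph, "ATG"))  # prints ATGGCGTGCAAT
```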

In this activity, we will perform a de novo assembly of a short read set using the Spades assembler. The output from Spades that we are interested in is a multiFASTA file that contains the draft genome sequence.

Learning objectives

At the end of this tutorial you should be able to:

  1. assemble the reads using Spades, and
  2. examine the output assembly.

Import and view data

If you don’t already have the tutorial data in your history, see the Galaxy getting-started guide and import the Galaxy history for this tutorial.

  • The read set for today is from an imaginary Staphylococcus aureus bacterium with a miniature genome.
  • The mutant strain read set was produced by whole-genome shotgun sequencing on an Illumina DNA sequencing instrument.

  • The files we need for assembly are mutant_R1.fastq and mutant_R2.fastq.

  • (We don’t need the reference genome sequences for this tutorial.)

  • The reads are paired-end.

  • Each read is 150 bases long.

  • The total number of bases sequenced is about 19 times the length of the wildtype genome, i.e. 19x read coverage, which is rather low.

  • Click on the View Data button (the Eye icon) next to one of the FASTQ sequence files.
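Read coverage is the total number of sequenced bases divided by the genome length. The sketch below counts the bases in a paired read set and checks that R1 and R2 contain the same number of reads. It assumes plain four-line FASTQ records and a genome size that you supply yourself. The tutorial does not state the size of the miniature genome, so the value used here is made up, as are the toy reads and the function names.

```python
def read_fastq(lines):
    """Yield (name, sequence, quality) tuples from FASTQ lines.

    Assumes the simple four-lines-per-record layout:
    @name, sequence, +, quality string.
    """
    records = [line.rstrip("\n") for line in lines if line.strip()]
    if len(records) % 4 != 0:
        raise ValueError("FASTQ line count is not a multiple of 4")
    for i in range(0, len(records), 4):
        header, seq, plus, qual = records[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"Malformed FASTQ record starting at {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"Sequence/quality length mismatch in {header!r}")
        yield header[1:], seq, qual


def paired_coverage(r1_lines, r2_lines, genome_size):
    """Return (read_pairs, total_bases, coverage) for a paired-end read set.

    coverage = total bases in R1 and R2 / genome size
    """
    r1 = list(read_fastq(r1_lines))
    r2 = list(read_fastq(r2_lines))
    if len(r1) != len(r2):
        raise ValueError("R1 and R2 contain different numbers of reads")
    total_bases = sum(len(seq) for _, seq, _ in r1 + r2)
    return len(r1), total_bases, total_bases / genome_size


# Toy data; for the tutorial files you would pass open("mutant_R1.fastq") etc.
r1 = ["@pair1/1", "ACGTACGTAC", "+", "IIIIIIIIII",
      "@pair2/1", "TTGCAACGTA", "+", "IIIIIIIIII"]
r2 = ["@pair1/2", "GGCATTACGA", "+", "IIIIIIIIII",
      "@pair2/2", "CATGCATGCA", "+", "IIIIIIIIII"]
pairs, bases, cov = paired_coverage(r1, r2, genome_size=20)  # hypothetical genome size
print(pairs, bases, cov)  # prints 2 40 2.0
```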

Assemble reads with Spades

  • We will perform a de novo assembly of the mutant FASTQ reads into long contiguous sequences (contigs) in FASTA format.
  • Go to Tools → NGS Analysis → NGS: Assembly → spades
  • Set the following parameters (leave other settings as they are):

    • Run only Assembly: Yes [the Yes button should be darker grey]
    • Kmers to use separated by commas: 33,55,91 [note: no spaces]
    • Coverage cutoff: auto
    • Files → Forward reads: mutant_R1.fastq
    • Files → Reverse reads: mutant_R2.fastq
  • Your tool interface should look like this:

[Figure: Spades tool interface]

  • Click Execute
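If you run Spades outside Galaxy, the settings above correspond roughly to a spades.py command line. The sketch below assumes that the Galaxy options map onto the standard spades.py flags --only-assembler, -k, --cov-cutoff, -1, -2 and -o. The Galaxy wrapper's exact internal command may differ, and the output directory name spades_out is hypothetical. The sketch also checks the k-mer list against the constraints the SPAdes manual gives for -k (odd values, less than 128, in ascending order). It adds one further sanity check of its own: k must be shorter than the 150-base reads. It only builds the command and does not run Spades.

```python
def parse_kmers(text, read_length=150):
    """Parse and sanity-check a k-mer list such as "33,55,91"."""
    if " " in text:
        raise ValueError("k-mer list must not contain spaces")
    kmers = [int(k) for k in text.split(",")]
    for k in kmers:
        if k % 2 == 0:
            raise ValueError(f"k={k} is even; k-mer sizes must be odd")
        if k >= 128:
            raise ValueError(f"k={k} must be less than 128")
        if k >= read_length:  # extra check: a k-mer cannot exceed the read
            raise ValueError(f"k={k} must be shorter than the {read_length} bp reads")
    if kmers != sorted(kmers):
        raise ValueError("k-mer sizes must be listed in ascending order")
    return kmers


def spades_command(forward, reverse, kmer_text, outdir="spades_out"):
    """Build (but do not run) a spades.py argument list for paired reads."""
    kmers = parse_kmers(kmer_text)
    return [
        "spades.py",
        "--only-assembler",                     # Run only Assembly: Yes
        "-k", ",".join(str(k) for k in kmers),  # Kmers to use
        "--cov-cutoff", "auto",                 # Coverage cutoff: auto
        "-1", forward,                          # Forward reads
        "-2", reverse,                          # Reverse reads
        "-o", outdir,                           # hypothetical output directory
    ]


print(" ".join(spades_command("mutant_R1.fastq", "mutant_R2.fastq", "33,55,91")))
# prints spades.py --only-assembler -k 33,55,91 --cov-cutoff auto -1 mutant_R1.fastq -2 mutant_R2.fastq -o spades_out
```

The list form can be passed directly to subprocess.run if Spades is installed locally.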

Examine the output

  • Galaxy is now running Spades on the reads for you.
  • When it is finished, you will have five new files in your history.

    • two FASTA files of the resulting contigs and scaffolds
    • two files of statistics about these
    • the Spades log file

[Figure: Spades output files in the Galaxy history]

  • Click the View Data button (the eye icon) on each of the files.
  • Note that the short reads have been assembled into much longer contigs.
  • (However, in this case, the contigs have not been assembled into larger scaffolds.)
  • The stats files will give you the length of each of the contigs.
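If you download the contigs file, you can reproduce the per-contig lengths yourself. The following is a minimal multi-FASTA parser sketch. The toy headers imitate Spades' NODE_<n>_length_<len>_cov_<cov> naming, and the sequences are made up. For a real file you would pass an open file handle instead of the list, for example open("contigs.fasta"), where the filename is a hypothetical local name.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from multi-FASTA lines.

    Sequences may be wrapped over several lines; they are joined.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def contig_lengths(lines):
    """Return (name, length) for each contig, longest first."""
    lengths = [(name.split()[0], len(seq)) for name, seq in read_fasta(lines)]
    return sorted(lengths, key=lambda item: item[1], reverse=True)


example = [
    ">NODE_1_length_12_cov_18.5",
    "ATGGCG",
    "TGCAAT",  # wrapped sequence line, joined with the one above
    ">NODE_2_length_8_cov_20.1",
    "GGCATTAC",
]
lengths = contig_lengths(example)
for name, length in lengths:
    print(name, length)
print("total:", sum(length for _, length in lengths))
# prints:
# NODE_1_length_12_cov_18.5 12
# NODE_2_length_8_cov_20.1 8
# total: 20
```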

[Figure: Spades output contigs]