Nanopore tutorial

We will use the Galaxy server that you have been using so far this week, so please open it now.

Data we will use

Click on each of these filenames, then right-click on View Raw and save the file to your computer.

Non-Galaxy tools needed (download)

Introduction

Go to your Galaxy instance. Make sure you are registered and logged in. Refresh the page.

  • In the top menu, click on Shared Data and then Data Libraries
  • Click on Nanopore Data
  • A list of files will appear
  • Tick the box next to these files: EM_079517.fasta and sakai_prophages.fa

  • Then click the 📒 to History button at the top

  • Click Import
  • The files should now be in your current history.

First off, we are going to start by using some Ebola viral genome data to look at mapping (aligning) Nanopore reads to a reference.
We are going to use BWA-MEM, which can handle long reads.

Step 1: Fastq-dump

First we need to retrieve Ebola genomic data from NCBI.

  • We will use the Get Data tab on the left hand side panel of Galaxy, under Tools.
  • Click on Download and Extract Reads in FASTA/Q
  • Options will now be visible in the middle panel of Galaxy
  • The input type should be SRR accession
  • In Accession enter ERR1014225 (This is the accession number for nanopore reads of an Ebola sample from the West African outbreak)
  • Select gzip compressed fastq for the output format
  • Click Execute

  • The job will then appear in the History panel on the right hand side of the screen

  • The job will turn green when it has finished
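
Once the download has finished, it can be useful to check how long the reads are. The following is a minimal Python sketch, not part of the Galaxy workflow, that summarises read lengths from a FASTQ file. It assumes the common four-lines-per-read FASTQ layout, and the file name in the final comment is an assumption; use whatever name your downloaded dataset has.

```python
import gzip
import io
from statistics import median


def open_fastq(path):
    """Open a FASTQ file as text, transparently handling .gz files."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def read_lengths(handle):
    """Yield the sequence length of each record in a 4-line FASTQ stream."""
    while True:
        header = handle.readline()
        if not header.strip():
            break  # end of file (or a trailing blank line)
        sequence = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored
        handle.readline()  # quality string, ignored
        yield len(sequence)


def summarise(lengths):
    """Return the read count plus shortest, median and longest read length."""
    lengths = sorted(lengths)
    return {
        "reads": len(lengths),
        "min": lengths[0],
        "median": median(lengths),
        "max": lengths[-1],
    }


# Tiny made-up FASTQ, used only to show the functions working.
demo = io.StringIO(
    "@r1\nACGTACGT\n+\n!!!!!!!!\n"
    "@r2\nACGT\n+\n!!!!\n"
    "@r3\nACGTAC\n+\n!!!!!!\n"
)
stats = summarise(read_lengths(demo))
print(stats)

# For the real reads (hypothetical file name):
# with open_fastq("ERR1014225.fastq.gz") as handle:
#     print(summarise(read_lengths(handle)))
```

Nanopore reads typically vary much more in length than Illumina reads, so the gap between the minimum and maximum is worth a look.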

Step 2: BWA-MEM

Next we will map our nanopore Ebola reads to a reference.

  • We will use the NGS:Mapping tab on the left hand side panel of Galaxy, under Tools.
  • Click on Map with BWA-MEM
  • Options will now be visible in the middle panel of Galaxy
  • Select Use a genome from history and build index
  • The reference sequence is already in your current history: EM_079517.fasta (This is a finished Ebola reference genome for strain Zaire)
  • For Algorithm select Auto. Let BWA decide the best algorithm to use
  • Select Single for type of reads
  • Select ERR1014225 (fastq-dump) for fastq dataset
  • Select Do not set for read groups information
  • Select 3. Nanopore 2D-reads mode (-x ont2d) for analysis mode
  • Click Execute

When it has finished running, we will download the mapped reads.

  • Click on the finished job (called Map with BWA-MEM, etc.) in the History panel on the right-hand side
  • You will see a save icon under the details of the job
  • Click on it and the Download dataset and Download bam_index options will appear
  • We need both so click on one after the other
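
For reference, roughly the same mapping can be done outside Galaxy on the command line. The sketch below only builds and prints the commands; it does not run them. It assumes bwa and samtools are installed, and the read and output file names are placeholders. The exact commands Galaxy runs behind the scenes may differ.

```python
import shlex


def mapping_commands(reference, reads, out_bam):
    """Return shell commands to index a reference, map reads and index the BAM.

    The file names are supplied by the caller; nothing is executed here.
    """
    ref, fastq, bam = (shlex.quote(p) for p in (reference, reads, out_bam))
    return [
        # Build the BWA index for the reference genome.
        f"bwa index {ref}",
        # Map with the Nanopore 2D preset and write a coordinate-sorted BAM.
        f"bwa mem -x ont2d {ref} {fastq} | samtools sort -o {bam} -",
        # Create the .bai index that viewers need alongside the .bam file.
        f"samtools index {bam}",
    ]


# Placeholder file names, chosen only for illustration.
commands = mapping_commands(
    "EM_079517.fasta", "ERR1014225.fastq.gz", "ERR1014225.bam"
)
for command in commands:
    print(command)
```

The last command produces the index file, which is why both the dataset and the bam_index are downloaded above.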

Step 3: Viewing the mapping in Tablet

  • Open Tablet and click on Open Assembly in the top left-hand corner of the screen to load the downloaded data
  • For Primary assembly file or URL: click on Browse… and find the downloaded BAM file (you need the file with the extension .bam, not .bai)
  • For Reference/consensus file or URL: click on Browse… and find the reference genome EM_079517.fasta that you downloaded from Dropbox

  • Click Open, then click on the contig EM_079517 in the left-hand panel of the screen and you will see your mapping results ready to browse
  • Take a few minutes to explore the program and the results

Based on the alignment, what can you say about the sequencing method?

Does the data look different to Illumina data that you might have seen?

Can you find anything that looks like consistent errors?

Can you find anything that looks like a reliable SNP?
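
One way to think about the last two questions is to tally the bases that the reads show at a single column of the alignment. The sketch below is a deliberately crude illustration of that idea, not a real variant caller. The function name and both thresholds are assumptions chosen for the example. A consistent mismatch can also be a systematic sequencing error, so this only flags candidates.

```python
from collections import Counter


def call_site(ref_base, read_bases, min_depth=10, min_fraction=0.8):
    """Classify one alignment column from the bases the reads show there.

    min_depth and min_fraction are illustrative thresholds, not recommendations.
    """
    depth = len(read_bases)
    if depth < min_depth:
        return "low coverage"
    # Find the base that most reads agree on at this position.
    base, count = Counter(read_bases).most_common(1)[0]
    if base == ref_base:
        return "matches reference"
    if count / depth >= min_fraction:
        return f"candidate SNP {ref_base}>{base}"
    return "mixed, likely sequencing error"


# Made-up columns: the reference base followed by the bases seen in the reads.
print(call_site("A", "G" * 18 + "A" * 2))
print(call_site("A", "A" * 15 + "C" * 3 + "G" * 2))
print(call_site("A", "G" * 6 + "A" * 5 + "C"))
```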

Step 4: Comparing Illumina and Nanopore assemblies

Next, we’re going to use Escherichia coli to illustrate how much assemblies can be improved with long reads.

Pre-knowledge

  • Pathogenic E. coli tends to have a lot of repeat sequences
  • This is due to large prophages integrating into the genome
  • These prophages are very similar to each other and can confuse assembly algorithms
  • These prophages can be as large as ~50 kb
  • No matter how deeply you sequence, if you don’t get reads longer than those repeats you will never finish the genome
  • See this paper for more detail
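
The point about read length can be put into numbers. The sketch below treats a repeat as resolvable by a read only if the read is long enough to cover the whole repeat plus some unique sequence on each side. The 1 kb anchor and all read lengths are made-up values for illustration.

```python
def can_span(read_length, repeat_length, anchor=1000):
    """True if a read is long enough to cover the repeat plus unique
    sequence (the anchor) on both sides. The anchor size is an assumption."""
    return read_length >= repeat_length + 2 * anchor


def spanning_fraction(read_lengths, repeat_length, anchor=1000):
    """Fraction of reads that are long enough to span the repeat."""
    read_lengths = list(read_lengths)
    spanning = sum(can_span(n, repeat_length, anchor) for n in read_lengths)
    return spanning / len(read_lengths)


repeat = 50_000  # a prophage of about 50 kb

# Made-up read lengths in bases.
illumina = [150, 150, 150, 150]
nanopore = [8_000, 20_000, 55_000, 60_000]

illumina_fraction = spanning_fraction(illumina, repeat)
nanopore_fraction = spanning_fraction(nanopore, repeat)
print(illumina_fraction, nanopore_fraction)
```

Adding more short reads increases the depth but never changes the first fraction, which is why deeper Illumina sequencing alone cannot close the genome.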

Bandage

Bandage draws an assembly as a graph in which nodes are contigs and edges are the overlaps between them.

  • Open up Bandage from your desktop
  • Click on File then Load graph
  • Find illumina_assembly_graph.fastg on your computer and click Open

  • Click Draw graph on the left hand panel of the screen
  • You will see that the Illumina assembly produces a whole mess of 649 contigs with a lot of short contigs connecting everything together
  • You can zoom in on the graph to look at it in more detail

  • We can colour the graph by depth by changing Graph display from Random colours to Colour by depth
  • You’ll see that the short contigs all have much higher coverage than the other contigs. What do you think that means?
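
The depth values that Bandage uses for colouring can also be read straight from the graph file. The sketch below assumes SPAdes-style FASTG headers of the form EDGE_<id>_length_<bases>_cov_<depth>; if your file was produced by a different assembler the pattern may need changing. The demo lines are made up.

```python
import re
from statistics import median

# Assumed SPAdes-style header: >EDGE_<id>_length_<bases>_cov_<depth>...
HEADER = re.compile(r">EDGE_(\d+)_length_(\d+)_cov_([0-9.]+)")


def contig_depths(lines):
    """Map contig id -> (length, depth) from FASTG header lines.

    Each contig appears once per strand, so later copies overwrite earlier ones.
    """
    contigs = {}
    for line in lines:
        match = HEADER.match(line)
        if match:
            contig_id, length, depth = match.groups()
            contigs[contig_id] = (int(length), float(depth))
    return contigs


# Made-up FASTG content, only to demonstrate the parsing.
demo = [
    ">EDGE_1_length_5000_cov_20.5:EDGE_2_length_300_cov_81.0;",
    "ACGT",
    ">EDGE_1_length_5000_cov_20.5';",
    "ACGT",
    ">EDGE_2_length_300_cov_81.0:EDGE_3_length_4000_cov_19.5;",
    "ACGT",
    ">EDGE_3_length_4000_cov_19.5;",
    "ACGT",
]
contigs = contig_depths(demo)
typical = median(depth for _, depth in contigs.values())
# Depth of each contig relative to the typical (median) depth.
relative = {cid: depth / typical for cid, (_, depth) in contigs.items()}
for cid, (length, depth) in contigs.items():
    print(cid, length, depth, round(relative[cid], 2))

# For the real file (name taken from this tutorial):
# with open("illumina_assembly_graph.fastg") as handle:
#     contigs = contig_depths(handle)
```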

Now we are going to use BLAST to look for the prophage regions in the assembly.

  • Click on Create/view BLAST search in the left-hand panel of the screen
  • Step 1: click on Build BLAST database
  • Step 2: click on Load from FASTA file and find sakai_prophages.fa on your computer
  • Step 3: click on Run BLAST search
  • Click on Close

You will now see the prophage regions highlighted in colour on the assembly graph.

What effect are the prophage regions having on the assembly?

Now, we will load the nanopore assembly into Bandage to see how long reads affect this.

  • Click on File then Load graph
  • Find O55_combined_miniasm.gfa on your computer and click Open
  • Click Draw graph on the left hand panel of the screen
  • Change Graph display option from BLAST hits (solid) to Random colours. What is strikingly different about this assembly from the Illumina one?
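
To put a number on what you see, the graph file itself can be summarised. In the GFA format, lines starting with S describe segments (the contigs) and lines starting with L describe links (the overlaps). The following is a minimal sketch that counts both; the function names are made up for this example and the demo lines are not real data.

```python
def segment_length(fields):
    """Length of a GFA segment: from the sequence, or the LN tag if it is '*'."""
    sequence = fields[2]
    if sequence != "*":
        return len(sequence)
    for tag in fields[3:]:
        if tag.startswith("LN:i:"):
            return int(tag[5:])
    return 0  # length unknown


def graph_summary(lines):
    """Count contigs (S lines) and overlaps (L lines) in GFA text."""
    lengths = {}
    overlaps = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) >= 3:
            lengths[fields[1]] = segment_length(fields)
        elif fields[0] == "L":
            overlaps += 1
    return {
        "contigs": len(lengths),
        "overlaps": overlaps,
        "total_bp": sum(lengths.values()),
        "longest_bp": max(lengths.values(), default=0),
    }


# Made-up GFA lines, only to demonstrate the counting.
demo = [
    "H\tVN:Z:1.0",
    "S\tutg1\tACGTACGT",
    "S\tutg2\t*\tLN:i:12",
    "L\tutg1\t+\tutg2\t-\t4M",
]
summary = graph_summary(demo)
print(summary)

# For the real file (name taken from this tutorial):
# with open("O55_combined_miniasm.gfa") as handle:
#     print(graph_summary(handle))
```

Comparing the contig count here with the 649 contigs of the Illumina graph gives a quick measure of how much more contiguous the long-read assembly is.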

Now, let's look at where the prophage regions are in this assembly.

  • Click on Create/view BLAST search in the left-hand panel of the screen
  • Step 1: click on Build BLAST database
  • Step 2: click on Load from FASTA file and find sakai_prophages.fa on your computer
  • Step 3: click on Run BLAST search
  • Click on Close

  1. How have the long reads improved the assembly?
  2. Are there any regions that haven’t been resolved by the nanopore experiment? Why would that be?