Nanopore tutorial
We will use the Galaxy server that you have been working with this week, so please open it now.
Data we will use
Please click on these filenames, then right-click on View Raw and save the file to your computer.
Non-Galaxy tools needed (download)
Introduction
Go to your Galaxy instance. Make sure you are registered and logged in. Refresh the page.
- In the top menu, click on Shared Data and then Data Libraries
- Click on Nanopore Data
- A list of files will appear
- Tick the box next to these files: EM_079517.fasta and sakai_prophages.fa
- Then click the 📒 to History button at the top
- Click Import
- The files should now be in your current history
First, we will use some Ebola virus genome data to look at mapping (aligning) Nanopore reads to a reference.
We are going to use bwa mem, which can handle long reads.
Step 1: Fastq-dump
First we need to retrieve Ebola genomic data from NCBI.
- We will use the Get Data tab on the left hand side panel of Galaxy, under Tools
- Click on Download and Extract Reads in FASTA/Q
- Options will now be visible in the middle panel of Galaxy
- The input type should be SRR accession
- In Accession enter ERR1014225 (this is the accession number for Nanopore reads of an Ebola sample from the West African outbreak)
- Select gzip compressed fastq for the output format
- Click Execute
- The job will then appear in the History panel on the right hand side of the screen
- The job will turn green when it has finished
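Nanopore reads vary widely in length, so it can be useful to summarise read lengths once the download has finished. The following is a minimal sketch, assuming plain four-line FASTQ records; the function names `read_lengths` and `n50` and the toy records are made up for illustration and are not part of the Galaxy tool.

```python
import io


def read_lengths(handle):
    """Yield the sequence length of each record in a four-line FASTQ stream."""
    for line_number, line in enumerate(handle):
        # In four-line FASTQ the sequence is the second line of every record.
        if line_number % 4 == 1:
            yield len(line.strip())


def n50(lengths):
    """Return the length L such that reads of length >= L hold half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0


# Made-up toy records standing in for a real download.
toy_fastq = io.StringIO(
    "@read1\nACGTACGTAC\n+\n!!!!!!!!!!\n"
    "@read2\nACGT\n+\n!!!!\n"
    "@read3\nACGTAC\n+\n!!!!!!\n"
)
lengths = list(read_lengths(toy_fastq))
print(len(lengths), max(lengths), n50(lengths))
```

For a gzip compressed download, the same functions should work on a handle opened with `gzip.open(path, "rt")` from the Python standard library.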
Step 2: BWA-MEM
Next we will map our nanopore Ebola reads to a reference.
- We will use the NGS:Mapping tab on the left hand side panel of Galaxy, under Tools
- Click on Map with BWA-MEM
- Options will now be visible in the middle panel of Galaxy
- Select Use a genome from history and build index
- The reference sequence is already in your current history: EM_079517.fasta (this is a finished Ebola reference genome for strain Zaire)
- For Algorithm select Auto. Let BWA decide the best algorithm to use
- Select Single for type of reads
- Select ERR1014225 (fastq-dump) for fastq dataset
- Select Do not set for read groups information
- Select 3. Nanopore 2D-reads mode (-x ont2d) for analysis mode
- Click Execute
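For reference, the Galaxy form above corresponds roughly to two bwa command lines: one to index the reference and one to map the reads with the `-x ont2d` preset. The sketch below only builds and prints those commands rather than running them. The reads file name is an assumption, and the exact commands that the Galaxy wrapper runs (including any conversion to sorted BAM) may differ.

```python
import shlex

# The reference name comes from the tutorial; the reads file name is hypothetical.
reference = "EM_079517.fasta"
reads = "ERR1014225.fastq.gz"

# Build the BWA index for the reference (the "build index" part of the Galaxy option).
index_cmd = ["bwa", "index", reference]

# Map single-end reads with the Nanopore 2D preset, as selected in the Galaxy form.
map_cmd = ["bwa", "mem", "-x", "ont2d", reference, reads]

for cmd in (index_cmd, map_cmd):
    print(shlex.join(cmd))
```

Run directly, `bwa mem` writes SAM to standard output, so on the command line you would normally redirect it to a file before converting it to BAM.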
When it has finished running, we will download the mapped reads.
- Click on the finished job (called Map with BWA-MEM, etc.) in the History panel on the right hand side
- You will see a save icon under the details of the job
- Click on it and the Download dataset and Download bam_index options will appear
- We need both, so click on one after the other
Step 3: Viewing the mapping in Tablet
- We will load the downloaded data into Tablet by opening the Tablet program and clicking on Open Assembly in the top left hand side of the screen
- For Primary assembly file or URL: click on Browse… and find the downloaded BAM file (you need the file with the extension .bam, not .bai)
- For Reference/consensus file or URL: click on Browse… and find the downloaded (from Dropbox) reference genome EM_079517.fasta
- Click Open and then click on the contig EM_079517 on the left hand panel of the screen and you will see your mapping results ready to browse
- Take a few minutes to explore the program and the results
Based on the alignment, what can you say about the sequencing method?
Does the data look different to Illumina data that you might have seen?
Can you find anything that looks like consistent errors?
Can you find anything that looks like a reliable SNP?
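One rough way to frame the last two questions is to look at a single alignment column and ask how many reads disagree with the reference, and on which strands. The sketch below is a toy heuristic, not a variant caller: the function name `classify_column`, the labels it returns, and the 0.8 threshold are all assumptions chosen for illustration.

```python
from collections import Counter


def classify_column(ref_base, observations, min_fraction=0.8):
    """Label one alignment column given (base, strand) pairs from the reads.

    min_fraction is an arbitrary illustrative threshold, not a recommended value.
    """
    alt_bases = [base for base, _ in observations if base != ref_base]
    if not alt_bases:
        return "matches reference"
    # Most common non-reference base and how many reads carry it.
    top_base, top_count = Counter(alt_bases).most_common(1)[0]
    fraction = top_count / len(observations)
    # Strands on which that base was observed.
    strands = {strand for base, strand in observations if base == top_base}
    if fraction >= min_fraction and strands == {"+", "-"}:
        return "candidate SNP"
    return "possible error"


# Made-up columns: nine of ten reads show G on both strands, then G on one strand only.
well_supported = [("G", "+")] * 5 + [("G", "-")] * 4 + [("A", "+")]
one_strand_only = [("G", "+")] * 4 + [("A", "-")] * 6
print(classify_column("A", well_supported))
print(classify_column("A", one_strand_only))
```

In Tablet you can make the same judgement by eye: check whether a mismatch appears in most reads at a position and whether those reads come from both strands.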
Step 4: Comparing Illumina and Nanopore assemblies
Next, we’re going to use Escherichia coli to illustrate how much assemblies can be improved with long reads.
Pre-knowledge
- Pathogenic E. coli tends to have a lot of repeat sequences
- This is due to large prophages integrating in the genome
- These prophages are very similar to each other and can confuse assembly algorithms
- These prophages can be as large as ~50kb
- No matter how deep you sequence, if you don’t get reads longer than those repeats you will never finish the genome
- See this paper for more detail
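The point about read length can be sketched as a simple coordinate check: a read only helps resolve a repeat if it covers the whole repeat plus some unique sequence on each side. The coordinates, the `anchor` value of 1000 bases and the function name `spans_repeat` are hypothetical values chosen for illustration.

```python
def spans_repeat(read_start, read_end, repeat_start, repeat_end, anchor=1000):
    """True if a read covers the whole repeat plus `anchor` unique bases on each side."""
    return read_start <= repeat_start - anchor and read_end >= repeat_end + anchor


# Hypothetical 50 kb prophage at positions 100,000-150,000 of a genome.
repeat = (100_000, 150_000)

# Hypothetical read placements as (start, end) coordinates.
short_read = (120_000, 120_250)  # 250 bp read that lies inside the repeat
long_read = (95_000, 156_000)    # 61 kb read covering the repeat and both flanks

print(spans_repeat(*short_read, *repeat))
print(spans_repeat(*long_read, *repeat))
```

This prints `False` for the short read and `True` for the long read. Adding more short reads never changes the first answer, which is why extra depth alone cannot finish the genome.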
Bandage
Nodes = contigs, edges = overlaps
- Open up Bandage from your desktop
- Click on File then Load graph
- Find the illumina_assembly_graph.fastg location on your computer and click Open
- Click Draw graph on the left hand panel of the screen
- You will see that the Illumina assembly produces a whole mess of 649 contigs with a lot of short contigs connecting everything together
- You can zoom in on the graph to look at it in more detail
- We can colour the graph by depth by changing Graph display from Random colours to Colour by depth
- You’ll see that the short contigs all have much higher coverage than the other contigs. What do you think that means?
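If you want to explore the depth question numerically, one rough approach is to compare each contig's depth with the median depth across contigs. The sketch below assumes that most contigs are present once in the genome, which may not hold for every assembly. The contig names and depths are made up, and `estimated_copies` is a hypothetical helper, not a Bandage feature.

```python
from statistics import median


def estimated_copies(depths):
    """Estimate copy number per contig as depth divided by the median depth."""
    baseline = median(depths.values())
    return {name: round(depth / baseline) for name, depth in depths.items()}


# Made-up depths for five contigs.
toy_depths = {
    "contig_1": 48.0,
    "contig_2": 52.0,
    "contig_3": 50.0,
    "contig_4": 151.0,
    "contig_5": 99.0,
}
print(estimated_copies(toy_depths))
```

Compare the ratios for the high-depth contigs with what the Pre-knowledge section says about repeated prophage sequence.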
Now we are going to use BLAST to look for the prophage regions in the assembly.
- Click on Create/view BLAST search in the left hand panel of the screen
- Step 1: click on Build BLAST database
- Step 2: click on Load from FASTA file and find the sakai_prophages.fa location on your computer
- Step 3: click on Run BLAST search
- Click on Close
You will see the prophage regions highlighted in colour on the assembly graph now.
What effect are the prophage regions having on the assembly?
Now, we will load the nanopore assembly into Bandage to see how long reads affect this.
- Click on File then Load graph
- Find the O55_combined_miniasm.gfa location on your computer and click Open
- Click Draw graph on the left hand panel of the screen
- Change the Graph display option from BLAST hits (solid) to Random colours. What is strikingly different about this assembly from the Illumina one?
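A GFA file is plain text, so the nodes and edges that Bandage draws can also be counted directly: segment records start with `S` and link records start with `L`. The following is a minimal sketch with made-up records. The function name `count_gfa_records` is hypothetical, and real files may contain other record types, which this sketch ignores.

```python
def count_gfa_records(lines):
    """Count segment (S) and link (L) records in GFA text."""
    counts = {"segments": 0, "links": 0}
    for line in lines:
        # The record type is the first tab-separated field on each line.
        record_type = line.split("\t", 1)[0]
        if record_type == "S":
            counts["segments"] += 1
        elif record_type == "L":
            counts["links"] += 1
    return counts


# Made-up records standing in for a real assembly graph.
toy_gfa = [
    "H\tVN:Z:1.0",
    "S\tutg1\tACGTACGT",
    "S\tutg2\tTTGACC",
    "L\tutg1\t+\tutg2\t-\t3M",
]
print(count_gfa_records(toy_gfa))
```

To try it on a real graph, pass an open file handle instead of the list, since iterating over a text file also yields one line at a time.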
Now, let’s look at where the prophage regions are in this assembly.
- Click on Create/view BLAST search in the left hand panel of the screen
- Step 1: click on Build BLAST database
- Step 2: click on Load from FASTA file and find the sakai_prophages.fa location on your computer
- Step 3: click on Run BLAST search
- Click on Close
- How have the long reads improved the assembly?
- Are there any regions that haven’t been resolved by the nanopore experiment? Why would that be?