CoreGenomics: May 2012

Wednesday 30 May 2012

Don’t bother with a biopsy; just ask for a drop of blood instead.

I’d like to highlight some research that I have been involved in, which has just been published in Science Translational Medicine: Noninvasive identification and monitoring of cancer mutations by targeted deep sequencing of plasma DNA. The work makes extensive use of Fluidigm’s Access Array and Illumina sequencing, technologies that we have been running in my lab for over two and almost five years respectively. I’m proud of this work and I hope you like it as well.

Liquid biopsies for personalised cancer medicine: Tumours leak DNA into blood and the SciTM paper shows how this circulating tumour DNA (ctDNA) can be used in cancer management as a “liquid biopsy”. The study by Forshew et al demonstrates the feasibility of testing and detecting mutations present at about 2% MAF (Mutant Allele Frequency), in multiple loci, directly from blood plasma in 88 patient samples. The method has been christened Tagged Amplicon sequencing or TAm-seq.

We know that specific mutations can impact treatment, e.g. ErbB2 amplification & Herceptin, and BRAF V600E & Vemurafenib, etc. Many cancers are heterogeneous with metastatic clones differing from each other and/or the primary, and biopsying all tumour locations is unrealistic for most patients. Understanding this heterogeneity in each patient will ultimately help guide personalised cancer medicine.

ctDNA, however, has not been easy to assay. It is usually fragmented to about 150bp and is present in only a few thousand copies per ml of blood. There has been a recent explosion in prenatal test development since the publications from the Lo and Quake groups. Whilst people were aware of ctDNA for many years it is only similar technological advances that allow us to assay it in such a comprehensive manner.

How does the liquid biopsy work: ctDNA is first extracted from between 0.8 and 2.2ml of plasma using the QIAamp Circulating Nucleic Acid kit (Qiagen). Tailed locus-specific primers are used for PCR amplification and the loci targeted in the paper account for 38% of all point-mutations in COSMIC. 88 patient plasma samples, a couple of controls and 47 FFPE samples were tested in duplicate, clearly demonstrating the utility and robustness of the method.

Each sample is pre-amplified in a multiplex PCR reaction to enrich for all targeted loci. This “pre-amp” is used as the template on a Fluidigm Access Array where each locus is individually amplified in 33nl PCR reactions that are recovered from the chip to produce a locus-amplified pool from each sample. A universal PCR adds barcodes and flowcell adapter sequences. 48 samples were pooled for each Illumina GAIIx sequencing run, achieving 3200-fold coverage. We recently ran a library with 1536 samples in it from a collaborator using the technology in my lab. The potential of 12288 samples analysed in a single HiSeq run is astonishing.

How sensitive is the liquid biopsy: The paper presents results from a series of experiments to test the sensitivity, false discovery rate (using mixed samples) and validity (using digital Sanger-seq) of the TAm-seq method. ctDNA can be successfully amplified from as little as 0.8ml of plasma, far easier to get than a tissue biopsy! They were able to detect mutations at around 2% MAF with sensitivity and specificity >97%. The paper has some very good figures and I’d encourage you to read it and take a look at those.
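As a rough illustration of why depth matters for a 2% MAF call, the sketch below works out how many mutant reads to expect at the 3200-fold coverage quoted above, and the chance of seeing at least a chosen number of them. It assumes simple binomial sampling of reads and ignores PCR and sequencing error and the limited number of input molecules, so it is not the statistical model used in the paper. The function names and the `min_reads` threshold are made up for this example.

```python
import math


def expected_mutant_reads(depth: int, maf: float) -> float:
    """Mean number of reads carrying the mutant allele at one locus."""
    return depth * maf


def prob_at_least(depth: int, maf: float, min_reads: int) -> float:
    """P(X >= min_reads) for X ~ Binomial(depth, maf).

    Terms are computed in log space so large depths do not overflow.
    Assumes 0 < maf < 1.
    """
    if min_reads <= 0:
        return 1.0
    below = 0.0
    for k in range(min_reads):
        log_term = (
            math.lgamma(depth + 1)
            - math.lgamma(k + 1)
            - math.lgamma(depth - k + 1)
            + k * math.log(maf)
            + (depth - k) * math.log1p(-maf)
        )
        below += math.exp(log_term)
    return max(0.0, 1.0 - below)


if __name__ == "__main__":
    depth, maf = 3200, 0.02  # figures quoted in the post
    print("expected mutant reads:", expected_mutant_reads(depth, maf))
    print("P(at least 10 mutant reads):", prob_at_least(depth, maf, 10))
```

In practice the background error rate of the assay, not read sampling alone, sets the real detection limit, which is why the paper tests sensitivity experimentally.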

Two of the figures stand out for me. The first shows results from one ovarian cancer patient’s follow-up blood samples in which they identified an EGFR mutation not present in the initial biopsy of the ovaries. When they reanalysed the samples collected during initial surgery they could find the EGFR mutation in omental biopsies. A TP53 mutation was found in all samples except white blood cells (normal) and acts as a control (Figure A from the paper).

They presented an analysis of tumour dynamics by tracking 10 mutations discovered from whole tumour genome sequencing, using TAm-seq in the plasma of a single breast cancer patient over a 16 month period of treatment and relapse (Figure D). They also demonstrated the utility of TAm-seq by comparison to the current ovarian cancer biomarker, CA125 (figures B & C not reproduced here).

What does this mean for clinical cancer genomes: There are many reports of whole genome sequencing leading to clinically interesting findings and some labs have started formal exome and even whole genome sequencing on patients. Whilst there is little doubt that tumour sequencing is likely to be useful in most solid tumours it is still hard to see how this will trickle down to the 100,000 or so new cancer patients we see in the UK. Challenges include cost, bioinformatics, incidental findings and ethics.

The TAm-seq method has fewer of these challenges (although it is not good for detecting amplifications) and I think it is a really big step in clinical cancer genomics and hope it translates quickly into the clinic to inform treatment and prognosis. Perhaps it will be the first technique to make a big splash in personalised cancer genomics?

Hopefully this liquid biopsy will be quickly translated into the clinic.

Perhaps a future version might even turn up in your doctor’s surgery on a MinION in a few years’ time as a basic screening tool?

Saturday 26 May 2012

The exome is only a small portion of the genome

Ricki Lewis (Genetic linkage blog) wrote a great guest post for Scientific American on what exome sequencing can’t do. It seems timely considering the explosion of interest in exome sequencing and exome arrays. Not so long ago most people I knew still talked about junk DNA; exome sequencing and exome arrays essentially allow users to ignore the junk and get on with real science. As Ricki points out, exome analysis is a phenomenally useful tool but users need to understand what it can’t do to get the most from their studies.

Ricki listed 10 things exomes are not so good for; my list is a lot shorter at just 4.

Regulatory sequence is missing (although this is being added, e.g. Illumina).
Not all exons are included.
Structural variants (CNV, InDel, Inv, Trans, etc) are not easily assayed with current exome products.
No two exome products are the same.

Exome analysis has had a real impact, especially on Mendelian diseases that remain undiagnosed. However users need to remember they are only looking at a very small portion of the genome. Ricki puts it this way “the exome, including only exons, is to the genome what a Wikipedia entry about a book is to the actual book”.

I posted a month or so ago about choosing between exome-chip and exome-seq. The explosion in exome-chips has been an even bigger surprise than exome-seq. Illumina admitted that they had been overwhelmed with demand for their array products. It appeared to be pretty clear that exome-seq would take off as soon as the cost came down to something reasonable. However according to Illumina over 1M samples have now been run on exome chips!

Of course analysis of an exome is allowing studies to happen that would never get off the ground if whole genome sequencing were the only option. The cost and relative ease of analysis make the technology accessible to almost anyone. As the methods and content improve over the next couple of years this is going to get even easier.

The simplest thing for users to remember is that they are restricting analysis to a subset of the genome. This means that just because you don’t find a variant does not mean one is not lurking outside the exome; absence of evidence is not evidence of absence as statisticians would put it.

It is also helpful to remember that not all exomes are created equal. Commercial products are designed with a price and user in mind. Academic input is usually limited to a few groups and there are always other bits that could be added in. Illumina have done a great job including some of the regulome in their product but the commercial products are in a similar arms race to the one faced by microarray vendors a decade ago. Just because a product targets a bigger exome does not mean it is better for your study.

Exomes are well and truly here to stay. We'll probably see an exome journal soon enough as there is so much interest.

Thursday 24 May 2012

AmpliSeq 2

Ion Torrent released AmpliSeq 2.0 a little while ago. The biggest change is an increase to 3072 amplicons per pool. I saw a Life Tech slide deck which had up to 6144 amplicons per pool so there looks to be more room for improvement.

Who knows when we might see a multiplex PCR exome!

PS: see a previous post about AmpliSeq for more general details.

Tuesday 22 May 2012

Resources for the public understanding of cancer genomics

We have just had an open day at our institute, we had Science Week here in the UK a month ago, and I went to my son's school to talk about why people have differently coloured eyes. I like communicating science to other people; at work, in the pub, on the train, at school, etc, etc, etc. I am a science nerd and am happy to be known as one. Most of the time.

Public understanding of science (PUS) is important and we need to make sure the people funding us, and hopefully benefiting from the work we do, realise why we do what we do. However it is not always easy to find the time to put together something everyone will understand and engage with.

There are surprisingly few resources out there to get PUS materials and examples from. This post outlines the work I've done for our open day and discusses some of the resources that are out there to offer some inspiration. I'd also like to enthuse others about the idea of Creative Commons licensing of your PUS materials so others can use them. Then we just need to find somewhere to put them and organise them. Does anyone fancy writing a grant to Wellcome to get some funding for a web-resource?

Why bother with PUS in a genomics core facility: I wanted to share the recent posters I put together to show visitors to our open day what we do in a genomics core facility lab and how those shiny Illumina HiSeq, MiSeq and GAIIx (a little old and not so shiny!) machines can help us help patients.

Mutations in Cancer: The PUS poster comes as a pair which have huge Sanger style sequence traces of the BRAF V600E mutation in normal and mutant versions. The visitors are asked to "spot the difference" and identify the mutation, which is a very difficult thing to do by eye. We explain that finding this kind of mutation is what we are doing on our instruments about 1000 times a day. Of course this is a massive over-simplification of the possibilities offered by cancer genome sequencing but hopefully it shows why we spend so much money on cancer genomes. BRAF V600E was chosen as it is one of a few mutations that can be tested for to determine treatment. Tests we are developing in our institute will hopefully mean every cancer patient in the UK is screened for mutations like this from a tumour biopsy and, maybe one day, a simple blood sample.

Can you spot the difference between normal (left) and mutant (right) BRAF sequences?

How can I tell other scientists about my posters: I wanted to make these posters available to others to use or modify for their own PUS events. The posters can be downloaded here under a creative commons license. However there does not appear to be a central repository of resources like this. A few sites do offer material under creative commons, like the University of Oxford maths department podcasts. I wonder if the resource we need today is somewhere to upload materials with keywords and abstracts in a searchable form. If these were available as shared documents then the community could work on them together. I am thinking something like GoogleDocs or a Wiki for PUS. This would be a wonderful thing for someone like Wellcome or MRC to fund. If your materials became widely used then they could also become something that was worth adding to your CV, demonstrating additional skills outside of research experience.

PUS sites I liked: There are people out there communicating about PUS. There is even a journal called "Public Understanding of Science" that has an interesting editorial blog post about open access publishing. PUS is a subscription journal and they appear to lean away from a PLoS model of open access, which seems totally at odds with the journal's remit to promote public understanding. How can scientists better learn to help communicate science if we have to subscribe to a journal to hear about the best ways to do it? PUS covers all kinds of interaction between science and the public, including topics such as: "science, scientific and para-scientific belief systems, science in schools, history of science, education of popular science, science and the media."

A post on Diffusion summarises a seminar from Martin Rees. He advocates scientists doing more communications work and making sure the public and politicians get access to the very best explanations of science possible.

Marie Boran has some wonderful blog posts at The Strange Quark about public understanding of science. I'd also recommend her posts on what science is, science journalism and what communication means. She communicates things in a way that people are likely to remember. I particularly liked her way of presenting the scale of cells: apparently if you use an M&M to represent a red blood cell then a single grain of sugar would be about the same size as a bacterium.

Lastly I'd like to point everyone to Small Science Zines. This is a site that promotes the use of "Zines", small, easily and cheaply reproduced magazines. They provide instructions (zine-folding directions) to help you produce a simple eight page zine for easy distribution. The zine on DNA computing is an interesting if wordy read. I think I'll give one a go for Illumina sequencing so watch this space. Perhaps we can produce a series of NGS "how to" comics?

The site discusses how to communicate science and how to design zines. It does not all have to be about well honed presentations or laminated A0 posters. PUS could simply be the last time you told someone a science fact. Making a zine to show people is much more personal than pointing them to a website or asking them to visit the lab. All you need is "to know and care about your topic, and want to share this with others".

Wednesday 16 May 2012

HiSeq 2500: how much will the "genome in a day" cost?

The launch of HiSeq 2500 generated a buzz as it came hot on the heels of the Ion Proton. Both instruments will allow users to generate a genome in a day. HiSeq 2500 was launched with a 127Gb in 24 hours spec. Current specs on the Illumina website are at 120Gb in 24 hours, 300M reads and PE150 supported (yielding 180Gb in 39 hours). All this for a $50,000 upgrade fee which makes it seem likely that many users will upgrade at least one instrument.

If you want to know more read the 2500 app note on Illumina's website, although the yield figures in the table appear to be incorrect!

There has been much less noise about the likely cost of the data from the rapid run "MiSeq on steroids" format. A recent post on SEQanswers is the first sniff of HiSeq 2500 pricing, although it may not be accurate. It suggests a PE cluster kit will cost $1225 and a 200 cycle SBS kit will be $1690.

I used these figures to get to the possible cost per lane of a HiSeq 2500 run:

PE100 multiplexed: £900 or $1500
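For readers who want to redo the sum, here is a minimal sketch of one way to get from the two kit prices to a per-lane figure. It assumes that one PE cluster kit and one 200 cycle SBS kit are consumed by a single two-lane rapid flowcell and that nothing else is charged; neither assumption is confirmed by the SEQanswers post. The exchange rate of 0.6 is simply the ratio implied by the £900 and $1500 figures above, not a quoted rate.

```python
def cost_per_lane(cluster_kit_usd: float, sbs_kit_usd: float,
                  lanes_per_flowcell: int = 2) -> float:
    """Reagent cost per lane, assuming both kits serve one flowcell."""
    return (cluster_kit_usd + sbs_kit_usd) / lanes_per_flowcell


def usd_to_gbp(amount_usd: float, rate: float = 0.6) -> float:
    """Convert dollars to pounds; the default rate is an assumption."""
    return amount_usd * rate


if __name__ == "__main__":
    per_lane = cost_per_lane(1225, 1690)  # rumoured kit prices from the post
    print(f"per lane: ${per_lane:.2f} (about £{usd_to_gbp(per_lane):.0f})")
```

Under these assumptions the reagents come to a little under the rounded $1500 per lane quoted above; the difference may be rounding or items not covered by the two kits.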

This compares incredibly well to the normal output. In fact to me it looks like HiSeq 2500 rapid run mode could be the best choice for core labs like mine as it offers incredible flexibility as a two lane flowcell is quicker to fill up than an 8 lane one. And five dual-flowcell rapid runs will take less time and generate the same data as a dual-8-lane-flowcell standard run. The cost per Gb is going to be a little higher but many users will see this as a fair trade-off for faster turn-around-times.

The HiSeq 2500 rapid runs will also use on-instrument clustering. Exactly how this is going to fit inside the instrument with the available fluidics is not completely clear. I'd expect that we will have to run both positions in the same configuration using the current PE reagent rack.

Whether Illumina are able to really turn HiSeq 2500 into "MiSeq on steroids" and up read lengths to the 687bp presented at AGBT is still to be seen. They might have to if Ion Torrent can push their read-lengths out to current 454 lengths.

The competition: The latest specs from Life suggest that the Proton II chip will generate 20x coverage of a genome (and analyse in a day). However it is not clear if the run time will be longer or multiple chips will be run; current times are 2 hours per chip. A 20x genome in 2 hours would be great, but I don't think we can expect quite that from Life just yet. There is also a video of the first 4 Protons to be installed (at BCM); "install to sequence in 36 hours" although the video only shows samples being centrifuged before loading and no real sample prep.

What's next: One thing I am happy to predict is that advances in sequencing technology are not going to stop any time soon, and when ONT come out from under their invisibility cloak we might finally get a peek at some data that shows what tomorrow holds.

Wednesday 9 May 2012

NGS is the ultimate QC for your PCR oligos!

About two years ago when we started using Fluidigm Access Array sequencing we noticed something in the reads that was a bit of a surprise, although not totally unexpected once we realised what was going on. We were amplifying all the exons in seven cancer genes across 48 cancer cell lines, sequencing them in a single experiment and finding known SNPs at the expected allelic ratios. However we also found quite a large number of what seemed to be random deletions and truncations in the targeted regions, and these all occurred towards the beginning of the reads.

One of the “problems” with many amplicon sequencing methods is that they often tail locus specific primers with NGS adapters and this means you have to read through the PCR primer before you get to interesting results in your samples. In our case the first 25bp or so of each read was from the primer.

It appeared that we were seeing the incorrect by-products of the oligo production process. Oligo manufacturers use varying methods of clean-up and QC but none are perfect and it looks to me like NGS might be the ultimate, if slightly expensive, oligo QC tool.

One way to test this would be to get the same pair of oligos made by multiple companies, PCR amplify a control DNA template with all of them and then sequence the primer sequence only in a pooled sequencing run. Anyone fancy collaborating?
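The primer-only comparison suggested above could start with something as simple as the sketch below, which looks at the first bases of each read (the primer-derived part) and sorts reads into rough classes. It is only an illustration: the class names, the mismatch threshold and the toy primer are invented here, and a real analysis would align reads properly rather than compare prefixes.

```python
from collections import Counter


def classify_primer_region(read: str, primer: str,
                           max_mismatches: int = 2) -> str:
    """Compare the start of a read with the expected primer sequence."""
    prefix = read[: len(primer)]
    if prefix == primer:
        return "exact"
    if len(prefix) < len(primer):
        return "truncated_read"
    mismatches = sum(a != b for a, b in zip(prefix, primer))
    if mismatches <= max_mismatches:
        return "substitution"
    # A deletion or insertion shifts every downstream base, so it shows
    # up as many mismatches in a position-by-position comparison.
    return "possible_indel"


def summarise(reads, primer):
    """Count reads in each class for one primer."""
    return Counter(classify_primer_region(r, primer) for r in reads)


if __name__ == "__main__":
    primer = "ACGT" * 6 + "A"  # made-up 25bp primer for illustration
    reads = [
        primer + "GGGG",                     # full-length primer
        primer[:5] + primer[6:] + "GGGG",    # one base missing (n-1 oligo)
    ]
    print(summarise(reads, primer))
```

Running the same summary on oligos from several suppliers would give a simple per-supplier table of full-length versus faulty primer sequences.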

An oligo “primer”: I thought I would follow up on this with an overview of oligo manufacture and some tips for PCR primer design.

You can buy oligos from many places and a very few labs still make their own. It is possible to get oligos of up to 400bp in length, chimeric oligos (DNA:RNA) and with all sorts of modifications: Fluorescent dyes, Amino-modified, Biotinylated, Phosphorylated, 2'-Deoxyinosine (INO), 5-Methyl-dC, Phosphorothioate, dI, dU, 2'-Deoxyuridine (URI), Amino C6-dT, Spacers (Deoxyabasic or C3), Thiols, etc, etc, etc. Most people are simply out to buy a primer for PCR or Sanger sequencing but there are also RNA, siRNAs, PNA, LNA and many other types available for many applications.

Choosing a provider most often comes down to cost and the price per base is very low for a standard desalted PCR primer. However the options offered by oligo manufacturers are numerous and some might be a prerequisite for your experiment.

Most standard oligos are supplied already quantified and even at your pre-determined concentration. The amount of oligo you actually get is dependent on the synthesis scale (how much is made), the efficiency of each coupling and the purification used. The lowest synthesis scale is generally fine for PCR applications, however if you want specific cleanup of your oligo you may have to get a larger scale synthesised. Most providers suggest that you resuspend oligos in TE rather than water, which can be slightly acidic. I have always used a buffer with a lower amount of EDTA (10 mM Tris, pH 8.0, 0.1 mM EDTA) as EDTA can inhibit some molecular biology applications at higher concentrations.

Oligo QC: The current standard for QC is mass spectrometry (MALDI-TOF or ESI-MS), or gel electrophoresis (CE or PAGE). MALDI-TOF is used for most applications because of its high throughput, but ESI is better for oligos >50bp.

Oligo Yields: Most companies will specify yields that seem incredibly low compared to the starting nucleotide concentrations. The oligo synthesis is performed sequentially with a single nucleotide being coupled to the growing oligo in a 3’ to 5’ direction. This coupling is often less than 99% efficient so some of the oligos are not extended. This means the final product is a mix of full-length product (n) and truncated sequences (n-1, etc).
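To see why longer oligos contain more truncated product, the fraction of full-length molecules can be estimated as the coupling efficiency raised to the number of couplings. The sketch below assumes an oligo of n bases needs n-1 couplings (the first base starts on the solid support) and that every coupling has the same efficiency; both are simplifying assumptions rather than any supplier's stated model.

```python
def full_length_fraction(length: int, coupling_efficiency: float) -> float:
    """Estimated fraction of oligos that reach full length (n).

    Assumes `length - 1` independent couplings, each with the same
    efficiency (e.g. 0.99 for 99%).
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    return coupling_efficiency ** (length - 1)


if __name__ == "__main__":
    for n in (20, 35, 50, 100):
        fraction = full_length_fraction(n, 0.99)
        print(f"{n}-mer at 99% coupling: {fraction:.1%} full length")
```

Even at 99% per step the full-length fraction drops steadily with length, which fits the advice to use more stringent purification for longer oligos.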

Oligo purification can be performed in many ways, cartridge, HPLC, PAGE. On the Sigma website there is a very handy table showing which clean-up you should choose for different applications.

Sigma's clean-up guide

Desalting: This is the most common and cheapest clean-up method and is perfectly fine for standard PCR based applications. Desalting removes by-products from the oligo manufacturing process. If your oligos are >35bp then desalting will not remove the relatively high number of n-1 and other truncated oligos.
Cartridge: A reverse-phase cartridge purification by Sigma will give about 80-90% yield. Full length oligos contain a 5'-DMT and are separated from truncated sequences by making use of their higher hydrophobicity. However this difference is reduced as oligos increase in length, so cartridge purification should not be used for anything >50bp.
HPLC: A reverse-phase HPLC purification allows higher resolution and gives higher yields of >90%. HPLC also allows purification of larger amounts of oligo (>1 umol). Again this method is not ideal if oligos are more than 50bp.
PAGE: A poly-acrylamide gel electrophoresis can achieve single-base resolution and very high-quality purification and is highly recommended for longer oligos, >50bp. Unfortunately the yield after gel extraction can be quite low.

Primer design tips: Oligos for PCR, qPCR and sequencing are pretty easy to design if you follow a few simple rules. Once you have primers designed it pays to use the Blat and in silico PCR tools on the UCSC genome browser. Order the lowest synthesis scale for basic PCR applications and resuspend oligos in low EDTA TE (10 mM Tris, pH 8.0, 0.1 mM EDTA). If you are in any doubt about contamination, throw oligos away and buy new ones!

Use Primer 3: Primer 3 is the tool many others are based on, forget all the other rules just use it! But in the spirit of educating readers here are the other rules...
Amplicon length: Standard PCR is fine up to 1-3kb; above this the primer design may not need to change but the reaction conditions almost certainly will. For qPCR keep amplicons around 100-200bp.
Oligo length: 18-22bp is optimal and should allow high specificity and good annealing characteristics. Longer than 30bp and you can affect the annealing time, reducing reaction efficiency.
Melting Temperature: A Tm of 52-58°C works best in most applications. Make sure primer pairs are as similar to each other as possible and test using a gradient cycler run.
GC Content: The GC content (the number of G's and C's in the primer as a percentage of the total bases) of a primer should be 40-60%.
Avoid secondary structure: If your primer develops secondary structures you will get low or even no yield of the final product. Secondary structure affects primer-template annealing and thus the amplification. Internal hairpins and primer dimers are common problems.
Avoid repeats and runs: Repeats (e.g. di-nucleotides) and mono-nucleotide runs can easily misprime (and promote secondary structure); try to avoid them or keep them to fewer than 4-6bp.
Don’t design to homologous regions: Sounds obvious but if your primer can anneal to multiple regions in the genome then you are likely to get multiple products in your reaction. Blast or Blat your final sequences before ordering.
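A few of the rules above are easy to check in code before ordering. The sketch below tests length, GC content, an estimated Tm and mono-nucleotide runs. The Tm uses the simple Wallace rule (2°C per A/T, 4°C per G/C), which is a crude estimate chosen here for brevity; Primer 3 uses a more detailed thermodynamic model, so treat this as a sanity check and not a replacement. The function names, the run threshold and the example sequences are made up for illustration, and secondary structure and genome specificity are not checked at all.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C bases in the primer."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> float:
    """Rough Tm in °C from the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2.0 * at + 4.0 * gc


def longest_run(seq: str) -> int:
    """Length of the longest mono-nucleotide run."""
    best = current = 0
    previous = ""
    for base in seq.upper():
        current = current + 1 if base == previous else 1
        best = max(best, current)
        previous = base
    return best


def check_primer(seq: str, max_run: int = 4) -> list:
    """Return a list of warnings; an empty list means no rule was broken."""
    warnings = []
    if not 18 <= len(seq) <= 22:
        warnings.append(f"length {len(seq)} outside 18-22")
    gc = gc_content(seq)
    if not 40 <= gc <= 60:
        warnings.append(f"GC content {gc:.0f}% outside 40-60%")
    tm = wallace_tm(seq)
    if not 52 <= tm <= 58:
        warnings.append(f"estimated Tm {tm:.0f}C outside 52-58C")
    run = longest_run(seq)
    if run > max_run:
        warnings.append(f"mono-nucleotide run of {run}")
    return warnings


if __name__ == "__main__":
    for primer in ("ATGCATGCATGCATGCAT", "AAAAAATTTTTTAAAAAA"):
        print(primer, check_primer(primer) or "OK")
```

Anything that passes still needs a Blast or Blat search and an in silico PCR check, as described above.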
