Monday, 31 October 2011

My genome analysis part III - the results are in

Lots of excitement today: on checking to see if my sample has been processed, I found the results are ready.

You can see I checked the "please load my health data" box. I guess 23andMe want to make sure you really, really do want to find things out about yourself before letting you dive into the results.
The next step is to enter my data; year of birth (40 years ago last Tuesday), sex (with an "I'm not sure" option!), height, weight and smoking status. I answered no to all the medical questions (lucky me), except I am a Psoriatic so to finish off I added that I am using Dovonex cream for my Psoriasis.

Health and disease status: My first result is my blood group status; 23andMe correctly identify me as having an O blood group. I also find out which SNPs are used to determine this and the reference for the paper. Interesting stuff for a scientist like me.

Immediately available is my health status report. This shows 114 disease, 52 trait, 27 carrier status and 20 drug response reports.

3 of these are locked. The locked reports describe the trait or disease being reported and explain the genetic susceptibility BEFORE you choose to reveal results. It certainly felt that the website is designed to guide an informed choice, even if you are missing a face-to-face discussion with a genetic counselor. I looked at my ApoE status and see I have twice the risk of the general population. Certainly not a 'nice' result but not likely to make me lose any sleep.

The first 23andMe discovery I read about was curly hair. According to the analysis I should have slightly curlier hair than the average European. Mine is dead straight and was so even when I was a head-banging teen rocker. My brother has slightly curly hair and my dad was curly (all shaved off now).

I am not a carrier for 24 of the diseases reported. I am a haemochromatosis carrier and not ready to look at BRCA status yet.

The most useful result for me personally is an increased risk of Glaucoma. This had been mentioned at my last optician's visit and I had brushed it off a bit. Seeing a genetic risk as well makes me think I will speak to the optician a bit more at my next appointment and monitor this closely. I'll also start to look at what I can do and what treatments are available for this condition.

Traits and inheritance: As for traits I was happy to see I am a likely sprinter (CRUK half marathon in March next year). I was a little disappointed to find out I am likely to have a typical IQ. It shows what hard work I must have put in to get where I am today and also says to me my kids won't be able to get away with not doing their homework.

There are no closely related individuals on 23andMe, yet. I am 74.24% identical to Neil Hadfield (also on 23andMe); however, I am 71.19% similar to a Chinese person and 68.49% similar to an African. Neil is probably not my long lost brother.

Impressions so far: I will spend a few days looking through this but so far it has persuaded me my £160 birthday present was worth it. It certainly satisfies my curiosity. For now I will leave BRCA status as there are some family things that need to be discussed before diving into that one.

Friday, 28 October 2011

Top ten things to write in a leaving card

I always struggle when it comes to writing in someone's leaving card. I'd like to be witty and at the same time show the person that I remembered them and enjoyed working with them. However most of the time something short gets quickly scribbled.

We get a lot of leaving cards in scientific centres. I guess this is due to the contractual nature of much of the work we do: PhD students and post-docs are all on three to five year contracts. Many grants only carry money for three to five years or even less. This means people do move on quite regularly.

How much to donate to the gift?
Every time a card comes round I also think about the collection. For my own staff I try hard to go round the people they have most likely interacted with and worked for, and make a shameless effort to get enough in the way of contributions to buy something decent. In my view if everyone gave £1 or £2 then we should get about £100, which buys a nice gift to remember the lab/institute by.

The top ten things written in leaving cards: here is a summary of three leaving cards, not perhaps a large enough sample to say anything with statistical confidence (but that's a whole other post about stats).
1: Best wishes 25%.
2: Good luck 15%.
3: Best wishes and good luck combination, 10%.
4: Congratulations 5%.
5: All the best <5%.
6: I will miss you <5%.
7: Thanks for your help <5%.
8: Goodbye <5%.
9: Enjoy the new job <5%.
10: Sorry you are leaving <5%.

I think 50% of 'best wishes' and/or 'good luck' shows just how unimaginative we are, so I'd like to encourage everyone reading this to write something far more interesting in the next card that lands on the bench in front of you.

Why doesn't anyone write "I've always loved you" and sign it from a secret admirer?

And don't forget to add a couple of quid, dollars, yen, etc.

Monday, 24 October 2011

My genome analysis part II

My kit arrived, I spat and it is now being processed. It is a pretty neat package with easy to follow instructions. Inside the package was a box containing the kit, which is an Oragene product. Has anyone tested their RNA kits yet?

Before sending it back you have to register the kit on the 23andMe website and link it to an account. The only trouble I had was not being able to enter a UK postcode.

I am now connected to the only other Hadfield on their database and it would be nice if I turned out to be more related to him than anyone else. Let's wait and see.

Consent agreement:
There is a rather lengthy consent document you need to read and sign. This gives 23andMe access to your personal genotype and other non-identifying data for research use in the 23andWe program. As a genomic scientist I am more than happy to do this; large sample sizes are clearly needed for this kind of study. It is a shame that 23andMe don't share the IP with the users. This would be a great way to connect individuals with scientific research. Let's face it, the proceeds would be minimal, but if they offered a charity donation option then someone other than just 23andMe might benefit.

The lengthy consent agreement is primarily aimed at making sure I can give informed consent to use my data. Surprisingly to me this is the first time I have ever given informed consent.

23andWe projects:
23andWe is running research projects to "understand the basic causes of disease, develop drugs or other treatments and/or preventive measures, or predict a person's risk of disease". The listed projects cover a wide range, from hair colour & freckles to migraine to Parkinson's. They specifically say that they will not investigate "sensitive" topics such as sexual orientation (although they would need to be analysing methylation for Epigaynomics) or drug use. But if they do decide to do so in the future they would contact me and ask for a separate consent agreement. They will also collaborate with external groups but won't release any identifying information. There is of course the worry that 1M genotypes pretty well identifies me to some people.

There is a great website over at the University of Delaware from John H. McDonald, "Myths of Human Genetics". I am sure there are lots of other interesting questions that the public might engage with. Some of these may not be considered high-brow science but if it gets people involved and they consent for other studies surely that has to be a good thing?

Personally I hope 23andMe do some Psoriasis research. I am a Psoriatic and would like to think about some analysis that might be made using 23andMe data, maybe they will even let users start studies one day?

I am now waiting for the data to turn up and then I can take a look at what is lurking in my genome. Fingers crossed it has some good news stories to post about!

Tuesday, 18 October 2011

Life Technologies turn up the heat on Illumina: do we have some real competition?

I was presenting today at an EBI workshop and was followed by Matt Dyer, senior product manager bioinformatics at Ion Torrent. He gave a good introduction to the platform and recent developments and then went on to a hands-on demo of the Torrent suite of applications.

He updated his talk 5 minutes before giving it.
571.43Mbp from a single 316 chip.
551 bp perfect read is current record for length.

Goodbye 454?

Life Tech made a real splash by the sound of things coming out of ASHG.

5500 updates:
Wildfire is the new isothermal amplification removing the painful emulsion PCR, and will be performed on the instrument using the 5500 "flowchips". Sounds a bit like a cBot in a HiSeq!

Certainly does according to Mark Gardner, VP at Life Tech, over on GenomeWeb who says "you coat the flow cell with primers ... and then add template and isothermally amplify." resulting in isolated fragments. Sounds an awful lot like clusters on a flowcell.

Gardner also suggested the much hyped feature of using just one lane on a flowcell. Lastly Gardner says the 5500 will move from 400k cpmm (400 thousand clusters per millimetre squared) to 1M cpmm and on both sides of the flowchip.

If all this pans out then there is a real competitor to Illumina for whole genome sequencing and any high-throughput applications.

Incidentally, I found it really hard to find a picture of a flowchip, can someone post some HiRes images? Or post me a flowchip and I'll put it alongside Illumina, PacBio, Ion et al.

5500 Flowchip from

Ion updates:

A new library prep kit, the Ion Xpress™ Plus Fragment Library Kit, uses enzymatic shearing to reduce library prep to 2 hours, making DNA to sequence possible in eight hours. 200bp reads come from the Ion Sequencing 200 Kit and there is talk of reads over 500bp. Custom enrichment for more than 100kb of genome targeting comes from the Ion TargetSeq™ Custom Enrichment Kit. 384 barcodes are coming as well.

Look out Roche and Illumina, Ion is hot on your heels.

PS: I still think of Life Tech 5500 as SOLiD and use this in conversations with others. I'll miss the term when Life Tech marketing finally kill it.

Amplicon NGS battles begin in earnest

A short while ago I posted about the recent exome sequencing comparisons in Genome Research. In that post I did ask whether we really need to target the exome at all and if targeted amplicon sequencing might be a better fit for some projects.

In the last few days both Life Technologies and Illumina have released amplicon resequencing products. You can read another good review of Illumina's offering over at Keith Robison's blog.

I really hope that amplicon NGS is the tool that gets translated into the clinic quickly. Microarrays took over a decade, and only CGH has made it. I am not aware of any gene expression array based clinical tests, other than Mammaprint and the upcoming Coloprint from Agendia. Amplicon NGS is similar to the current standard Sanger tests in many ways. Labs will still perform PCR and sequencing, they'll just be doing a different PCR and it will be NGS. This should make adoption seem like less of a hurdle.

The other amplicon competition:

Fluidigm's Access Array, RainDance's new ThunderStorm, Halo Genomics, MIPs and traditional multiplex PCR assays are all competition for the in-house kits of Illumina and Life Technologies. The major differences between the platforms are the ways in which multiple loci are captured and amplified. Microfluidics, emulsion PCR and oligo-probes are the different 'capture' mechanisms. All rely on PCR for the amplification and to add the sequencing platform adapters and barcodes. The cost of the RainDance instrument is very high, AccessArray is medium and the probe based systems can require almost nothing additional over what is already in your wet lab. AccessArray is the only system where the user has complete flexibility over what goes in the panel: if you want to change something just order a new pair of primers. RainDance, Halo and other platforms, as well as Life and Illumina's offerings, all require you to design a panel and order quite a lot to become cost effective.

Ultimately the cost per sample is going to be what makes one of the systems here, or one yet to be released, the dominant technology. $10 rather than $100 is what we need to get these tests to every cancer patient!

So what have Life Tech and Illumina got to offer?

Life Technologies "AmpliSeq" amplicon sequencing cancer panel for Ion Torrent:

The Ion cancer panel interrogates >700 mutations using 190 amplicons in 46 genes. Using the 314 chip should get 500 fold coverage and allow detection of variants as low as 5%. The AmpliSeq kit can target 480 amplicons (but is scalable from there) in a single tube reaction with just 10ng DNA input from FFZN or FFPE tissue. PCR and sequencing can be completed in a single day, assuming of course you have the one touch system. They have chosen "the most relevant cancer genes" for the initial panel, most probably from COSMIC.
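To put the 500 fold coverage and 5% variant figures side by side, here is a rough sketch of how many variant-supporting reads that depth would be expected to give at one position. It treats read sampling as a simple binomial draw and ignores sequencing error, uneven amplicon coverage and strand bias, so it is an optimistic picture. The threshold of 10 supporting reads is purely a hypothetical value chosen for illustration, not something Life Tech have stated.

```python
from math import comb


def expected_variant_reads(coverage, allele_fraction):
    """Mean number of reads carrying the variant at a single position."""
    return coverage * allele_fraction


def prob_at_least(min_reads, coverage, allele_fraction):
    """P(X >= min_reads) where X ~ Binomial(coverage, allele_fraction)."""
    p_fewer = sum(
        comb(coverage, k)
        * allele_fraction ** k
        * (1 - allele_fraction) ** (coverage - k)
        for k in range(min_reads)
    )
    return 1.0 - p_fewer


COVERAGE = 500             # fold coverage quoted for the 314 chip
ALLELE_FRACTION = 0.05     # lowest variant frequency quoted
MIN_SUPPORTING_READS = 10  # hypothetical calling threshold, for illustration only

print("expected variant reads:",
      expected_variant_reads(COVERAGE, ALLELE_FRACTION))
print("chance of seeing at least", MIN_SUPPORTING_READS, "variant reads:",
      round(prob_at_least(MIN_SUPPORTING_READS, COVERAGE, ALLELE_FRACTION), 4))
```

Under these assumptions a 5% variant at 500 fold would be carried by about 25 reads on average, which is why depth matters so much more for amplicon panels than for germline work.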

Life Tech are also involved in the CRUK/TSB funded Stratified Medicines Initiative, on which I worked early on. However I am not sure if they are going to get the Ion test out before a full set of Sanger based assays. It will be interesting to see what comes first on this project and could be a good proxy for seeing how much Life Tech still believes in Sanger as a long-term product. Life Tech are aiming to get this into the clinic and are going to seek FDA approval.

There is no pricing on the press release from Life Tech.

I'd agree with the early access Life Tech customers, Christopher Corless, Marjolijn Ligtenberg and Pierre Laurent-Puig at Oregon, Nijmegen and Paris respectively on the likely benefits of amplicon NGS. The simplicity of these methods will hopefully mean clinical genetics labs adopt them quickly.

Illumina's TruSeq custom amplicon (TCSA) sequencing for MiSeq et al:

Illumina provide a nice tool in the DesignStudio and also recently released a cloud based analysis system called BaseSpace. Both of these are likely to help novices get results quickly. TCSA allows you to target up to 384 amplicons with 96 indices and requires 250ng of input DNA. Illumina use an integrated normalisation solution so you do not have to quantitate each amplicon set before running on a sequencer. This is going to make some people's lives much easier as many do still struggle getting this right every time.

TCSA uses the GoldenGate chemistry as I mentioned at the bottom of a previous post. This makes use of an extension:ligation (see here for one of the original E:L methods) reaction followed by universal PCR to provide better specificity in highly multiplex PCR based reactions. In SNP genotyping GG goes much higher than the 384 plex Illumina are offering on TCSA today. Hopefully this shows the scope for increasing the level of multiplexing.

The benefits of running TCSA on MiSeq are going to be turnaround time and the inbuilt analysis workflows. Of course many users will want to be amplifying 100s to 10,000s of samples from FFPE collections and for this purpose Illumina might want to consider modifying their dual-indexing to allow the maximal number of samples to be run in a single HiSeq lane. Right now the limitation of 96 is a pain.

There are no early access customer comments on the TCSA data sheet or Illumina's website. I can't imagine it is going to take too long for the first reports to come out on how well it works though.

Illumina have a pricing calculator on their website so you get to see how much your project is going to cost. Once you have designed an amplicon pool it will let you specify a number of samples and return a project cost inclusive of MiSeq sequencing. I'm not sure who they talked to about the price point for this but it looks like Illumina are aiming at capillary users. The target price is $0.43 per amplicon or nearly $200 per sample! Personally I was hoping we would get to under $50 per sample and as low as $10 or $20. I'd also like to see enough indices such that a large project could be run in one lane on HiSeq making the whole project very cost effective and fast.
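As a back-of-envelope check on those numbers, the sketch below scales the $0.43 per amplicon figure by panel size. It assumes the price is flat per amplicon per sample and that nothing else is added, which is my simplification and not how Illumina's calculator necessarily works. The function names are my own.

```python
PRICE_PER_AMPLICON_USD = 0.43  # target price quoted above
MAX_AMPLICONS = 384            # largest panel currently offered


def cost_per_sample(n_amplicons, price=PRICE_PER_AMPLICON_USD):
    """Per-sample cost if price scales linearly with amplicon count (assumption)."""
    return n_amplicons * price


def max_amplicons_for_budget(budget_usd, price=PRICE_PER_AMPLICON_USD):
    """Largest whole number of amplicons that fits a per-sample budget."""
    return int(budget_usd / price)


print("full panel:", round(cost_per_sample(MAX_AMPLICONS), 2), "USD per sample")
for budget in (50, 20, 10):
    print("budget", budget, "USD ->", max_amplicons_for_budget(budget), "amplicons")
```

On this simple model a full 384 amplicon panel lands in the same range as the per-sample figure above, and a $10 to $20 test would be limited to a few dozen amplicons unless the per-amplicon price falls.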

Imagine 1500 DNA samples from FFPE blocks for lung cancer being screened for the top 50 Cancer genes with just 15 plates of PCR and one PE100 lane on HiSeq. The whole sample prep could be done by one person in a couple of weeks, and the sequencing completed ten days later.

Watch out for more competition:
It feels like everyone sees amplicon sequencing (Amp-seq anyone?) as the most likely step into the clinic. As such there is going to be stiff competition in this format and that can only be good for all of us wanting to use the technology.

Hopefully it won't be too long before someone compares the results on all of these as well.

Saturday, 15 October 2011

Learning to live with staff turnover: my Top 10 tips for recruitment

There are six people working in the Genomics Core lab I run, including myself. It is exactly five years since I started and in that time five members of my team have come and gone. The next person to go will mean I hit a milestone I had not even thought about before, where there are as many people working for me as there are people who have worked for me in the past!

Staff leaving is never easy. There is a lot of work to be done in recruiting someone new and getting them up to speed in the lab. Also there is an inevitable impact on the rest of the lab as a vacuum is created and someone new has to come in and fill another person's shoes.

However it is not all bad. Leavers offer an opportunity to change how things are done and can mean promotion of others in the lab. Even if this does not happen, inevitably the ripple effects mean people get to do some new things and take on new responsibilities.

I recently had my number two person in the lab leave. She has been great and has worked for me for over four years, and will be sorely missed. However it was time for her to move on, a great opportunity arose and she went for it and I wish her the very best. I had to recruit and thought I'd post on my experiences for others to consider.

My Top 10 tips for recruitment

1 Write a good job description: It might sound like an obvious one but get it wrong and you'll never get the right person. This is the time to really consider what you need this new person to do in the lab. It is an opportunity to change responsibilities as someone new can take on something the other person never did without even knowing it.

2 Write a good advert: I always struggle with this. How to get a good ad that attracts people to apply is tough. I always get our HR team to help with this and usually aim for online advertising now. The cost of ads in the print editions of Science and Nature is very high. Online is no cheap option either, but the ad needs to be seen.

3 Read covering letters and CVs carefully: For my last job opening I got over 50 applicants. There is not enough time to read every one in detail and fortunately our HR team use an online system that allows me to screen and reject poor candidates quite easily. I usually start with the covering letter and if this does not grab me put the candidate straight into the reject pile. It might seem tough but the covering letter is the opportunity for the applicant to shine and to shout out why they are the best person for this job. The CV should be clear and allow me to see what skills they have and what their job experience is. A list of 40 publications is a bad idea and off-putting. Personally I like to see no more than three.

4 Use a scoring matrix for possible candidates: I start by deciding which criteria are most important for the job, perhaps specific skills. I then make a table in Excel to record how each candidate measures up on a three point scoring system. I use the results of this to decide which candidates to invite in for interview and also use it to decide on the order of interviews. I like to get the best candidate in first and then see the others in order of preference if possible. It can get tiring doing interviews so I want to be as fresh as possible for the best candidates. This matrix also helps if someone comes back later to ask why they did not get the job as there is evidence they might not have measured up against other candidates.
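For anyone who would rather script the matrix than keep it in Excel, here is a minimal sketch of the same idea. The criteria and candidate names are invented for illustration and the scores are unweighted; you would substitute the criteria from your own job description.

```python
# Hypothetical criteria; replace with the ones that matter for your post.
CRITERIA = ["NGS experience", "lab management", "communication"]

# Each candidate gets 1 (poor), 2 (OK) or 3 (good) per criterion.
scores = {
    "Candidate A": {"NGS experience": 3, "lab management": 2, "communication": 3},
    "Candidate B": {"NGS experience": 2, "lab management": 2, "communication": 2},
    "Candidate C": {"NGS experience": 1, "lab management": 2, "communication": 2},
}


def total_score(candidate_scores):
    """Sum the three point scores, rejecting anything off the scale."""
    for criterion in CRITERIA:
        if candidate_scores[criterion] not in (1, 2, 3):
            raise ValueError("score for " + criterion + " must be 1, 2 or 3")
    return sum(candidate_scores[criterion] for criterion in CRITERIA)


def rank_candidates(all_scores):
    """Candidate names ordered best first, i.e. the interview order."""
    return sorted(all_scores, key=lambda name: total_score(all_scores[name]),
                  reverse=True)


for name in rank_candidates(scores):
    print(name, total_score(scores[name]))
```

Keeping the table in a file like this also gives you the record to point to if someone later asks why they were not shortlisted.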

5 Generate a list of questions for the interview: These do not have to be kept to rigidly but they offer an opportunity to keep interviews as similar as possible so you can make an unbiased decision. They also allow you to think of something to ask if a candidate turns out to be very poor. I would not recommend you stick to an hour's interview if it really is not going anywhere; get the candidate out of the door and move on.

6 Get candidates to present: I have found a ten minute presentation a great way to start off an interview. I use a rather vague title for talks like "Cancer genomics in a core facility" and allow candidates freedom to interpret this as they see best. This certainly sorts out people who are really thinking how a core might run from post-docs that would really just like another research post. A talk also gives you an idea of how the person will communicate with others in the job. And it shows you how much homework they have done for this job.

7 Show candidates the lab: I ask people in my team or collaborating labs to take people around and then get their feedback on the candidates as well. Sometimes people relax in this scenario and their true personality comes out. If someone seems interested in the Institute and the work we are doing then great. If all they care about is the holiday package and any perks they are unlikely to make this clear in the formal interview.

8 Talk to the interview panel: I get mine to rate the top three candidates in order of preference. Having each person do this independently can help when there is a difficult choice. If the three don't agree then you can have an informed discussion as to why. Of course hopefully it is clear and the same candidate comes out on top.

9 Make a good offer: I like to personally call someone when offering a job in my lab. It is one of the nicest things about being a boss and I hope makes a better impression on the individual than having HR ring them up. Personally though I leave pay and conditions to HR, I just stick to questions about the job. Leaving HR to the complex discussions on pay is helpful. They won't get carried away with packages and can answer all the questions individuals might have.

10 Help them settle in: When someone new turns up make sure you give them the time to settle in, explain the job again, introduce them to everyone again, take them on another tour. I like to sit down on the first afternoon and have an informal chat about what I want them to do and where I think the lab is going. Give people some of your time.

Hopefully everything goes well and the new person settles in fine. I am excited about my latest recruit and hope your next recruitment goes smoothly.

Illumina and other life science stock slipping fast

Illumina's stock has dropped dramatically in the last few months.

I have watched Illumina over the last six or seven years and their price has seemed to follow a continual upward trajectory, with the exception of a couple of hiccups, one of which was caused by the Paired End reagent problems in 2009. This time it looks like the global recession is finally starting to hit Genomics expenditure.

And just as we were all having so much fun!

The stock had been at $60-80 for most of the last year. But in July it dipped below $60 and by mid September was under $50.

Today it stands at $27.

Several of the investment companies have recently downgraded life science companies including Illumina as sales forecasts are not looking as stellar as they have in recent years. I am sure Illumina and Life are placing big bets on MiSeq and Ion. Illumina are only just starting to ship and if they can't deliver in the volumes expected I think the banks won't take that as a good sign. At least the PE issues were against a background of incredibly strong demand from users.

Life Technologies has had a similar drop from $70 and is currently at $36.82.

Thursday, 13 October 2011

The 'embarrassing' science of Olympic drug testing in the UK

GlaxoSmithKline have won the contract to help perform drug testing at next summer's 2012 Olympic games with King's College London's Drug Control Centre. There is a report on the BBC news from the 10th of October which is still being repeated. It is, quite frankly, embarrassing.

It shows a lab similar to the one that will be used for the actual testing. The news team focused on the robots and show an automated pipetting robot that could be used to make sure athletes don't cheat. Watch the video carefully and at 1 min 39 sec the robot does its thing: 96 pipette tips come down into the 96 well plate and liquid comes flooding out all over the deck of the robot.


I am sure Professor David Cowan, King's College London's Drug Control Centre Director, would prefer Gold rather than a wooden spoon!

Perhaps he should employ Gabriel See, the 11 year old who was one of the team that built a liquid handling robot out of Lego. His worked about as well as the GSK one.

Sorry to those of you outside of the UK; you might not be able to watch this.

Friday, 7 October 2011

Illumina Custom Capture: Design Studio review

Illumina are currently offering a demo kit for custom TruSeq capture. I thought I would try it out on some genes from COSMIC and see how easy it was to design a capture set using their DesignStudio tool. There is also a pricing calculator I was very interested in so we can see how much the final product is likely to cost.

The TruSeq Custom capture kit is an in-solution method that allows users to target 0.7-15 Mb of sequence. The design studio site will produce 2,500-67,000 custom oligos. After ordering you simply make lots of libraries with TruSeq DNA Sample Preparation Kits and perform the capture reactions in pools of up to 12 plex. This makes the process pretty efficient if you have lots of samples to screen. As the kits come in 24 or 96 reaction sizes a total of nearly 300-1200 samples can be processed together. With 24 indexes currently possible in TruSeq DNA kits this is just 2-4 lanes of sequencing. As Illumina move to 96 indexes for TruSeq kits the sequencing cost will continue to drop.

You need to register with Illumina for an iCom account, then you can just log in to the "Design Studio" and get started.
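The "nearly 300-1200 samples" figure is just the number of capture reactions multiplied by the pool size. A minimal sketch of that arithmetic, with a function name of my own choosing:

```python
MAX_PLEX = 12  # libraries pooled per capture reaction, as described above


def samples_per_kit(capture_reactions, plex=MAX_PLEX):
    """Samples that one kit can process if every capture pool is filled."""
    return capture_reactions * plex


for kit_size in (24, 96):
    print(kit_size, "reaction kit:", samples_per_kit(kit_size), "samples")
```

This assumes every pool is run at the full 12 plex; smaller pools reduce the sample count in proportion.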

Illumina DesignStudio: Start a project, choose the genome, upload loci and the tool does its job.

There are multiple ways to get your genes of interest into their database. I chose to upload a csv file with a list of the 100 most mutated genes in COSMIC. The template file Illumina provide is very minimal. There are columns for gene name, offset bases, target type (exon or whole gene), density (standard or dense) and a user definable label. The upload was simple enough and in about thirty seconds all the genomic coordinates for the list of genes were available. The processing for bait design took a little longer at about ten minutes.

The design tool predicts coverage and gives a quality score (this is a cumulative score for the entire region targeted) for the targeting. Each probe set is shown in a browser and coloured green for OK or yellow for problematic. For my 100 genes 12 were under a 90% score and two were not designed at all because I entered names with additional characters making them incomprehensible by the tool.

Here is a screenshot of TP53 exon probes:

There is a pricing calculator available as well which I'll talk about in a minute.

My "100 most mutated COSMIC genes" custom capture kit:
It took about twenty minutes to pull this together
Regions of Interest Targeted: 2662
Final Attempted Probes Selected: 3693
Number of Gaps: 80
Total Gap Distance: 3,158
Non-redundant Design Footprint: 736,759
Design Redundancy: 3%
Percent Coverage: 100%
Estimated Success: ≥ 95%

How much will this kit cost?
The pricing calculator has some variable fields you need to fill in. It uses the data from my region list and asks for library size (default 400), how many samples you want to run (288 minimum), what level of multiplexing (12 plex for this example) and then platform and read type. Lastly it asks you to select the % of bases covered in the regions and at what fold coverage, e.g. 95% of bases at 10 fold (in this example).

Unfortunately the calculator did not work!

Fortunately Illumina provide another one here and this did. This also recommends how many flowcells and what other items are needed to run your custom capture project.

It turns out that I will need:
My TruSeq Custom Enrichment Kit at $33,845.76 in this example.
6x TruSeq DNA Sample Prep Kit v2-Set A (48rxn with PCR) at $12k.
3x TruSeq SBS Kit v3 - HS (200-cycles) at $17k.
3x TruSeq PE Cluster Kit v3 - cBot - HS at $13k.
A total of $76k or $263 per sample.
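The per-sample figure can be reproduced by summing the line items and dividing by the 288 sample minimum. The sketch below assumes the rounded "$12k", "$17k" and "$13k" figures are totals for the quantities listed (6x, 3x, 3x) rather than unit prices, which is the reading that matches the overall total.

```python
# Line items from the list above, in USD. Rounded figures are assumed to be
# totals for the stated quantity, not per-kit prices.
line_items = {
    "TruSeq Custom Enrichment Kit": 33845.76,
    "6x TruSeq DNA Sample Prep Kit v2-Set A (48rxn with PCR)": 12000,
    "3x TruSeq SBS Kit v3 - HS (200-cycles)": 17000,
    "3x TruSeq PE Cluster Kit v3 - cBot - HS": 13000,
}

N_SAMPLES = 288  # minimum sample count entered in the calculator

total = sum(line_items.values())
per_sample = total / N_SAMPLES

print("total: $" + format(total, ",.2f"))
print("per sample: $" + format(per_sample, ".2f"))
```

Swapping in your own quote for any line shows quickly which item dominates; here the custom enrichment kit is a little under half of the total.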

Trying it out:
If you are interested in trialling this then Illumina are offering a 50% off promo on a 5000 oligo capture kit for up to 288 samples (24 pull downs at 12 plex). You can also order a $2860 TruSeq custom capture demo kit that targets ~400 cancer genes (again from COSMIC I expect, there are also autoimmune and ES cell gene kits). The kit includes both TruSeq DNA library prep and the custom capture reagents for processing 48 samples. This works out at $60 per sample for pulldown. If this were run on two lanes of HiSeq v3 the cost including sequencing would still be under $100 per sample.

I am not sure how much sequence is going to be needed to get good coverage on this particular kit, but my kit is a little smaller than the Cancer trial kit so the results from the pricing calculator should be indicative.

When will custom amplicon be released?
For me the most interesting thing Illumina mentioned in the MiSeq release at AGBT was the GoldenGate based custom amplicon product. Hopefully it will appear soon on DesignStudio as well. This will allow us to remove library prep almost entirely and process samples from much smaller amounts of DNA.

Illumina are also releasing dual barcoding in the next release of HCS. This will allow four reads from each library molecule so with just 24 barcodes you may be able to multiplex 576 samples into one MiSeq run with just six 96 well plate reactions to get almost ten fold coverage of all TP53 exons.
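The 576 figure comes from using the barcodes combinatorially: each sample gets a pair of indexes, one from each index read. A small sketch of that counting, assuming the same set of 24 barcodes is available at both positions and every pairing is usable (the names here are mine, not Illumina's):

```python
from itertools import product
from math import ceil

N_BARCODES = 24   # barcodes available per index read
PLATE_WELLS = 96


def dual_index_pairs(n_first, n_second):
    """All (first index, second index) combinations, numbered from 1."""
    return list(product(range(1, n_first + 1), range(1, n_second + 1)))


pairs = dual_index_pairs(N_BARCODES, N_BARCODES)
plates = ceil(len(pairs) / PLATE_WELLS)

print("unique sample tags:", len(pairs))
print("96 well plates needed:", plates)
```

The same counting shows why a modest increase in barcodes helps so much: the number of samples grows with the product of the two index sets, not the sum.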

Thursday, 6 October 2011

Exome capture comparison publication splurge

In the last few days there has been a mini splurge in papers reviewing capture technologies. I thought it would be useful to write an overview of these. I have been involved in several comparison papers and so am aware of the limitations of comparison experiments. Many comparison publications fail to give readers the one answer they are looking for, a clear "which is better" statement. In fact the Sulonen paper discussed below says "the question often asked from a sequencing core laboratory thus is: “Which exome capture method should I use?”" They do appear to skirt the issue though in their final conclusions.

GenomeWeb has reviewed and interviewed Michael Snyder’s NatureBiotechnology paper. They pointed out many of the highlights from the paper.

Exome-Seq via genome capture protocols either on arrays or in-solution have been making quite a splash with many hundreds or even thousands of samples being published. In-solution methods seem to have won out, which is not surprising given the throughput requirements of researchers. And Exome-Seq has become pretty common in many labs, allowing an interesting portion of the genome to be interrogated more cost effectively than whole genome sequencing (WGS) would allow. Even a $1000 genome might not supplant Exome-Seq as the number of samples that can be run is significantly higher and this is likely to give many projects the power to discover significant variants, albeit only in the targeted regions of course.

Exome-Seq kits vary significantly in the regions and amount of genome captured. Unfortunately the latest kits from each provider are not easily available on UCSC for comparison as annotation tracks. If you have a specific set of genes you are interested in you need to go out and find this information yourself. Both Agilent and Illumina require you to register on their websites to download tracks. Nimblegen's are available from their website.

Table 1: Sulonen et al produce a comprehensive table comparing Agilent to Nimblegen and this has helped with some of the detail in my table below. I have chosen not to include details on what is captured as this is still changing quite rapidly. I have instead focussed on factors likely to impact projects such as input DNA requirements, pooling and time.

Table 1

Parla et al: A comparative analysis of exome capture. Genome Biol. 2011 Sep 29;12(9):R97.
Dick McCombie's lab at Cold Spring Harbor compared two exome capture kits and found that both kits performed well. The kits focussed on CCDS capture and as such did not capture everything researchers may be interested in. Both were sequenced on the Illumina platform.

Asan et al: Comprehensive comparison of three commercial human whole-exome capture platforms. Genome Biol. 2011 Sep 28;12(9):R95.
This group compared one array based kit with two in-solution versions: Nimblegen's array and in-solution kits and Agilent's SureSelect kit. They used the first Asian DNA sample published in Nature 2008. They also reported the differences in regions captured by these kits. This is mainly a design decision and there is no reason I am aware of that would not allow each company to target exactly the same regions. Asan et al found that Nimblegen produced better uniformity at 30-100 fold coverage. All platforms called SNPs equally well. They compared SNP calls from sequencing to array based genotyping on Illumina 1M chips and reported >99% concordance between sequencing and arrays. They also discuss the advantages of in-solution methods over array based ones. Sequencing was HiSeq PE90bp and the data were submitted to SRA.

Sulonen et al: Comparison of solution-based exome capture methods for next generation sequencing. Genome Biol. 2011 Sep 28;12(9):R94.
Sulonen et al used a single control DNA across two kits each from Agilent and Nimblegen. They found that Nimblegen generated lower amounts of off-target sequence and showed more specific targeting and enrichment than Agilent. The Nimblegen kit was most efficient and captured the exome targeted with just 20 fold coverage. Agilent produced fewer duplicate reads. The 2010 Genome Biology paper by Bainbridge et al discussed duplicate reads, their suggestion being that these come from low complexity libraries. They also stated that these can be difficult to screen out. We have been looking at library indexing approaches that could incorporate a random sequence in the index read. This would allow PCR duplicates to be removed quite easily. Sulonen et al again reported the negative impact of GC content on capture and said the long baits on the Agilent platform appeared to be slightly more impacted by this. Interestingly they reported that where a SNP was heterozygous more reference alleles were called than would have been expected and explained this as a result of the capture probes being designed to the reference allele. However the genotype concordance of sequencing to arrays, this time on Illumina 660W Quad chips, was >99% from a coverage of just 11 fold. The authors don't do a great job of saying what they did in the lab. They report sequencing reads of 60-100bp but in the sequencing methods don't say whether this is single or paired end nor what instrument or chemistry was used. They did submit their data to SRA though.
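To illustrate the random index idea mentioned above, here is a rough sketch of how duplicate removal could work once each read carries a random tag. The `Read` record, its field names and the example coordinates are all made up for illustration. This is not a description of any published pipeline, and a real tool would also need to tolerate sequencing errors in the tag.

```python
from collections import namedtuple

# Hypothetical read record: mapping position plus the random tag from the index read
Read = namedtuple("Read", "name chrom start strand random_tag")

def remove_pcr_duplicates(reads):
    """Keep the first read seen for each (position, strand, random tag) combination."""
    seen = set()
    kept = []
    for read in reads:
        key = (read.chrom, read.start, read.strand, read.random_tag)
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept

reads = [
    Read("r1", "chr17", 7577000, "+", "ACGT"),
    Read("r2", "chr17", 7577000, "+", "ACGT"),  # same position and tag: treated as a PCR duplicate
    Read("r3", "chr17", 7577000, "+", "TTAG"),  # same position, different tag: kept as a separate molecule
    Read("r4", "chr17", 7578100, "-", "ACGT"),  # different position: kept
]
unique = remove_pcr_duplicates(reads)
print([r.name for r in unique])
```

Without the tag, r3 would be indistinguishable from r1 and r2 by position alone and would be discarded along with the true duplicate.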

Clark et al: Performance comparison of exome DNA sequencing technologies. Nat Biotechnol. 2011 Sep 25. doi: 10.1038/nbt.1975
Michael Snyder's lab at Stanford compared exome capture kits from Agilent, Illumina and Nimblegen using the same human sample. They found that Nimblegen covered the fewest regions but required the lowest amount of sequencing, whilst Agilent and Illumina covered more regions but needed higher sequence coverage. Additionally Illumina captured non-coding regions and regulatory sequence not targeted by the other platforms. This is going to be a key development for some researchers. Lastly this group compared the exome data to whole genome sequencing of the same sample and interestingly found that Exome-Seq discovered additional small variants missed by WGS.

Some interesting stats from the sequencing data include: off target reads of one third for Illumina compared to 13% and 9% for Agilent and Nimblegen respectively. Illumina did respond to this in a GenomeWeb article stating that their new TruSeq kits reduced duplication rates to generate far better results. Genomic loci high in GC were less well targeted and Agilent performed best in the regions where data could be compared. Illumina captured most SNPs but targeted the most sequence so no real surprise there. Where the three platforms overlapped Nimblegen was most efficient. HiSeq

Natsoulis, G. et al., 2011. A Flexible Approach for Highly Multiplexed Candidate Gene Targeted Resequencing. PloS one, 6(6).

I wanted to include this PloS One paper as the group took quite a different approach, which may well be a very useful one for other groups. Instead of purchasing a whole exome capture kit Natsoulis et al designed 100bp oligos as baits to the human exome and published these as an open resource. Now anyone can order the baits they are interested in and perform custom capture in their own lab.

How much does Exome-Seq cost?

Snyder’s paper included some comments on the costs of capture, which they said were "highly negotiable". The biggest change coming is in the pooling strategies with all platforms moving to six or eight plex pooling before capture and Illumina's custom capture kits now supporting a 12 plex reaction. This makes the workflow much easier for large numbers of samples. I have been following this for a while and costs are dropping so rapidly as to make comparison or projection a bit of a waste of time. The number of sequencing lanes required is also changing as the density on Illumina continues to rise. Illumina handily provide a table estimating the number of exomes that can be run per lane on HiSeq and other platforms: HiSeq 600Gb v3 chemistry allows 7 exomes per lane or 56 per flowcell at 50x coverage. And an exome might be achievable on a MiSeq next year. I have estimated our internal exome costs to be about £300-450 depending on coverage and read type, inclusive of library prep, capture and sequencing. We are only just starting to run these in my lab though so I'll soon find out how close my estimates really are.
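The per-flowcell figure follows from the per-lane one: 7 exomes x 8 lanes = 56. The sketch below shows that arithmetic and a rough way to rescale it for other coverage targets. It assumes an eight lane flowcell and that the number of exomes per lane scales inversely with the coverage wanted, which is my simplification and not something taken from Illumina's table.

```python
LANES_PER_FLOWCELL = 8  # assumption: eight lane HiSeq flowcell

def exomes_per_lane(coverage, ref_exomes=7, ref_coverage=50):
    """Estimate exomes per lane, scaling the quoted 7 exomes at 50x.

    Assumes capacity is inversely proportional to target coverage
    and rounds down to whole exomes.
    """
    return (ref_exomes * ref_coverage) // coverage

def exomes_per_flowcell(coverage):
    """Estimate exomes per flowcell across all lanes."""
    return exomes_per_lane(coverage) * LANES_PER_FLOWCELL

for cov in (30, 50, 100):
    print(cov, exomes_per_lane(cov), exomes_per_flowcell(cov))
```

Real numbers will depend on the kit, on-target rate and duplicate rate, so treat this as a planning aid only.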

Do you really need to target the exome?

A lot of people I have talked to are now looking at screening pipelines which use Exome-Seq ahead of WGS to reduce the number of whole human genomes to be sequenced. The idea is that the exome run will find mutations that can be followed up in many cases, and only those samples with no hits need be selected for WGS.

Running a Cancer Genomics Core Facility, I am also wondering how the smaller targeted panels like Illumina's demo TruSeq Custom Capture Kit: Cancer Panel will fit into this screening regime. These can be multiplexed to a higher level and target many fewer regions, but cost a lot less to sequence and analyse. Perhaps the start of the process should be this or even next-gen PCR based 'capture' from the likes of Fluidigm?


1. Parla et al: A comparative analysis of exome capture. Genome Biol. 2011 Sep 29;12(9):R97.

2. Asan et al: Comprehensive comparison of three commercial human whole-exome capture platforms. Genome Biol. 2011 Sep 28;12(9):R95.

3. Sulonen et al: Comparison of solution-based exome capture methods for next generation sequencing. Genome Biol. 2011 Sep 28;12(9):R94.

4. Clark et al: Performance comparison of exome DNA sequencing technologies. Nat Biotechnol. 2011 Sep 25. doi: 10.1038/nbt.1975

5. Natsoulis, G. et al., 2011. A Flexible Approach for Highly Multiplexed Candidate Gene Targeted Resequencing. PloS one, 6(6).

6. Bainbridge, M.N. et al., 2010. Whole exome capture in solution with 3Gbp of data. Genome Biology.

7. Kingsmore, S.F. & Saunders, C.J., 2011. Deep Sequencing of Patient Genomes for Disease Diagnosis: When Will It Become Routine? Science Translational Medicine, 3(87), p.1-4. Review of Bainbridge et al and discussion of WGS and targeted or Exome-Seq. They also suggest that an exome costs 5-15 fold less than a WGS.

8. Maxmen, A., 2011. Exome Sequencing Deciphers Rare Diseases. Cell, 144, p.635-637. A review of the Undiagnosed Diseases Program at NIH. Exome-Seq and high-resolution microarrays for genotyping. They mention the team’s first reported discovery of a new disease, which was published in The New England Journal of Medicine.

Monday, 3 October 2011

Building my kids a boat

I made a decision to try and build my kids a little dinghy so they could row about our local river and we might also spend a weekend on the Broads.

Some of you will be thinking this guy runs a genomics lab and that the skills required for the job don't usually include woodwork and boat building. You'd be right. But as I found out you can build a boat pretty easily with 4-6mm marine plywood, epoxy, Gaffa tape, polypropylene beer glasses, waxing sticks and zip ties.

I used the 'Mouse' boat design freely available on the web from Gavin Atkins. I also bought his book, which was a good read covering most of the subject and also includes plans for several other boats. I chose the Mouse because, although it is not the simplest boat in the book to build, it did look quite easy and is very boat like.

It took me about a year to build (probably four or five days in earnest) but last weekend we had some friends round for a BBQ and the sun was out. It turned out to be the hottest October day since records began in the UK and so the trip to the river was welcome. The kids all got into their swimmies, the boat was launched and we had a wonderful hour splashing around.

It still needs painting and some rowlocks but it did not sink and can be carried by one person to the car for transport. I'd heartily recommend it for anyone thinking of giving it a go. It certainly turns into a good conversation piece down the pub.