CoreGenomics: SIRVs: RNA-seq controls from @Lexogen

Monday 17 October 2016

SIRVs: RNA-seq controls from @Lexogen

This article was commissioned by Lexogen GmbH.

My lab has been performing RNA-seq for many years, and is currently building new services around single-cell RNA-seq. Fluidigm’s C1, academic efforts such as Drop-seq and inDrop, and commercial platforms from 10X Genomics, Dolomite Bio, Wafergen, Illumina/BioRad, RainDance and others makes establishing the technology in your lab relatively simple. However the data being generated can be difficult to analyse and so we’ve been looking carefully at the controls we use, or should be using, for single-cell, and standard, RNA-seq experiments. The three platforms I’m considering are the Lexogen SIRVs (Spike-In RNA Variants), or SEQUINs, or ERCC 2.0 (External RNA Controls Consortium) controls. All are based on synthetically produced RNAs that aim to mimic complexities of the transcriptome: Lexogen’s SIRVs are the only controls that are currently available commercially; ERCC 2.0 is a developing standard (Lexogen is one of the groups contributing to the discussion), and SEQUINs for RNA and DNA were only recently published in Nature Methods.

You can win a free lane of HiSeq 2500 sequencing of your own RNA-seq libraries (with SIRVs of course) by applying for the Lexogen Research Award

Lexogen’s SIRVs are probably the most complex controls available on the market today as they are designed to assess alternative splicing, alternative transcription start and end sites, overlapping genes, and antisense transcription. They consist of seven artificial genes in-vitro transcribed as multiple (6-18) isoforms to generate a total of 69 transcripts. Each has a 5’triphosphate and a 30nt poly(A)-tail, enabling both mRNA-Seq and TotalRNA-seq methods. Transcripts vary from 191 to 2528nt long and have variable (30-50%) GC-content.

Want to know more: Lexogen are hosting a webinar to describe SIRVs in more detail on October 19th: Controlling RNA-seq experiments using spike-in RNA variants. They have also uploaded a manuscript to BioRxiv that describes the evaluation of SIRVs and provides links to the underlying RNA-Seq data. As a Bioinformatician you might want to download this data set and evaluate the SIRV reads yourself. Or read about how SIRVs are being used in single-cell RNA seq in the latest paper from Sarah Teichmann’s group at EBI/Sanger.

Before diving into a more in-depth description of the Lexogen SIRVs, and how we might be using them in our standard and/or single-cell RNA-seq studies, I thought I’d start with a bit of a historical overview of how RNA controls came about...and that means going back to the days when microarrays were the tool of choice and NGS had yet to be invented!

RNA quality control – MAQC: The use of controls is recommended in any experiment, and the lack of them is one of the oft cited reasons for the current reproducibility crises. Nearly everyone who’s worked on differential gene expression in the last fifteen years has heard of the MAQC (MicroArray Quality Control) study. Although four sources of RNA were evaluated Stratagene’s Universal Human Reference RNA and Ambion’s Human Brain RNA samples were chosen because of the number of genes expressed at a detectable level, and the size of the fold changes between the two samples. These two control samples were used to evaluate five microarray platforms, in an international project involving 137 participants from 51 organisations (see Nat Biotech 2006). Labs like mine adopted, and continue to use the MAQC controls in our differential gene expression pipelines, which today are almost all based on RNA-seq methods. We used them in my lab to show how detection sensitivity drops as RNA inputs are reduced to under 100ng (something I keep meaning to repeat with RNA-seq).

The move to RNA-seq has had a dramatic impact on our ability to perform complex experiments. We are no longer limited to asking questions about the differential expression of genes where we have sequence information available to make an array. RNA-seq allows us to analyse the whole transcriptome; to assess differential gene expression (oligo-dT enriched mRNA-seq is the most widely used method), as well as differential splicing, allele specific expression, polyA tail length, transcription initiation and termination, microRNA, lincRNA, etc, etc, etc (see my "wish list" for controls at the bottom of this post).

The MAQC controls we used are simply not up to the more complex job that RNA-seq presents. Both the ABRF and SEQC papers used MAQC samples, which are admixtures of multiple individuals (I discussed these limitations in a 2014 post), but both included the ERCC controls as well.

Newer, more carefully designed and manufactured controls are available that can better serve the needs to biologists; and this is where SIRVs come in.

The SIRV workflow: from sample to answer

RNA quality control – Lexogen and beyond: SIRVs are designed to represent much of, but not all of, the complexity of Eukaryotic transcriptomes e.g. differential gene expression, differential splicing, polyA tail length variation, GC content, etc. SIRVs are designed to be added to samples before RNA extraction, or starting the RNA-seq library prep. They should allow an objective assessment of the technical biases in library preparation, sequencing and analysis; and ultimately should improve our ability to make biological insights from comparison of experimental conditions. They are a huge leap forward from the MAQC controls, and a significant step ahead of the ERCC1.0 controls, which are restricted to single-exon transcripts.

How are SIRVs made: SIRVs were designed to be similar to Human gene structures with overlapping multi-exon genes that are transcribed in both sense and antisense, with alternative splicing and alternative transcription start and end sites. Genes are in-vitro transcribed from linearized plasmids to produce full-length transcripts which are subject to very careful quality control and quantitation. This includes spectrophotometric, molecular weight, and Agilent Bioanalyser analyses. After QC and QT SIRV transcripts are mixed at equimolar concentrations (E0), or at 8-fold (E1) or 128-fold (E2) variations.

Designing SIRVs: A comparison of SIRV1 and KLK5

How are SIRVs used: Spiking SIRVs into your samples requires some careful consideration of how you’ll use the data they provide in downstream assessment. Today the most important control in my lab is simply whether the library prep has worked, or more importantly where it did not work whether it was the lab or the sample that was the cause of the failure. Our use of MACQ controls on a plate of samples is great, but extending this to an internal control in every sample is going to be better. However I don’t want controls to dominate the experiment or they’ll add too much to the costs of library preparation and sequencing.

SIRVs themselves don’t need much data to generate useful results and around 1% of your sequencing reads should be sufficient for most labs. However determining how much SIRV mix to add to your samples before extraction, or your RNA before library prep can require some empirical testing as the amount of RNA in a sample or a cell differs so much. As a rule of thumb 95% of RNA is ribosomal RNA’s, and the other 5% is mRNA (and non-coding RNAs). For an experiment starting with 100ng of TotalRNA in an mRNA-seq workflow approximately 50pg would represent 1% of the 5ng of mRNA present.

SIRVs are available in three configurations E0, E1 & E2 that mix the in vitro transcribed RNAs at equimolar (mix E0), up to 8-fold (mix E1), or up to 128-fold (mix E2), variation in concentration. Importantly SIRVs are built in a modular format and should be compatible to other spike in controls like the ERCC. Additional modules should address transcript lengths, polyA tail length variation, etc.

Coinciding with the webinar on October 19th, Lexogen will release the “SIRVs suite” (see "How are SIRVs analysed" below) for analysis of spike-in data. This will also include an "Experiment Designer" tool to calculate recommended spike-in ratios based on known or expected input for the RNA content, mRNA ratio, and type and efficiency of the workflow.

SIRVs in bulk RNA-seq: Bulk RNA-seq experiments can use SIRVs as process controls in place of the MAQC Brain and UHRR samples allowing a full 96 samples to be run on each plate. Assuming the 100ng TotalRNA input then just 50pg of SIRVs are needed per sample, with 5ng added to the oligo-dT master-mix used in the enrichment step. The use of SIRV E0 is recommended for process QC, but E1 and E2 may be useful when evaluating new methods for accuracy and precision of differential transcript detection and quantitation.

SIRVs in scRNA-seq: Single-cell RNA-seq has quickly adopted spike-in controls with Hashimshony et al presenting their use of ERCC spikes in the CELSeq protocol. Both Wu et al 2013 and Truetlein et al 2014 used the ERCC mixes at a 1:40,000 dilution spiked into the cell lysis mix of the Fluidigm C1 protocol. And Svensson et al use the ERCC and SIRV spikein's to assess sensitivity and accuracy of various protocols across a standard analysis pipeline. This demonstrates the utility of using RNA control spike-ins, but also the requirement for careful dilution to avoid swamping single-cell RNA-seq experiments with control data, or not having enough to QC data before interpreting results. Assuming each single cell has around 20pg of TotalRNA then just 200fg of SIRVs are needed per sample, the amount of SIRV added, and exactly where to add it the protocol is highly dependent on the single-cell RNA-seq protocol being used.

How are SIRVs analysed: Lexogen will release the Galaxy-based “SIRVs suite” for uploading, evaluating and comparing spike-in data. This will allow SIRV users to compare results from their experiments to anonymised data, and should help determine if your own experiment is any good. Back in 2003/4 I developed rptDB: a tool to compare QC data between Affymetrix arrays. This had over 3500 samples submitted to it, and allowed a quick easy call on whether your data was "good" or "bad" - highly context dependant of courrse! As a user if I had received data from a core lab or service provider, or were downloading RNA-seq data for meta-analysis, then being able to select only data where SIRV, or other, controls had been used, and where results were shown to be high-quality, would most likely save me considerable time in cleaning up data before starting.

SIRVS are not designed to be used as a normalisation tool. Whilst spike-ins have been considered they are not really reliable enough for standard normalisation procedures. The development of novel normalisation algorithms appears to offer hope for the future (see Risso 2014), and approaches like this might be applicable to SIRVs. I suspect this will be an active area of algorithm development over the next couple of years because of the huge interest in single-cell RNA-seq.

The competition: alternative RNA-seq controls

Sequins: ‘Sequins’ (sequencing spike-ins) were developed by the Garvan Institute and recently published in Nature Methods. Sequins are conceptually similar to SIRVs. They are a set of synthetic RNA isoforms that align to an artificial in silico chromosome, with no homology to known genomes. They represent full-length spliced mRNA isoforms, at a range of concentrations. They can be used to assess differential gene expression and alternative splicing pipelines. The authors state that sequins can by used for normalisation, and refer to the same Nature Biotech as I did above. In their Nature Methods paper they do show some very nice results from scaling normalisation using sequins and I hope these results will ultimately be achieveable with any well-designed spike-in series.

In the back-to-back Nature Methods publications the team at Garvan show how sequins can be used in RNA-seq and DNA-seq experiments to asses biases and determine the limits of detection, quantitation and analytical methods. Sequin genes are mixed in a two-fold serial dilution, with a minimum three genes per dilution, to span an ~106-fold range. The team also developed 24 Sequins to represent cancer fusion genes and used these to assess fusion gene detection and quantitation. They also reported that split reads significantly outperformed read-pairs in their correlation with Sequin concentration – this has a significant impact on the sequencing format as many groups today use paired-end reads where longer single-end reads may be more sensitive, and would also be around 40% cheaper.

ERCC 2.0: the original ERCC1.0 controls are a mix of 92 relatively simple single-exon transcripts of varying length and GC content. They are used in a mix at known concentrations spikedinto samples before library preparation. ERCC2.0 aims to update the spikes to better represent the complexity of the transcriptome, and to provide FFPE derived controls. Again they are are conceptually similar to SIRVs and Lexogen were one of 9 groups invited to present at the 2014 NIST ERCC2.0 workshop at Stanford University.

Conclusions: The use of controls in RNA-seq experiments is an absolute requirement if you want to get the best out of your experiments. Bulk RNA-seq can benefit from a relatively simple data QC of the controls before moving onto more complex differential gene expression and splicing analyses. And including spike-in controls may allow easier comparison of longitudinal data sets, or between labs. Single-cell RNA-seq has shown an absolute requirement to include spike-ins, although the very latest papers suggest that spiked-in transcripts may not truly mirror Human mRNAs in the protocols used, due to much shorter poly-A tails (30 vs 200+bp), and that they may underestimate detection sensitivity by up to ten-fold.

SIRVs, more recently SEQUINs, and soon ERCC2.0 controls can be further enhanced and manufacturers should not be consider their job complete! With protocols like Pacific Bioscience’s ISO-seq and the advent of Oxford Nanopores direct RNA-sequencing longer and longer transcripts could be assessed and this will need to be controlled. Phased sequencing, possibly from long RNA molecules on 10X Genomics, is likely to need controls where variants are phased. Additionally PacBio and Nanopore sequencing also offer the ability to detect and quantify RNA base modifications. All of this shows how far the controls we might use still have to go.

My RNA controls wish list:

differential gene expression normalisation

differential splicing

allele specific expression

transcript and polyA tail length variation

GC content

transcription initiation and termination

non poly-adenylated RNAs e.g. microRNA, lincRNA

pseudogene mapping

limits of detection

RNA variant detection at different MAF

High-quality and degraded FFPE RNA

Spike-in's with corresponding baits for in-solution capture

Spike-in RNA encapsulated in synthetic cells

Phased variants on long RNAs

RNA base modifications

Please let me know what you’d like to add by leaving a comment below.http://biorxiv.org/content/early/2016/10/13/080747

5 comments:

澄冰寒雪18 October 2016 at 16:11
The Sequins paper should have used TPM instead of RLE as expression levels.
ReplyDelete
Replies
Unknown5 February 2017 at 06:13
Appreciation is the key to doing more that is why I have took some time out to thank some one who cured me of my 2 years HEPATITIS B problem. It became a major problem to me as it was affecting my marital life and I was no longer comfortable so I decided to look for a solution and I came across a post of Dr Ariba and how he has been helping people of the same problem I contacted him and told him all I have been facing in my life. He told me how to get his product and how to take it after every thing I find out that all was now okay with me and that my HEPATITIS B problem was gone that is why I have come out today to say thank you to him . if you need cure for HIV/AIDS, Herpes, cancer or diabetes of all types you can contact his email draribaspelltemple@gmail.com OR draribasolutioncenter@yahoo.com
Whatsapp: +2348140439497
ReplyDelete
Replies
Unknown24 March 2017 at 23:58
Though i haven't met DR Abor but i have being hearing and seeing his
wonderful deeds on people's life.. This made me contacted him because i was
also diagnosed of CANCER, When i contacted Him, without wasting time, he
started his Miraculous work in my Life, I am happy and Glad to say that i
am now cured after using his herbal Medicine.. You can also reach him
IRABORSPELLHERBALHOME@GMAIL.COM or reach him on whatsap +393512728432 he also
special on cureing
1...HERPS CURE
2...ALS CURE
3..HIV CURE
4.EPILESY
5...PROMOTION IN THE OFFICE
6...LOVE SPEEL
7>>EX BACK

You can also reach him IRABORSPELLHERBALHOME@GMAIL.COM or reach him on whatsap +393512728432
ReplyDelete
Replies
DR SAVOIR23 May 2017 at 19:20
REUNITE YOUR MARRIAGE, RELATIONSHIP,SUCCESSFUL LOTTERY SPELL RESULTS, CURE ALL KINDS OF DISEASE AND WEAK ERECTION

Stop asking for another chance, stop questioning what you could have done differently, and – most importantly – stop second-guessing your own self-worth. Once you Come Back To My Spell is performed, the balance of desire will shift and your lover will be begging you to take him or her back. This Love Spell Could Work For You!Bring back the good times. Think about the happiest time in your relationship, and i can make it happen again. You deserve to be happy.

Are you looking for ultimate spells that works effectively same time on your problems, ranges from Love spells, marriage spells, and carrier related problems, witchcraft spells, charms, portion, talismans, magical oils / powders, protection, legal court cases, business prosperity spell, Gay and lesbian extreme love / attraction spells, women of the night Norton accruing powders and oil for extreme ecstasy and protection with in working ordeals.

My powerful lottery spells, will bring you the huge wins, and jackpots you desire and need. I work my lottery spells, to bring great luck. The power of my lottery spells, works on you, so there are no special numbers needed to play, or certain patterns. Just play one ticket on the lottery of your choice, and the powerful lottery spell will handle the rest. Whether you play Power-ball, Mega Millions, daily drawings or sweepstakes. I have a powerful lottery spell, to help you. Working with my spirit guides, to clear bad luck, and infuse you with good luck and positive energy energy. The power of my lottery spells, brings luck, and wins fast. Clearing paths with-in the universe, for money and great luck to reach you. These powerful spell castings, will be successful in bringing the lottery wins, to change your life.

Email: POTENTIALSPELLTEMPLE@GMAIL.COM

OR

POTENTIALSPELLTEMPLE@YAHOO.COM
ReplyDelete
Replies

Add comment

Note: only a member of this blog may post a comment.

Pages

Monday 17 October 2016

SIRVs: RNA-seq controls from @Lexogen

5 comments: