Thursday, 29 January 2015

Next Generation Sequencing vs Sanger Sequencing

Hi all,

Long time, no see.

We are very sorry for taking so long to update the blog. We have been working like crazy since we made a big methodological change in our approach to reaching our final goal.

To recap, our main aim was to sequence the CYP1 and AHR genes of 100 Siluriformes species using Sanger DNA sequencing. However, we ran into some problems, with the primers, for instance. Basically, we were able to amplify cyp1a in only six of the 30 species we had sampled. For practical purposes, this was not good. On the other hand, it suggested that the molecular diversity of cyp1a in loricariid fish is much greater than we expected, which is excellent news. We therefore switched from the traditional Sanger method to one of the Next Generation Sequencing (NGS) methods.

We are now sequencing the liver transcriptomes of 40 individuals from 37 different species on an Illumina HiSeq 2500 (a Next Generation Sequencing platform) at the Brazilian National Cancer Institute (INCA).

A major advantage of this new approach is that it does not depend on specific primers, so we will obtain the molecular data without the bias that primers introduce. It will also generate much more raw data to analyse than our previous method. Consequently, we will obtain sequences not only for the two target genes, CYP1A and AHR, but for all genes that were being expressed in the fish liver at the time of sampling. In fact, this method is generating more raw data than we will be able to analyse during this PEER grant. During the grant period, we are focusing on a particular set of genes involved in organisms' responses to chemical compounds; other genes will be analysed later. Another advantage is the price per base pair, which is much lower than with Sanger, as the table below shows.


Method          Sanger               NGS
Read length     Up to 1,100 bases    2 × 100 bp (paired-end)
Reads/run       96                   2 billion
Time/run        1 hour               <1 to 6 days
Capacity/run    ~1 Mb                50 to 1,000 Gb
Price/run       ~$480                ~$9,600
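To make the "cheaper per base pair" point concrete, here is a minimal sketch that divides the approximate price per run by the approximate capacity per run, using only the rough figures from the table above. The constant and function names are ours, for illustration only; real costs depend on the facility, kit and run mode.

```python
# Approximate per-run figures taken from the table above.
SANGER_PRICE_USD = 480.0
SANGER_CAPACITY_MB = 1.0                     # ~1 Mb per run

NGS_PRICE_USD = 9600.0
NGS_CAPACITY_MB = (50_000.0, 1_000_000.0)    # 50 Gb to 1,000 Gb per run


def cost_per_mb(price_usd: float, capacity_mb: float) -> float:
    """Price of one run divided by the megabases that run produces."""
    return price_usd / capacity_mb


sanger = cost_per_mb(SANGER_PRICE_USD, SANGER_CAPACITY_MB)
# Fewer gigabases per run means a higher price per megabase, and vice versa.
ngs_worst = cost_per_mb(NGS_PRICE_USD, NGS_CAPACITY_MB[0])
ngs_best = cost_per_mb(NGS_PRICE_USD, NGS_CAPACITY_MB[1])

print(f"Sanger: ${sanger:.2f} per Mb")
print(f"NGS:    ${ngs_best:.4f} to ${ngs_worst:.4f} per Mb")
print(f"Sanger costs about {sanger / ngs_worst:.0f}x more per Mb, even at the low NGS capacity")
```

Even with the least favourable NGS capacity, the cost per megabase is thousands of times lower than Sanger's, although a single NGS run costs about twenty times more.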

Next, I will describe what happened before and during the preparation of these libraries. First of all, we had to talk with the person responsible for the Illumina HiSeq 2500 at INCA to find out whether we could use this sequencer, when that would be possible, and whether the technologist, Carolina Furtado, could teach us how to prepare our samples. Once this was settled, we could get to work.

Well, this is a summary of what has been going on!

We hope you enjoy it, and we hope to write more often.



See you soon!

Preparing the libraries for the Next Generation Sequencing

Today let's talk a little about Next Generation Sequencing.

After choosing this sequencing method, we had to prepare all the samples for sequencing. This involved extracting RNA, checking RNA quality with a more reliable method than the one we had used before, constructing the cDNA libraries, and evaluating their quality and quantity.

Following the NanoDrop quantification, we ran another analysis on the Bioanalyzer, a more precise instrument for assessing the quality of the extracted material. This method generates an RNA Integrity Number (RIN), a score assigned by the software that also takes into account the presence of degradation products. Although Illumina recommends a RIN higher than eight for transcriptome sequencing, we set our threshold at six because of the particularities of our samples. By doing this, we accept the risk of obtaining transcriptomes biased towards the 3' end of the transcripts. However, most of our samples had a RIN between 7.5 and 8.0, and our first results indicate high coverage of the 5' end.
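The RIN decision described above amounts to sorting samples into three bins. The sketch below assumes that a RIN of exactly 8.0 does not meet the "higher than eight" recommendation and that exactly 6.0 passes our threshold; the sample IDs and RIN values are hypothetical, not our data.

```python
from dataclasses import dataclass

ILLUMINA_RECOMMENDED_RIN = 8.0  # Illumina recommends a RIN above this value
OUR_MIN_RIN = 6.0               # relaxed threshold we adopted for our samples


@dataclass
class RnaSample:
    sample_id: str  # hypothetical identifier
    rin: float      # RIN reported by the Bioanalyzer software (1 to 10)


def classify(sample: RnaSample) -> str:
    """Place a sample in one of three bins according to its RIN."""
    if sample.rin > ILLUMINA_RECOMMENDED_RIN:
        return "meets Illumina recommendation"
    if sample.rin >= OUR_MIN_RIN:
        # Kept, but partly degraded RNA may bias coverage towards the 3' end.
        return "accepted, possible 3' bias"
    return "excluded"


# Hypothetical samples; these RIN values are illustrative only.
samples = [RnaSample("fish_A", 8.6), RnaSample("fish_B", 7.8), RnaSample("fish_C", 5.4)]
for s in samples:
    print(f"{s.sample_id}: RIN {s.rin} -> {classify(s)}")
```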

First, we select the mRNA using beads coated with oligo-dT, which binds the poly-A tails of mRNA molecules; this purifies the material so that only the mRNA stays in the well plate. Next, we fragment the RNA in a delicate 3-minute step at 94 °C in the thermocycler. The longer this step lasts, the more fragmented the RNA becomes; 3 minutes gives us fragments of about 300 bp.
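Why a longer incubation gives shorter fragments can be shown with a toy model: every position in a transcript breaks independently with a probability proportional to the incubation time, so more minutes mean more cuts and a smaller mean fragment size. This is a simplification, not a description of the real chemistry; the rate constant `BREAKS_PER_NT_PER_MIN` is hypothetical and was chosen only so that about 3 minutes yields fragments of roughly 300 nt, matching the figure above.

```python
import random
import statistics

# Hypothetical rate, picked so that ~3 min gives ~300-nt fragments; not measured.
BREAKS_PER_NT_PER_MIN = 1 / 900


def fragment(length_nt: int, minutes: float, rng: random.Random) -> list[int]:
    """Cut one transcript at random positions and return the fragment lengths."""
    # Each internal position breaks independently with probability p.
    p = min(1.0, BREAKS_PER_NT_PER_MIN * minutes)
    cuts = [i for i in range(1, length_nt) if rng.random() < p]
    edges = [0, *cuts, length_nt]
    return [end - start for start, end in zip(edges, edges[1:])]


rng = random.Random(42)  # fixed seed so the run is reproducible
for minutes in (1, 3, 6):
    # Fragment 50 hypothetical 3,000-nt transcripts and pool the pieces.
    lengths = [n for _ in range(50) for n in fragment(3000, minutes, rng)]
    print(f"{minutes} min: mean fragment ~{statistics.mean(lengths):.0f} nt")
```

In this model the mean fragment size is roughly 1/p, so doubling the incubation time roughly halves it.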

The next step is to synthesise the cDNA in two separate reactions: first the first strand and then, in another reaction, the second strand. This way we end up with double-stranded cDNA rather than the RNA/cDNA hybrid used in routine molecular biology applications. When the cDNA is ready, we repair its ends so that each cDNA molecule is blunt-ended, with a 5' phosphate and a free 3'-OH.
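At the sequence level, the two reactions can be sketched as two reverse-complement steps: the first strand is DNA complementary to the mRNA, and the second strand is complementary to the first, so it carries the mRNA sequence written with T instead of U. The function names and the mRNA fragment are hypothetical; this ignores the enzymology entirely and only shows which sequence each strand ends up with.

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def rna_to_dna(rna: str) -> str:
    """Write an RNA sequence in DNA letters (U becomes T)."""
    return rna.upper().replace("U", "T")


def reverse_complement(dna: str) -> str:
    """Complementary strand, read 5' to 3'."""
    return dna.translate(DNA_COMPLEMENT)[::-1]


def first_strand(mrna: str) -> str:
    """Reverse transcription: DNA complementary to the mRNA, written 5' to 3'."""
    return reverse_complement(rna_to_dna(mrna))


def second_strand(first: str) -> str:
    """Second-strand synthesis: DNA complementary to the first strand."""
    return reverse_complement(first)


mrna = "AUGGCCUUAAGC"  # hypothetical mRNA fragment
first = first_strand(mrna)
second = second_strand(first)
print("mRNA:          5'-" + mrna + "-3'")
print("first strand:  5'-" + first + "-3'")
print("second strand: 5'-" + second + "-3'")
```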

The adapter ligation step is crucial for sequencing, because this is when we give each sample its own identity. This allows us to sequence several different samples in the same lane. However, it is vital to record which adapters were used for each sample so that, at the time of sequencing, we never put samples with the same adapter in the same lane. Then a short (15-cycle) PCR enrichment of the DNA fragments is performed, which selects the molecules with adapters at both ends and amplifies the DNA sample. The primers in this PCR bind to the adapters.
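Keeping track of adapters per lane is easy to automate. This minimal sketch checks a lane plan and reports any adapter used by more than one sample in the same lane. The plan format, lane names, sample IDs and adapter labels are all hypothetical.

```python
from collections import defaultdict

# Hypothetical plan format: lane -> list of (sample_id, adapter_label).
LanePlan = dict[str, list[tuple[str, str]]]


def adapter_clashes(plan: LanePlan) -> dict[str, list[str]]:
    """Return, for each lane, the adapters shared by two or more samples."""
    clashes: dict[str, list[str]] = {}
    for lane, samples in plan.items():
        samples_by_adapter: dict[str, list[str]] = defaultdict(list)
        for sample_id, adapter in samples:
            samples_by_adapter[adapter].append(sample_id)
        repeated = sorted(a for a, ids in samples_by_adapter.items() if len(ids) > 1)
        if repeated:
            clashes[lane] = repeated
    return clashes


plan: LanePlan = {
    "lane 1": [("fish_01", "index_02"), ("fish_02", "index_04"), ("fish_03", "index_02")],
    "lane 2": [("fish_04", "index_02"), ("fish_05", "index_05")],
}
print(adapter_clashes(plan))
```

Reusing an adapter is fine across lanes (index_02 in lanes 1 and 2); only a repeat within one lane is flagged, because the reads from those samples could not be told apart.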

Finally, we run the libraries on the Bioanalyzer to check their sizes and perform qPCR for precise quantification. The size and quantity information is essential for normalisation, in which all samples are adjusted to the same number of molecules so that every sample in a lane has an equal chance of being sequenced. The samples are then ready to be sequenced on the Illumina HiSeq 2500.
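The normalisation arithmetic can be sketched as follows, assuming a mass concentration in ng/µL and the mean library size from the Bioanalyzer. It uses the common approximation of 660 g/mol per base pair of double-stranded DNA, and the fact that 1 nM equals 1 fmol/µL. Many qPCR library quantification kits report molarity directly, in which case the first conversion is unnecessary. The library names, concentrations and the 20 fmol target are hypothetical.

```python
AVG_MASS_PER_BP = 660.0  # g/mol per base pair of dsDNA (common approximation)


def molarity_nM(conc_ng_per_ul: float, mean_size_bp: float) -> float:
    """Convert a mass concentration (ng/uL) to nanomolar using the mean size."""
    return conc_ng_per_ul / (AVG_MASS_PER_BP * mean_size_bp) * 1e6


def pooling_volumes_ul(molarities_nM: dict[str, float], fmol_each: float) -> dict[str, float]:
    """Volume of each library that contributes the same number of molecules.

    1 nM is 1 fmol per microlitre, so volume (uL) = fmol / nM.
    """
    return {name: fmol_each / nM for name, nM in molarities_nM.items()}


# Hypothetical libraries: (concentration in ng/uL, mean size in bp).
libraries = {"fish_01": (2.0, 400.0), "fish_02": (3.5, 420.0), "fish_03": (1.2, 380.0)}
molarities = {name: molarity_nM(c, s) for name, (c, s) in libraries.items()}
volumes = pooling_volumes_ul(molarities, fmol_each=20.0)
for name in libraries:
    print(f"{name}: {molarities[name]:.2f} nM -> add {volumes[name]:.2f} uL")
```

Larger or more concentrated libraries need smaller volumes, so that each sample contributes the same number of molecules to the pool.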