How To Profile DNA And RNA Expression Using Next Generation Sequencing

Why is Next Generation Sequencing so powerful for exploring and answering both clinical and research questions? It can sequence whole genomes to identify novel differences between individuals, reveal which RNA sequences are being expressed, and examine DNA modifications and protein-DNA interactions, all of which help researchers better understand the complex regulation of transcription. This, in turn, allows them to characterize changes during different disease states, which can suggest a way to treat the disease.

Over the next two blogs, I will highlight these different methods and illustrate how they can help clinical diagnostics and advance our understanding of how to characterize and treat diseases. To start, we focus on methods to characterize the DNA and RNA directly. The next blog will look at how Next Generation Sequencing can be used to understand modifications to the DNA and to characterize protein-binding regions of the DNA.

Let’s begin by learning more about sequencing DNA and RNA.  

A. Genome and DNA sequencing methods

With the continued drop in sequencing costs, genome sequencing has become a powerful tool to quickly sequence and resequence genomes. The needs of the experiment determine how much sequencing must be performed, which in turn affects not just the financial cost of the experiment but also the computational time and effort required to analyze the data.

 1. Whole Genome Resequencing (WGR):

“Resequencing” is done to investigate the differences between the genomes of specific individuals (ideally chosen based on phenotype) and the reference genome. In other words, resequencing refers to sequencing a genome with Next Generation Sequencing methods when a reference genome is already available.

 Consequently, the resequenced genome can then be compared with the reference genome to determine a catalog of mutations/aberrations (Single Nucleotide Variants, Copy Number Variants, Insertions, and Deletions) specific to each sequenced individual. 

This method provides valuable insight into the individual's genetic background and hence helps in accurate clinical diagnoses.
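To make the idea of comparing a resequenced genome against the reference more concrete, here is a deliberately simplified sketch. It assumes the two sequences are already aligned and of equal length, and it only reports single-base mismatches. The function name `find_snvs` and the example sequences are hypothetical. Real pipelines align millions of short reads and use dedicated variant callers, and they also handle insertions, deletions, and copy number variants, which this toy example does not.

```python
def find_snvs(reference: str, sample: str):
    """Return (1-based position, reference base, sample base) for every
    mismatch between two pre-aligned, equal-length sequences."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != alt_base
    ]


# Hypothetical toy sequences, for illustration only.
reference = "ACGTACGTAC"
sample = "ACGTTCGTAA"

for pos, ref_base, alt_base in find_snvs(reference, sample):
    print(f"position {pos}: {ref_base} -> {alt_base}")
```

The resulting list is a miniature version of the "catalog" of variants described above, one entry per position where the individual differs from the reference.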

2. Whole Genome De Novo Resequencing (WGDR): 

When a reference genome is not available for downstream mutation analyses, the sequencing data are used to assemble a genome de novo. The assembled genome is then used for downstream analyses and gene annotation.

This is mostly done for prokaryotes and viruses, as reference genomes are often not readily available for them. WGDR is useful in metagenomics, the study of bacterial composition in environmental samples such as wastewater or drinking water, where it is used to identify pathogenic bacterial strains. Such research is carried out by labs in many countries to improve water quality in their communities. The quality of a de novo genome assembly depends directly on library quality, sequencing accuracy, and sequencing coverage. The higher the coverage, the better the quality of the sequenced data. To better understand the essential concepts required for efficient genome assembly, gene prediction, and annotation, make sure to check out my other blogs.
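Since coverage comes up repeatedly, a quick back-of-the-envelope calculation may help. A common approximation for average coverage is C = N × L / G, where N is the number of reads, L is the read length, and G is the genome size. The sketch below applies that formula. The function names and the example numbers (a 5 Mb bacterial genome, 150 bp reads) are assumptions chosen for illustration, not values from a specific experiment.

```python
import math


def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average sequencing depth, using the approximation C = N * L / G."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size


def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Number of reads required to reach a target average coverage."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return math.ceil(target_coverage * genome_size / read_length)


# Hypothetical example: 1 million 150 bp reads on a 5 Mb bacterial genome.
print(mean_coverage(1_000_000, 150, 5_000_000))  # average depth
print(reads_needed(50, 150, 5_000_000))          # reads for 50x coverage
```

Keep in mind this is an average. Real coverage is uneven across the genome, so regions that are hard to sequence can end up well below the mean.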

3. Whole Exome Sequencing (WES):

Exons are the protein-coding regions of the genome. Together they comprise less than 2% of the total genome but are enriched for disease-causing variants. WES is therefore more cost-effective than whole genome sequencing (WGS), requiring many-fold less sequencing effort and expense. WES has also made it possible to conduct resequencing studies in species that have very large, repetitive, or polyploid genomes, such as some plants that have been selectively bred. Several exome-capture kits and methods are available on the market, namely HaloPlex, AmpliSeq, SureSelect, and SeqCap. SureSelect and SeqCap are “capture hybridization-based”: they rely on fragmenting the DNA by sonication, followed by hybridization of oligonucleotides specific to exons. HaloPlex and AmpliSeq are amplicon-based methods because they rely on PCR amplification of exonic regions using PCR primers (Figure 1).

Figure 1: A representation of amplicon-based and capture hybridization-based exome capture kits.

4. Targeted Sequencing 

Targeted sequencing determines the DNA sequence in a subset of genes or regions of the genome. Both WES and targeted sequencing focus time, expense, and data analysis on genomic regions of interest. WES is often confused with targeted sequencing because the two are similar. The subtle difference is that WES characterizes all exons, whereas targeted sequencing characterizes only selected genomic regions of interest. An added advantage of targeted sequencing is that it becomes more affordable to sequence at very high coverage, which is necessary for specific applications. For example, WGS typically provides 30-50x coverage, whereas targeted sequencing can cover the target region at 500-1000x. The higher coverage makes it possible to identify rare variants that would otherwise go undetected at lower coverage. Moreover, targeted sequencing generates a smaller and more manageable dataset, thus saving on the computational resources needed for analysis.
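To see why depth matters so much for rare variants, consider a rough probabilistic sketch. It assumes each read covering a position independently carries the variant with probability equal to the variant allele fraction (a simple binomial model), and it ignores sequencing errors and mapping artifacts. The function name, the 1% allele fraction, and the threshold of 5 supporting reads are all illustrative assumptions, not recommendations.

```python
from math import comb


def detection_probability(depth: int, allele_fraction: float, min_alt_reads: int) -> float:
    """Probability of observing at least `min_alt_reads` variant-supporting
    reads at a position, under a Binomial(depth, allele_fraction) model."""
    p_fewer = sum(
        comb(depth, k) * allele_fraction**k * (1 - allele_fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_fewer


# Hypothetical rare variant present in 1% of reads,
# called only if at least 5 reads support it.
for depth in (30, 500, 1000):
    p = detection_probability(depth, allele_fraction=0.01, min_alt_reads=5)
    print(f"{depth}x coverage: P(detect) = {p:.4f}")
```

Under these assumptions, the chance of seeing enough supporting reads is close to zero at 30x and rises steeply as depth approaches 1000x, which is the intuition behind using targeted panels for rare-variant detection.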

B. RNA Sequencing Methods

RNA sequencing, or transcriptomic profiling, can be used to perform high-throughput measurement of RNA levels and gene expression. Compared to microarray and Sanger sequencing-based approaches, RNA sequencing is far less limited in resolution, specificity, and sensitivity. Furthermore, prior knowledge of transcript sequences and their isoforms is not required.

Some advantages of RNA sequencing over established methods are:

—> The possibility of having a genome-wide coverage, regardless of whether a reference genome is available.

—> Better sensitivity and specificity. The dynamic range of RNA-Seq spans five orders of magnitude given sufficient coverage, which is significantly higher than the existing array-based measurements, which can measure at most three orders of magnitude in changes.

—> Unbiased detection of all transcripts and isoforms, including novel ones.

—> Detection of transcripts or isoforms expressed at low levels if the sequencing is done at sufficient depth.

—> Absence of background noise from microarray hybridization.

—> Detection of genetic variants that affect transcripts, especially for highly and moderately expressed ones.

In cancer research, gene expression and small RNA profiling provide information about active pathways that drive tumorigenesis, which may not be captured with genome sequencing. The Cancer Genome Atlas Project has profiled the transcriptomes of over 4,000 cancer tissue samples in 12 cancer types. Also, clinical tests based on gene expression signatures, such as Oncotype DX, which predict drug response and individual prognosis, are now commercially available in the US.

Beyond simply profiling RNA molecules, RNA-Seq is a great tool to help us better understand epigenetics and RNA biology, especially with the advent of innovative techniques. For example, RNA-Seq methods could be used to capture snapshots of alternative splicing, RNA editing, nascent transcripts, ribosome-bound transcripts, and fusion transcripts.

Apart from RNA and mRNA profiling, combinations of molecular biology and biochemical techniques with Next Generation Sequencing have led to the development of many RNA-Seq derived methods:

a. Ribo-Seq (Ribosome profiling sequencing): to identify RNAs being processed by the ribosome and monitor the translation process

b. miRNA-Seq: sequencing of microRNAs

c. ChIRP-Seq (Chromatin Isolation by RNA Purification sequencing): to discover regions of the genome bound by a specific RNA

d. PAR-CLIP (Photoactivatable-Ribonucleoside-Enhanced Crosslinking and Immunoprecipitation sequencing): to identify and characterize binding sites of RNA-binding proteins and miRNA-containing ribonucleoprotein complexes (miRNPs)

e. CLIP-Seq (Cross-linking and Immunoprecipitation Sequencing): to identify the binding sites of cellular RNA-binding proteins (RBPs) using UV light to cross-link RNAs to RBPs

Concluding Remarks

Taken together, sequencing DNA, whether at the whole genome level or just the whole exome level, can provide the researcher with a tremendous amount of data that can help identify markers for a disease. This can be further expanded by looking at which RNA transcripts are present as a result of these marker sequences. Next Generation Sequencing really offers the ability to decode the book of life and figure out where misprints lead to adverse outcomes.

In future blog articles, we will delve into the finer details of these methods and why you would choose one method over another. Tune back in next time, when we will explore how Next Generation Sequencing can be used to understand DNA modification and identify sites of DNA-protein interaction.


Deepak Kumar, PhD

Deepak Kumar is a Genomics Software Application Engineer (Bioinformatics) at Agilent Technologies. He is the founder of the Expert Sequencing Program (ExSeq) at Cheeky Scientist. The ExSeq program provides a holistic understanding of the Next Generation Sequencing (NGS) field - its intricate concepts, and insights on sequenced data computational analyses. He holds diverse professional experience in Bioinformatics and computational biology and is always keen on formulating computational solutions to biological problems.
