How To Profile DNA And RNA Expression Using Next Generation Sequencing

Why is Next Generation Sequencing so powerful to explore and answer both clinical and research questions. With the ability to sequence whole genomes, identifying novel changes between individuals, to exploring what RNA sequences are being expressed, or to examine DNA modifications and protein-DNA interactions occurring that can help researchers better understand the complex regulation of transcription. This, in turn, allows them to characterize changes during different disease states, which can suggest a way to treat said disease. 

Over the next two blogs, I will highlight these different methods along with illustrating how these can help clinical diagnostics as well as advancing our understanding of how to characterize and treat diseases. To start, we focus on methods to characterize the DNA and RNA directly. The next blog will look at how Next Generation Sequencing can be used in understanding modifications to the DNA, as well as characterizing protein binding regions of the DNA. 

Let’s begin by learning more about sequencing DNA and RNA.  

A. Genome and DNA sequencing methods

With the continued drop in sequencing costs, genome sequencing has become a powerful tool to expeditiously sequence and resequence genomes. The needs of the experiment will drive the level of sequencing that needs to be performed, which in turn impacts not just the cost (financially) of the experiment, but the time and effort (computationally) in analyzing the data. 

 1. Whole Genome Resequencing (WGR):

Resequencing” is done to investigate the differences between the genome of specific individuals (ideally chosen based on phenotypes) and that of the reference genome.  In other words, resequencing refers to the sequencing of the genome using Next Generation Sequencing methods for which reference genome is already available.

 Consequently, the resequenced genome can then be compared with the reference genome to determine a catalog of mutations/aberrations (Single Nucleotide Variants, Copy Number Variants, Insertions, and Deletions) specific to each sequenced individual. 

 This method provides valuable insight into the individuals’ genetic background and hence helps in accurate clinical diagnoses.  

2. Whole Genome De Novo Resequencing (WGDR): 

When the reference genome is not available for downstream mutation determination analyses, the sequenced data obtained from resequencing is used to assemble a reference genome. These assembled genomes are used for downstream analyses and genes’ annotations. 

This is mostly done for prokaryotes and viruses as their reference genomes are not readily available for research. WGDR is useful in metagenomics – the study of bacterial composition in environmental samples like wastewater/drinking water – done to identify the pathogenic bacterial strains. Such research is quite prevalent across labs in countries to improve water quality in the communities. The de novo genome assembly quality directly depends on library quality, sequencing accuracy, and sequencing coverage. The higher the coverage, the better the quality of the sequenced data. To better understand the essential concepts required for an efficient genome assembly, gene prediction and annotations, make sure to check out my other blogs.

3. Whole Exome Sequencing (WES):

Exons are the protein-coding regions that comprise less than 2% of the total genome but are enriched for disease-causing variants. Therefore, WES is more cost-effective to sequence than WGS, with many folds decrease in the effort and expense required for sequencing. WES has made it possible to conduct resequencing studies in species that have very large, repetitive, or polyploid genomes, such as some plants that have been selectively bred. There are a few significant exome-capture kits/methods available in the market – namely, HaloPlex, Ampliseq, SureSelect, and SeqCap.  SureSelect and SeCap are “capture hybridization-based” as they rely on sonication of DNA fragments, followed by the hybridization of oligonucleotides specific to exons, whereas, HaloPlex and AmpliSeq are amplicon-based methods because they are based on PCR amplification of exonic regions using PCR primers (Figure 1).

Figure 1: A representation of amplicon-based and capture hybridization-based exome capture kits.

4. Targeted Sequencing 

Targeted sequencing determines the DNA sequence in a subset of genes or regions of the genome. Both WES and targeted sequencing focus time, expense, and data analysis on genomic regions of interest. WES is often confused with targeted sequencing as they are similar. The subtle difference is that WES characterizes all exons, but targeted sequencing characterizes only genomic regions of interest. An added advantage of targeted sequencing is  it becomes more affordable to sequence at very high coverage, which is necessary for specific applications. For example, WGS typically provides 30-50x coverage, whereas targeted sequencing can cover the target region at 500-1000x. The higher coverage makes it possible to identify rare variants that would otherwise be undetected at lower coverage. Moreover, targeted sequencing generates a smaller and more manageable dataset, thus saving on the computational resources needed for analysis.

B. RNA Sequencing Methods

RNA sequencing or transcriptomic profiling can be used to perform high-throughput measurement of RNA levels and gene expressions. Compared to the microarray and Sanger sequencing-based approach, RNA sequencing is not limited in resolution, specificity, and sensitivity.  Furthermore, prior knowledge of transcript sequences and their isoforms is not required.

Some advantages of RNA sequencing over established methods are:

—> The possibility of having a genome-wide coverage, regardless of whether a reference genome is available.

—> Better sensitivity and specificity. The dynamic range of RNA-Seq spans five orders of magnitude given sufficient coverage, which is significantly higher than the existing array-based measurements, which can measure at most three orders of magnitude in changes.

—> Unbiased detection of all transcripts and isoforms, including novel ones.

—> Detection of transcripts or isoforms expressed at low levels if the sequencing is done at sufficient depth.

—> Absence of background noise from microarray hybridization.

—> Detection of genetic variants that affect transcripts, especially for highly and moderately expressed ones.

In cancer research, gene expression and small RNA profiling provide information about active pathways that drive tumorigenesis, which may not be captured with genome sequencing. The Cancer Genome Atlas Project has profiled the transcriptomes of over 4,000 cancer tissue samples in 12 cancer types. Also, clinical tests that use RNA-Seq, such as OncotypeDX, which predict drug responses and individual prognosis based on gene expression signature, are now commercially available in the US.

Beyond simply profiling RNA molecules, RNA-Seq is a great tool to help us better understand epigenetics and RNA biology, especially with the advent of innovative techniques. For example, RNA-Seq methods could be used to capture snapshots of alternative splicing, RNA editing, nascent transcripts, ribosome-bound transcripts, and fusion transcripts.

Apart from RNA and mRNA profiling, combinations of molecular biology and biochemical techniques with Next Generation Sequencing have led to the development of many RNA-Seq derived methods:

a. Ribo-Seq Ribosome profile sequencing: to identify RNAs being processed by the ribosome to monitor the translation process

b. miRNA-Seq: Sequencing for microRNAs 

c. ChIRP-Seq (Chromatin Isolation by RNA): purification – to discover regions of the genome bound by specific RNA

d. PAR-CLIP (Photoactivatable-Ribonucleoside):  Enhanced Crosslinking and Immunoprecipitation sequencing – to identify and characterize binding sites of RNA-binding proteins and miRNA-containing ribonucleoprotein complexes (miRNPs)

e. CLIP-Seq (Cross-linking and Immunoprecipitation Sequencing):  to identify the binding sites of cellular RNA-binding proteins (RBPs) using UV light to cross-link RNAs to RBPs

Concluding Remarks

Taken together, the sequencing of DNA at the whole genome level to just the whole exon-level can provide the researcher with a tremendous amount of data that can help identify markers for a disease, which can be further expanded to look at what RNA transcripts are present as a result of these marker sequences. Next Generation Sequencing really offers the ability to decode the book of life and figure out where misprints lead to adverse outcomes. 

In future blog articles, we will delve into the finer details of these methods and why you would choose one method over another. Tune back in next time when we will explore how Next Generation Sequencing can be used in understanding DNA modification and identifying sites of DNA:Protein interaction

To learn more about gene prediction and how NGS can assist you, and to get access to all of our advanced materials including 20 training videos, presentations, workbooks, and private group membership, get on the Expert Sequencing wait list.

Join Expert Cytometry's Mastery Class
Deepak Kumar, PhD
Deepak Kumar, PhD Genomics Software Application Engineer

Deepak Kumar is a Genomics Software Application Engineer (Bioinformatics) at Agilent Technologies. He is the founder of the Expert Sequencing Program (ExSeq) at Cheeky Scientist. The ExSeq program provides a holistic understanding of the Next Generation Sequencing (NGS) field - its intricate concepts, and insights on sequenced data computational analyses. He holds diverse professional experience in Bioinformatics and computational biology and is always keen on formulating computational solutions to biological problems.

Similar Articles

Optimizing Flow Cytometry Experiments - Part 2         How To Block Samples (Sample Blocking)

Optimizing Flow Cytometry Experiments - Part 2 How To Block Samples (Sample Blocking)

By: Tim Bushnell, PhD

In my previous blog on  experimental optimization, we discussed the idea of identifying the best antibody concentration for staining the cells. We did this through a process called titration, which  focuses on finding the best signal-to-noise ratio at the lowest antibody concentration. In this blog we will deal with sample blocking As a reminder, there are two other major binding concerns with antibodies. The first is the specific binding of the Fc fragment of the antibody to the Fc Receptor expressed on some cells. This protein is critical for the process of destroying microbes or other cells that have been…

What Is Next Generation Sequencing (NGS) And How Is It Used In Drug Development

What Is Next Generation Sequencing (NGS) And How Is It Used In Drug Development

By: Deepak Kumar, PhD

NGS methodologies have been used to produce high-throughput sequence data. These data with appropriate computational analyses facilitate variant identification and prove to be extremely valuable in pharmaceutical industries and clinical practice for developing drug molecules inhibiting disease progression. Thus, by providing a comprehensive profile of an individual’s variome — particularly that of clinical relevance consisting of pathogenic variants — NGS helps in determining new disease genes. The information thus obtained on genetic variations and the target disease genes can be used by the Pharma companies to develop drugs impeding these variants and their disease-causing effect. However simple this may allude…

How To Determine The Optimal Antibody Concentration For Your Flow Cytometry Experiment (Part 1 of 6)

How To Determine The Optimal Antibody Concentration For Your Flow Cytometry Experiment (Part 1 of 6)

By: Tim Bushnell, PhD

Over the next series of blog posts, we will explore the different aspects of optimizing a polychromatic flow cytometry panel. These steps range from figuring out the best voltage to use, which controls are critical for data interpretation, what quality control tools can be integrated into the assay; how to block cells, and more. This blog will focus on determining the optimal antibody concentration.  As a reminder about the antibody structure, a schematic of an antibody is shown below.  Figure 1: Schematic of an antibody. Figure from Wikipedia. The antibody is composed of two heavy chains and two light chains that…

Structural Variant Calling From NGS Data

Structural Variant Calling From NGS Data

By: Deepak Kumar, PhD

Single Nucleotide Variant (SNVs) have been considered as the main source of genetic variation, therefore precisely identifying these SNVs is a critical part of the Next Generation Sequencing (NGS) workflow. However, in this report from 2004, the authors identified another form of variants called the Structural Variants (SVs), which are genetic alterations of 50 or more base pairs, and result in duplications, deletions, insertions, inversions, and translocations in the genome. The changes in the DNA organization resulting from these SVs have been shown to be responsible for both phenotypic variation and a variety of pathological conditions. While the average variation,…

Essential Concepts in Gene Prediction and Annotation

Essential Concepts in Gene Prediction and Annotation

By: Deepak Kumar, PhD

After genome assembly (covered in my previous blog) comes the vital step of gene prediction and annotation. This step entails the prediction of all the genes present in the assembled genome and to provide efficient functional annotation to these genes from the data available in diverse public repositories; such as Protein Family (PFAM), SuperFamily, Conserved Domain Database (CDD), TIGRFAM, PROSITE, CATH, SCOP, and other protein domain databases. It is imperative to understand that prediction and annotation of non-protein-coding genes, Untranslated Regions (UTR), and tRNA are as vital as protein-coding genes to determine the overall genetic constitution of the assembled genome. …

Brightness Is In The Eye Of The Detector - What To Consider When Designing Your Panel

Brightness Is In The Eye Of The Detector - What To Consider When Designing Your Panel

By: Tim Bushnell, PhD

The heart and soul of the flow cytometry experiment is the ‘panel.’ The unique combinations of antibodies, antigens, fluorochromes, and other reagents are central to identifying the cells of interest and extracting the data necessary to answer the question at hand. Designing the right panel for flow cytometry is essential for detecting different modalities. The more parameters that can be interrogated will yield more information about the target cells. Current instruments can measure as many as 40 different parameters simultaneously. This is exciting, as it allows for more complex questions to be studied. Panel design is also valuable for precious samples,…

7 Key Image Analysis Terms For New Microscopist

7 Key Image Analysis Terms For New Microscopist

By: Heather Brown-Harding, PhD

As scientists, we need to perform image analysis after we’ve acquired images in the microscope, otherwise, we have just a pretty picture and not data. The vocabulary for image processing and analysis can be a little intimidating to those new to the field. Therefore, in this blog, I’m going to break down 7 terms that are key when post-processing of images. 1. RGB Image Images acquired during microscopy can be grouped into two main categories. Either monochrome (that can be multichannel) or “RGB.” RGB stands for red, green, blue – the primary colors of light. The cameras in our phones…

5 Essential Concepts In Genome Assembly From NGS data

5 Essential Concepts In Genome Assembly From NGS data

By: Deepak Kumar, PhD

The main goal for researchers, clinicians, and students who perform Next Generation Sequencing (NGS) and produce sequenced data for diverse projects involving human samples is to find biomarkers or variants to make diagnoses; and deduce the genetic anomalies that could be responsible for the disease they are conducting research on. Most projects (academic or non-academic) constitute the prior ideology on deciphering the “unknown.” There are well-versed computational protocols and pipelines formulated by labs across the world in determining what the “unknown” variants are. The fact that we have the “reference” human genome available – thanks to the Human Genome Project – plays…

The 5 Fundamental Methods For Imaging Nucleic Acids

The 5 Fundamental Methods For Imaging Nucleic Acids

By: Heather Brown-Harding, PhD

There are 4 major ways to sort cells. The first way can use magnetic beads coupled to antibodies and pass the cells through a magnetic field. The labeled cells will stick, and the unlabeled cells will remain in the supernatant. The second way is to use some sort of mechanical force like a flapper or air stream that separates the target cells from the bulk population. The third way is the recently introduced microfluidics sorter, which uses microfluidics channels to isolate the target cells. The last method, which is the most common––based on Fuwyler’s work––is the electrostatic cell sorter. This…

Top Technical Training eBooks

Get the Advanced Microscopy eBook

Get the Advanced Microscopy eBook

Heather Brown-Harding, PhD

Learn the best practices and advanced techniques across the diverse fields of microscopy, including instrumentation, experimental setup, image analysis, figure preparation, and more.

Get The Free Modern Flow Cytometry eBook

Get The Free Modern Flow Cytometry eBook

Tim Bushnell, PhD

Learn the best practices of flow cytometry experimentation, data analysis, figure preparation, antibody panel design, instrumentation and more.

Get The Free 4-10 Compensation eBook

Get The Free 4-10 Compensation eBook

Tim Bushnell, PhD

Advanced 4-10 Color Compensation, Learn strategies for designing advanced antibody compensation panels and how to use your compensation matrix to analyze your experimental data.