How To Profile DNA And RNA Expression Using Next Generation Sequencing (Part-2)

In the first blog of this series, we explored the power of sequencing the genome at various levels. We also saw how characterizing RNA expression levels helps us understand changes at the genome level, and how these changes affect the downstream expression of target genes. In this blog, we will explore how NGS can help us understand DNA modifications that affect the expression pattern of given genes (epigenetic profiling), and how it can characterize the DNA-protein interactions that allow us to identify genes that may be regulated by a given protein.

DNA Methylation Profiling or Epigenetic Profiling

NGS can be adapted to profile DNA methylation either through enrichment (using a methyl-CpG antibody or a methyl-CpG-binding protein) or through bisulfite sequencing.

Figure 1: Different methods of NGS DNA Methylation profiling.

1. Bisulfite Sequencing

Bisulfite treatment of DNA converts unmethylated cytosines to uracil, while methylated cytosines remain unchanged. Uracil bases are then read as thymine in the sequencing data, which can be used to identify the location and percentage of methylated cytosines. NGS-based bisulfite sequencing, whether whole-genome or targeted, makes it possible to profile cytosine methylation at single-base resolution, either genome-wide or across selected regions.
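To make the C-to-T logic concrete, here is a minimal Python sketch. It is a toy model, not a real methylation caller: the function names, the short reference sequence, and the assumption that every read is ungapped, on the same strand, and aligned at offset 0 are all ours for illustration.

```python
def bisulfite_convert(seq, methylated_positions):
    """Return the sequence as the sequencer would read it after bisulfite
    treatment: unmethylated C is read as T, methylated C stays C."""
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )


def methylation_percent(reference, reads):
    """Percent methylation at each reference cytosine.

    Assumes (for illustration only) that all reads are ungapped,
    same-strand, and aligned to the reference starting at offset 0.
    """
    result = {}
    for i, ref_base in enumerate(reference):
        if ref_base != "C":
            continue
        # A 'C' in the read means methylated, a 'T' means unmethylated.
        calls = [r[i] for r in reads if i < len(r) and r[i] in "CT"]
        if calls:
            result[i] = 100.0 * calls.count("C") / len(calls)
    return result


reference = "ACGTCGAC"  # cytosines at positions 1, 4, and 7
reads = [
    bisulfite_convert(reference, {1, 4}),  # molecule methylated at 1 and 4
    bisulfite_convert(reference, {1}),
    bisulfite_convert(reference, {1}),
    bisulfite_convert(reference, set()),   # fully unmethylated molecule
]
profile = methylation_percent(reference, reads)
print(profile)  # {1: 75.0, 4: 25.0, 7: 0.0}
```

Real pipelines also have to handle the reverse strand (where the conversion appears as G-to-A), incomplete conversion, and sequencing errors, all of which this sketch ignores.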

Types of bisulfite sequencing:

a. Whole-Genome Bisulfite Sequencing (WGBS)

Currently, WGBS is the most comprehensive way to profile DNA methylation at base-pair resolution. However, the required depth (minimum 30x) makes it cost-prohibitive for many studies. Thus, enrichment-based methods have been devised to reduce the cost of methylation profiling, especially when 100% coverage or base-pair resolution is not necessary.
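To get a feel for what 30x means in practice, the standard mean-coverage relationship (coverage = number of reads x read length / genome size) can be rearranged to estimate the number of reads required. The genome size (~3.1 Gb for human) and the 150 bp read length below are our assumptions for illustration, not values from a specific protocol.

```python
import math


def reads_needed(depth, genome_size_bp, read_length_bp):
    """Number of reads required to reach a mean depth, from
    coverage = reads * read_length / genome_size."""
    return math.ceil(depth * genome_size_bp / read_length_bp)


# Assumed values for illustration: ~3.1 Gb human genome, 150 bp reads.
n_reads = reads_needed(depth=30, genome_size_bp=3_100_000_000, read_length_bp=150)
print(f"{n_reads:,} reads")  # 620,000,000 reads
```

This is a lower bound on sequencing effort, since it assumes every read maps uniquely and contributes usable coverage.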

b. Reduced Representation Bisulfite Sequencing (RRBS)

RRBS relies on restriction enzymes such as MspI (CCGG) or BglII (AGATCT), which tend to cut inside or near CpG islands and promoter regions regardless of methylation status. Fragments of 40–220 bp are then isolated and end-repaired, treated with bisulfite, and amplified by PCR. RRBS using MspI captures approximately 80% of CpG islands and 60% of promoter regions in the human genome.
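The digestion and size-selection steps can be sketched in silico. The example below assumes the MspI cut position C^CGG (cutting after the first C of the site) and uses a made-up toy sequence; the function names are ours, and a real analysis would run over a full reference genome and both strands.

```python
import re


def mspi_fragments(seq):
    """Cut at every CCGG site after the first C (C^CGG) and
    return the resulting fragments in order."""
    cut_points = [m.start() + 1 for m in re.finditer("(?=CCGG)", seq)]
    bounds = [0] + cut_points + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def size_select(fragments, min_len=40, max_len=220):
    """Keep fragments in the 40-220 bp window used for RRBS."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


# Toy sequence with three MspI sites (illustrative only).
toy = "A" * 10 + "CCGG" + "T" * 50 + "CCGG" + "G" * 300 + "CCGG" + "A" * 5
fragments = mspi_fragments(toy)
selected = size_select(fragments)

print([len(f) for f in fragments])  # [11, 54, 304, 8]
print(len(selected))                # 1
```

Only the 54 bp fragment falls inside the size window. Because each retained fragment starts at an MspI site, it begins with CGG and therefore carries at least one CpG, which is why this strategy enriches for CpG-dense regions.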

2. Methylated DNA-enriched Sequencing

a. MethylCap-Seq

This method uses the methyl-CpG-binding domain (MBD) of MeCP2 to capture methylated DNA on magnetic beads. The bound DNA is then eluted with a high-salt solution and used for NGS. While this is a cost-effective method, its current resolution is ~150 bp, so it is best suited to fast, large-scale, low-resolution studies.

b. Methylated DNA Immunoprecipitation-Seq (MeDIP-Seq)

MeDIP-Seq uses an anti-methylcytosine antibody to immunoprecipitate DNA containing methylated CpGs. While MeDIP-Seq can be relatively inexpensive, its resolution is limited to roughly 100–300 bp.

DNA-protein Interaction Profiling

Because NGS is quantitative, DNA enriched by chromatin immunoprecipitation can be sequenced to profile any genomic region bound by a protein of interest, provided that the protein can either be recognized by an antibody or tagged with an epitope. These include DNA-binding proteins, transcription factors, histones, histone variants, specific histone modifications, and nucleosomes.

1. ChIP-Seq (Chromatin Immunoprecipitation Sequencing)

To create a ChIP-enriched library, DNA-bound proteins are cross-linked to DNA using formaldehyde before the chromatin is fragmented. The sample is then enriched by immunoprecipitation with an antibody specific to the protein or protein modification of interest. Subsequently, the crosslinks are reversed, and the ChIP-enriched library can be assayed using quantitative PCR, microarray, or NGS.

Differences Between ChIP-chip and ChIP-Seq

ChIP-chip resolution is limited by the size and spacing of the probes on the array, whereas ChIP-Seq can provide up to single-nucleotide resolution. ChIP-Seq requires much less input DNA and provides signals whose dynamic range is limited only by the sequencing depth. Additionally, ChIP-Seq makes it possible to profile repetitive regions, which are often omitted from microarrays. Repetitive regions that are often important for epigenetic control, such as heterochromatin or microsatellites, may only be mapped with NGS.

In addition to identifying genomic regions bound by the proteins, ChIP-Seq can provide insights into the functions of the DNA-bound proteins themselves. For example, ChIP-Seq data can be used to identify the cognate binding motifs of DNA-binding proteins. The sequence data can also be used to globally infer distances between the binding sites and genomic features, such as transcription start sites, exon-intron boundaries, the 3' ends of genes, and other known binding sites.
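As a small illustration of the distance analysis, the sketch below finds the signed distance from each peak summit to its nearest transcription start site (TSS) on a single chromosome. The coordinates are invented, the function name is ours, and strand is ignored for simplicity, so a negative value only means the summit lies at a lower coordinate than the TSS.

```python
from bisect import bisect_left


def nearest_tss_distance(summit, sorted_tss):
    """Signed distance (summit - TSS) to the closest TSS.

    sorted_tss must be a non-empty, ascending list of TSS coordinates
    on the same chromosome as the summit.
    """
    i = bisect_left(sorted_tss, summit)
    # Only the TSS just below and just above the summit can be nearest.
    candidates = sorted_tss[max(i - 1, 0): i + 1]
    nearest = min(candidates, key=lambda t: abs(summit - t))
    return summit - nearest


# Hypothetical coordinates for illustration only.
tss_positions = sorted([1000, 5000, 12000])
peak_summits = [950, 5200, 9000]
distances = [nearest_tss_distance(s, tss_positions) for s in peak_summits]
print(distances)  # [-50, 200, -3000]
```

Plotting a histogram of these distances across all peaks is a common way to see whether a protein binds preferentially near promoters.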

Figure 2: A representation of ChIP sequencing

2. Micrococcal Nuclease-Seq (MNase-Seq)

Nucleosome occupancy can tell us about regions of active genes and chromatin structure in eukaryotes. NGS allows us to profile nucleosome occupancy by sequencing genomic DNA digested with micrococcal nuclease (MNase). MNase preferentially digests the linker DNA between histone octamers that is not occupied by other proteins.

Figure 3: The workflow of an MNase protection assay

DNA is crosslinked to the protein using formaldehyde before MNase digestion. Once the digestion step is complete, the crosslinks are reversed. Then, the digested DNA is run on a gel to select the desired digestion products, which are purified and used for NGS. To control for MNase sequence bias, GC/AT preference, and other technical biases, it is necessary to concurrently sequence genomic DNA from the same sample without crosslinking, and to compare the two datasets during analysis.
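One simple way to compare the two libraries is a depth-normalized log2 ratio per genomic bin. The sketch below is only one of several possible normalizations, and the bin counts, the pseudocount of 1, and the function name are our assumptions for illustration.

```python
import math


def log2_enrichment(sample_counts, control_counts, pseudocount=1.0):
    """Per-bin log2(sample / control) after scaling the control to the
    sample's total read count. The pseudocount avoids division by zero."""
    scale = sum(sample_counts) / sum(control_counts)
    return [
        math.log2((s + pseudocount) / (c * scale + pseudocount))
        for s, c in zip(sample_counts, control_counts)
    ]


# Hypothetical read counts in four equally sized bins.
mnase_sample = [30, 10, 0, 40]
control = [10, 10, 10, 10]
enrichment = log2_enrichment(mnase_sample, control)
print([round(x, 2) for x in enrichment])
```

Bins with positive values are over-represented in the MNase-protected sample relative to the control (consistent with nucleosome occupancy), while negative values indicate depletion.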

Concluding Remarks

Over the course of these two blog posts, we have explored the power of NGS at several levels, from whole-genome sequencing down to characterizing epigenetic differences that impact gene expression. NGS gives scientists a deeper, more holistic understanding of the genome and of variations that may be markers for disease. Few other techniques can provide such a complete picture in a relatively short time frame. As costs continue to decrease, these techniques will play an increasing role in areas such as drug discovery, clinical diagnostics, and ultimately personalized medicine. Stay tuned to this blog for more information on these and many other techniques being developed in the world of NGS.

Deepak Kumar, PhD

Deepak Kumar is a Genomics Software Application Engineer (Bioinformatics) at Agilent Technologies. He is the founder of the Expert Sequencing Program (ExSeq) at Cheeky Scientist. The ExSeq program provides a holistic understanding of the Next Generation Sequencing (NGS) field, including its intricate concepts and insights into the computational analysis of sequencing data. He holds diverse professional experience in bioinformatics and computational biology and is always keen on formulating computational solutions to biological problems.
