Welcome to the Rohs Lab

The main focus of our Computational and Structural Biology research group is the integration of two fields of research: genomics and structural biology. Our primary goal is to reveal yet unknown molecular mechanisms underlying gene regulation using bioinformatics analyses of high-throughput sequencing and DNA methylation data of whole genomes, integrated with experimental molecular biology approaches.

Professor Rohs can accept graduate students from the following Ph.D. Programs as primary thesis adviser: Computational Biology and Bioinformatics, Molecular Biology, Chemistry, Physics, and Computer Science.

Selected Publications

Wang et al. Analysis of genetic variation indicates DNA shape involvement in purifying selection.
Mol. Biol. Evol. in press (2018)

Human SNPs derived from DNase-seq data were classified by their imbalance according to their ability to vary chromatin accessibility. Drosophila SNPs, called from 216 natural strains of D. melanogaster, were divided into SNPs in functional and nonfunctional regions (Figure). A single-nucleotide variant at a given position would result in DNA shape changes at the five nucleotide positions centered around the variant. Vectors of minor groove width (MGW) for each allele were predicted and Euclidean distances between MGWs of the two alleles were calculated as the DNA shape variation of the SNP.
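The distance calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and the MGW values are hypothetical, and in practice the two vectors would come from a DNA shape prediction method rather than being typed in by hand.

```python
import math

def shape_variation(mgw_ref, mgw_alt):
    """Euclidean distance between the MGW vectors of two alleles."""
    if len(mgw_ref) != len(mgw_alt):
        raise ValueError("MGW vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(mgw_ref, mgw_alt)))

# Hypothetical MGW predictions (in angstroms) at the five nucleotide
# positions centered on the variant, one vector per allele.
mgw_ref = [5.1, 4.8, 4.2, 4.9, 5.3]
mgw_alt = [5.0, 4.5, 3.6, 4.7, 5.3]

variation = shape_variation(mgw_ref, mgw_alt)
print(f"DNA shape variation of the SNP: {variation:.3f}")
```

A larger distance means that the two alleles differ more in their predicted minor groove geometry.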

Xin et al. Relationship between histone modifications and transcription factor binding is protein family specific.

For each transcription factor considered in this study, sequences at ChIP-seq peaks in chromatin-accessible regions derived from DNase-seq data were aligned using position frequency matrices to obtain DNA binding sites (Figure). For a set of selected transcription factor binding sites and non-binding sites, DNA sequence, four DNA shape features, and ten histone modification patterns were calculated for flanking regions and used to distinguish binding sites from non-binding sites. Quantitative modeling revealed protein family-specific mechanisms used by different transcription factors.

Rao et al. Systematic prediction of DNA shape changes due to CpG methylation explains epigenetic effects on protein-DNA binding.


CpG methylation induces a DNA shape change that explains its effect on Pbx-Hox binding (upper left). Cytosine methylation at offsets 6/7 and 10/11 reduces binding, whereas methylation at offset 9/10 enhances binding. Scatter plots represent relative binding affinities of methylated vs. unmethylated sequences (upper right). Green, magenta, and blue points correspond to methylation at offsets 6/7, 9/10, and 10/11. Positive and negative shifts reflect reduced and enhanced binding due to methylation (lower left). Analysis of the methylation-induced change in minor groove width (MGW) plausibly explains the observed reduced binding due to methylation at CpG offsets 6/7 and 10/11 (lower right).

R. Li et al. Quantum annealing versus classical machine learning applied to a simplified computational biology problem.
npj Quantum Information 4, 11 (2018)

This work is the first application of quantum computing to real biological data. With quantum annealing there is the possibility for the system state to tunnel through a changing barrier and arrive at the ground state (Figure). For classical annealing, the system must rely on thermal fluctuations to overcome any energy barriers. The ground state energy has a significant gap to the next energy level. The speed at which the optimization can take place depends on the size of the minimum gap. Using transcription factor-DNA binding data derived from high-throughput assays, we used D-Wave, a commercially available quantum annealer, to classify and rank binding affinity preferences.

J. Li et al. Expanding the repertoire of DNA shape features for genome-scale studies of transcription factor binding.
Nucleic Acids Res. 45, 12877-12887 (2017)

DNAshape+ extends our high-throughput prediction methods for DNA structural features. We derived and validated 9 additional DNA shape features beyond our original set of 4 features resulting in an expanded repertoire of 13 distinct DNA shape features, including six intra-base pair and six inter-base pair parameters and minor groove width (Figure). We also compared prediction accuracies of models based on DNA shape features extracted from Monte Carlo (MC) simulations to accuracies of models incorporating DNA shape information extracted from X-ray crystallography (XRC) data or Molecular Dynamics (MD) simulations.

Chiu et al. Genome-wide prediction of minor-groove electrostatic potential enables biophysical modeling of protein-DNA binding.

DNA shape readout involves the correlation between minor-groove width and electrostatic potential (EP). To probe this effect directly, rather than using minor-groove width as an indirect measure, we developed a methodology, DNAphi, for predicting EP in the minor groove (Figure) and confirmed the direct role of EP in protein-DNA binding using massive sequencing data. The DNAphi method uses a sliding-window approach to mine results from non-linear Poisson–Boltzmann calculations on thousands of DNA structures. This approach only requires nucleotide sequence as input and offers a novel way to integrate biophysical and genomic studies of protein-DNA binding.
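The general idea of a sliding-window lookup can be sketched as follows. This is not the DNAphi implementation: the pentamer window size, the function name, and the table values are assumptions made purely for illustration, whereas a real table would be derived from the Poisson–Boltzmann calculations mentioned above.

```python
def sliding_window_predict(sequence, table, k=5):
    """Assign each window's table value to its central nucleotide.

    Positions too close to either end of the sequence, and windows
    that are missing from the table, are reported as None.
    """
    half = k // 2
    predictions = [None] * len(sequence)
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        predictions[start + half] = table.get(window)
    return predictions

# Hypothetical lookup table: pentamer -> value at the central position.
toy_table = {"AATTC": -7.9, "ATTCG": -6.5, "TTCGA": -5.8}

profile = sliding_window_predict("AATTCGA", toy_table)
print(profile)
```

Because each prediction depends only on the local window, the input is just the nucleotide sequence, which is what makes this kind of approach practical on a genomic scale.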

Sagendorf et al. DNAproDB: an interactive tool for structural analysis of DNA-protein complexes.

DNAproDB is a web-based interactive tool designed to help researchers study DNA-protein complexes. Extracted structural features are organized in data files, which are easily parsed with any programming language or viewed in a browser. We processed a large number of DNA–protein complexes retrieved from the Protein Data Bank and created the DNAproDB database to store this data. Users can search the database by combining features of the DNA, protein or DNA–protein interactions at the interface or upload their own structures for processing privately and securely. DNAproDB provides several interactive and customizable tools for creating visualizations of the DNA–protein interface at different levels of abstraction.

RECOMB/ISCB Top-10 Paper in Regulatory and Systems Genomics in 2016/17
Yang et al. Transcription factor family-specific DNA shape readout revealed by quantitative specificity models.
Mol. Syst. Biol. 13, 910 (2017)

We resequenced data from HT-SELEX experiments for 410 different transcription factors (TFs) from mouse and human, the most extensive mammalian TF–DNA binding data available to date, and demonstrated the contributions of DNA shape readout across diverse TF families and its importance in core motif flanking regions. Statistical machine-learning models combined with feature-selection techniques helped to reveal the nucleotide position-dependent DNA shape readout in TF-binding sites and the TF family-specific position dependence. Based on these results, we proposed novel DNA shape logos (Figure) to visualize the DNA shape preferences of TFs. This work suggests a way of obtaining mechanistic insights into TF–DNA binding without relying on experimentally solved all-atom structures.

Mathelier et al. DNA shape features improve transcription factor binding site predictions in vivo.
Cell Syst. 3, 278-286 (2016)

Interactions of transcription factors (TFs) with DNA comprise a complex interplay between base-specific amino acid contacts and readout of DNA structure. Recent studies have highlighted the complementarity of DNA sequence and shape in modeling TF binding in vitro. Here, we have provided a comprehensive evaluation of in vivo datasets to assess the predictive power obtained by augmenting various DNA sequence-based models of TF binding sites (TFBSs) with DNA shape features. Results from 400 human ChIP-seq datasets for 76 TFs show that combining DNA shape features with position-specific scoring matrix (PSSM) scores improves TFBS predictions. Improvement has also been observed using TF flexible models and a machine-learning approach using a binary encoding of nucleotides in lieu of PSSMs.

Chiu et al. DNAshapeR: an R/Bioconductor package for DNA shape prediction and feature encoding.

DNAshapeR is a software package implemented in the statistical programming language R that predicts DNA shape features in an ultra-fast, high-throughput manner from genomic sequencing data. The package takes either nucleotide sequence or genomic coordinates as input, and generates various graphical representations for visualization and further analysis. DNAshapeR further encodes DNA sequence and shape features as user-defined combinations of k-mer and DNA shape features. The resulting feature matrices can be readily used as input of various machine learning software packages for further modeling studies.
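To make the idea of a combined feature vector concrete, here is a minimal Python sketch that concatenates a binary 1-mer encoding with per-position shape values. It does not use or reproduce the DNAshapeR API (which is an R package); the function names and the shape values below are hypothetical.

```python
NUCLEOTIDES = "ACGT"

def one_hot_1mer(sequence):
    """Encode each nucleotide as four binary features (A, C, G, T)."""
    features = []
    for base in sequence.upper():
        if base not in NUCLEOTIDES:
            raise ValueError(f"unexpected nucleotide: {base!r}")
        features.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return features

def encode_site(sequence, shape_values):
    """Concatenate 1-mer sequence features with per-position shape values."""
    return one_hot_1mer(sequence) + list(shape_values)

# Hypothetical minor groove width values, one per nucleotide position.
row = encode_site("ACGT", [5.2, 4.9, 4.7, 5.0])
print(len(row), row)
```

Stacking one such row per sequence gives a feature matrix of the kind that could be passed to a machine learning package.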

RECOMB/ISCB Top-10 Paper in Regulatory and Systems Genomics in 2015/16
Dror et al. A widespread role of the motif environment on transcription factor binding across diverse protein families.
Genome Res. 25, 1268-1280 (2015)

TFs bind to only a very small fraction of all potential DNA binding sites in the genome. Here, we revealed using in vitro HT-SELEX binding assays and in vivo ChIP-seq data that the surroundings of cognate binding sites have unique characteristics, which distinguish them from other sequences containing a similar motif that is not bound by the TF. Comparing the nucleotide content and DNA shape in the regions around the TF-bound sites to unbound sites containing the same consensus motifs revealed significant differences, which extend far beyond the core binding site (Figure). These unique features appear to be similar for TFs from the same protein family and likely assist in guiding TFs to their cognate binding sites.

RECOMB/ISCB Top-10 Paper in Regulatory and Systems Genomics in 2014/15
Abe et al. Deconvolving the recognition of DNA sequence from shape.
Cell 161, 307-318 (2015)

Protein-DNA binding is mediated by the recognition of the chemical signatures of the DNA bases and the three-dimensional shape of the DNA molecule. Because DNA shape is a consequence of sequence, it is difficult to dissociate these modes of recognition. Here, we teased them apart in the context of Hox-DNA binding by mutating residues that only recognize DNA shape. Complexes made with these mutants lose the preference to bind sequences with specific DNA shape features (Figure). Introducing residues that recognize DNA shape from one Hox protein to another swapped binding specificity in vitro and gene regulation in vivo. Statistical machine learning revealed that the accuracy of binding specificity predictions improves by adding shape features and feature selection identified shape features important for recognition. Thus, shape readout is a direct and critical component of binding site selection by Hox proteins.

RECOMB/ISCB Top-10 Paper in Regulatory and Systems Genomics in 2014/15
Zhou et al. Quantitative modeling of transcription factor binding specificities using DNA shape.
Proc. Natl. Acad. Sci. USA 112, 4654-4659 (2015)

Genomes provide an abundance of putative binding sites for each TF. However, only small subsets of these potential targets are functional. TFs of the same protein family bind to target sites that are very similar but not identical. This distinction allows closely related TFs to regulate different genes and thus execute distinct functions. Since the nucleotide sequence of the core motif is often not sufficient for identifying a genomic target, we refined the description of TF binding sites by introducing a combination of DNA sequence and shape features (Figure), which consistently improves the modeling of in vitro TF-DNA binding specificities. In addition, shape-augmented models reveal binding specificity mechanisms that are not apparent from sequence alone.

Chiu et al. GBshape: a genome browser database for DNA shape annotations.

GBshape provides DNA shape annotations of entire genomes. The database currently contains annotations for minor groove width, roll, propeller twist, helix twist and hydroxyl radical cleavage for 94 different organisms. Additional genomes can easily be added in the provided framework. GBshape contains two major tools, a genome browser and a table browser. The genome browser (Figure) provides a graphical representation of DNA shape annotations alongside standard genome browser annotations.

Dantas Machado et al. Evolving insights on how cytosine methylation affects protein–DNA binding.
Brief. Funct. Genomics 14, 61-73 (2014)

Many anecdotal observations exist of a regulatory effect of DNA methylation on gene expression. However, the underlying mechanisms of this effect are poorly understood. In this review, we summarize what is currently known about how this important epigenetic mark impacts cellular function. DNA methylation can abrogate or enhance interactions with DNA-binding proteins, or it may have no effect, depending on the context. The presence of cytosine methyl groups (Figure) can affect direct interactions between the protein and its DNA binding site, cause an indirect effect on DNA structure, and alter nucleosome stability.

Feature Review
Slattery et al. Absence of a simple code: how transcription factors read the genome.
Trends Biochem. Sci. 39, 381-399 (2014)

Transcription factors (TFs) play a key role in the central dogma of molecular biology by interpreting the language of DNA to control transcription. However, it has become clear that the “code” they read does not comprise DNA sequence alone. We discuss in this Feature Review the recent work that has used structural, computational, in vitro and in vivo approaches to move toward understanding the transcription factor code. We highlight the many variables that influence TF-DNA binding, including cofactors, cooperativity, and chromatin. The cover shows the IFN-β enhanceosome (Figure), an example of cooperativity through TF-TF interactions.

NAR Breakthrough Article
RECOMB/ISCB Top-10 Paper in Regulatory and Systems Genomics in 2013/14
Yang et al. TFBSshape: a motif database for DNA shape features of transcription factor binding sites.
Nucleic Acids Res. 42, D148-D155 (2014)

Our new TFBSshape database disentangles the complex relationships between DNA sequence, its 3D structure, and protein-DNA binding specificity. This task is like solving a Rubik's cube (Figure; top face: DNA sequences with transcription factor binding sites (TFBS); left face: 3D structure of a protein-DNA complex; front face: heat map representing minor groove width patterns selected by a transcription factor (TF) in a high-throughput experiment). The TFBSshape database augments nucleotide sequence motifs with heat maps and quantitative predictions of DNA shape features for 739 TF datasets from 23 different species.

Dror et al. Covariation between homeodomain transcription factors and the shape of their DNA binding sites.
Nucleic Acids Res. 42, 430-441 (2014)

Using our new method for high-throughput prediction of DNA shape, we analyzed DNA binding sites of 168 mouse and 84 Drosophila homeodomains to determine a general DNA shape recognition code (Figure) for this family of transcription factors. We predicted DNA shape features for almost 25,000 DNA targets derived from protein binding microarray (PBM) and bacterial one-hybrid (B1H) experiments and found distinct homeodomain regions that were more correlated with either the nucleotide sequence or the DNA shape of their preferred binding sites.

Zhou et al. DNAshape: a method for the high-throughput prediction of DNA structural features on a genomic scale.
Nucleic Acids Res. 41, W56-W62 (2013)

We developed a new method for predicting DNA shape in a high-throughput manner on a genome-wide scale. This approach predicts structural features (several helical parameters and minor groove width) for the entire yeast genome in less than one minute on a regular laptop. The prediction can be visualized as genome browser tracks and compared to other properties of the genome such as sequence conservation.

RECOMB/ISCB Top-10 Paper in Regulatory and Systems Genomics in 2012/13
Gordân et al. Genomic regions flanking E-box binding sites influence DNA binding specificity of bHLH transcription factors through DNA shape.
Cell Rep. 3, 1093-1104 (2013)

How transcription factors (TFs) with highly similar DNA binding-site motifs recognize distinct targets in vivo is poorly understood. In this study, we show in collaboration with Martha Bulyk's lab that the paralogous Saccharomyces cerevisiae TFs Cbf1 and Tye7 exhibit different DNA binding preferences both in vitro and in vivo, depending on the genomic context of the sites. Results of computational analyses suggest that nucleotides outside of their core binding sites contribute to specificity by influencing the three-dimensional structure of the DNA targets.

RECOMB/ISCB Top-10 Paper in Regulatory and Systems Genomics in 2012/13
Lazarovici et al. Probing DNA shape and methylation state on a genomic scale with DNase I.
Proc. Natl. Acad. Sci. USA 110, 6376-6381 (2013)

To address the relationship between DNase I cleavage rate and minor groove geometry, we predicted DNA shape parameters for sequences covering the entire range from highly to poorly cleavable. The variation in these shape parameters turned out to be highly predictive of the variation in cleavage rate. Other insights obtained from this project in collaboration with Harmen Bussemaker's and John Stamatoyannopoulos' labs were related to DNA methylation. We found that even though cytosine methylation happens in the major groove, one of its key effects is to narrow the minor groove. Thus, varying the base sequence of genomic DNA is not the only way in which the cell can modulate the landscape of minor groove shape along its genome.

Chang et al. Mechanism of origin DNA recognition and assembly of an initiator-helicase complex by SV40 large tumor antigen.
Cell Rep. 3, 1117-1127 (2013)

The first essential step in activating genomic DNA replication is the site-specific assembly of initiator proteins on origin (ori) DNA, a process that is not well characterized. In collaboration with Xiaojiang Chen's lab, we report a major step toward understanding this process by determining the long-sought cocrystal structure of the SV40 initiator/helicase, large tumor antigen (LTag), in complex with its ori DNA. The structure shows that multidomain LTag assembles on ori DNA differently from what one would expect from previous studies. The structure also reveals an intrinsic DNA shape readout mechanism using histidines.

RECOMB/ISCB Top-10 Paper in Regulatory and Systems Genomics in 2011
Slattery et al. Cofactor binding evokes latent differences in DNA binding specificity between Hox proteins.
Cell 147, 1270-1282 (2011)

In vivo transcription factor-DNA recognition is much more specific than in vitro binding. The eight Drosophila Hox proteins bind to very similar target sites but execute distinct in vivo functions. The figure illustrates that the cofactor Exd (yellow) unlocks a wide range in specificity of Hox proteins (cyan) for recognizing DNA target sites (metallic). Based on SELEX-seq experiments, we present specificity fingerprints of Hox proteins and reveal that DNA shape is a determining factor in achieving specificity. This is the first study in which a preliminary version of our new approach for high-throughput DNA shape prediction was applied to thousands of sequences, showing that anterior and posterior Hox proteins recognize different DNA shapes. Moreover, DNA shape indicates how Hox genes have differentiated in evolution.

Rohs et al. The role of DNA shape in protein-DNA recognition.

The figure illustrates the molecular shape of nucleosomal DNA when wrapped around the histone core. The narrow minor groove is color-coded in dark grey. The red mesh shows an isopotential surface with negative electrostatic potential. The shape of narrow minor groove regions induces an enhanced negative electrostatic potential, which attracts histone arginines. Such interactions between the protein and DNA contribute to the stabilization of the nucleosome core particle.

Rohs et al. Origin of specificity in protein-DNA recognition
Annu. Rev. Biochem. 79, 233-269 (2010)

In order to carry out their unique biological functions, proteins need to recognize their DNA binding sites in a highly specific manner. Specificity in protein-DNA binding is achieved through the recognition of both linear sequence and three-dimensional structure. Therefore, the nucleotide sequence of a binding site is only one part of the story, and the three-dimensional structures of both the DNA and the protein must be taken into account to fully understand recognition on a molecular basis. DNA shape is specifically recognized by a variety of protein families, and we have identified different ways of modulating DNA shape. The figure shows the shape of the molecular surface (top) of ideal A-DNA (left), B-DNA (center), and Z-DNA, and the resulting specific variations in electrostatic potential (bottom).

Recent news

June 19-21, 2018
Four graduate students in the Rohs Lab defend their Ph.D. theses with flying colors. Congrats, Beibei, Tsu-Pei, Satya, and Xiaofei!

May 29, 2018
We published a paper in Mol. Biol. Evol. on the evolutionary selection of DNA shape in non-coding regions. Congrats, Xiaofei and Tianyin!

April 23, 2018
Beibei receives the William E. Trusten Award, which is given to the most accomplished graduate students in biological sciences. Way to go, Beibei!

April 18, 2018
Beibei receives the Women in Science Merit Award for current doctoral students who demonstrate exceptional work in their field. Congrats, Beibei!

February 28, 2018
The Viterbi School of Engineering publishes a press release highlighting our recent npj Quantum Inf. paper for its promise to apply quantum computing to biological research.

February 21, 2018
Our work using the D-Wave quantum chip is the first use of quantum computing in genomics, now published in npj Quantum Inf. Congrats, Richard!

February 6, 2018
Our new method to predict DNA shape for CpG methylation explains protein binding to methylated DNA, published in Epigenetics Chromatin. Congrats, Satya!

January 30, 2018
Our newest NAR paper with the Tullius lab addresses the role of intrinsic versus protein-induced DNA shape.

January 11, 2018
We published a paper revealing a protein family-specific relationship between TF binding and histone modifications. Congrats, Beibei!

November 20, 2017
We expanded our high-throughput prediction method to 13 DNA shape features with a new publication in NAR. Congrats, Jinsen!

November 20, 2017
Our recent Yang et al. Mol. Syst. Biol. paper won the RECOMB/ISCB Top-10 Paper Award in regulatory and systems genomics in 2016/17.

October 11, 2017
We published a genome-scale study in NAR. Congrats, Tsu-Pei!

August 16, 2017
Remo started a major at the interface of biology and computer science.

April 26, 2017
Remo accepted reappointment as Vice Chair of USC's Department of Biological Sciences through August 2019. Fight on!

April 20, 2017
We published our interactive tool for structural analysis of protein-DNA complexes in NAR. Congrats, Jared!

March 20, 2017
Tsu-Pei was awarded the prestigious Manning Endowed Fellowship. Congrats, Tsu-Pei!

March 20, 2017
Beibei was awarded a competitive Research Enhancement Fellowship. Congrats, Beibei!

February 6, 2017
Our new paper provides a systematic analysis of DNA shape readout for many protein families. Congrats, Lin!

Recent presentations

September 28, 2017
Faculty of Biological Sciences Seminar, Pontificia Universidad Católica de Chile, Santiago, Chile

September 23-26, 2017
Molecular Biosystems Conference on Eukaryotic Gene Regulation & Functional Genomics, Puerto Varas, Chile

August 20-24, 2017
Symposium on Molecular Recognition, 254th American Chemical Society Meeting, Washington, DC

May 24, 2017
Workshop “Mathematical Oncology: Modeling Clinical Data for Maximum Patient Benefit”, University of Southern California, Los Angeles, CA

April 28, 2017
Department of Bioinformatics and Genomics, University of North Carolina, NC

April 13, 2017
Salt Lake City, UT

March 22, 2017
Twin Cities, Minneapolis, MN

March 9, 2017
Program in Quantitative and Computational Biology, Lewis Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ


QBIO 105
Remo coteaches with Professor Michael Waterman
Introduction to Quantitative Biology