The main focus of our Computational and Structural Biology research group is the integration of two fields of research: genomics and structural biology. Our primary goal is to reveal as-yet-unknown molecular mechanisms underlying gene regulation using bioinformatics analyses of high-throughput sequencing and DNA methylation data of whole genomes, integrated with experimental molecular biology approaches.
Professor Rohs can accept graduate students from the following Ph.D. Programs as primary thesis adviser: Computational Biology and Bioinformatics, Molecular Biology, Chemistry, Physics, and Computer Science.
Wang et al. Analysis of genetic variation indicates DNA shape involvement in purifying selection.
Mol. Biol. Evol. in press (2018)
Xin et al. Relationship between histone modifications and transcription factor binding is protein family specific.
For each transcription factor considered in this study, sequences at ChIP-seq peaks in chromatin-accessible regions derived from DNase-seq data were aligned using position frequency matrices to obtain DNA binding sites (Figure). For a set of selected transcription factor binding sites and non-binding sites, DNA sequence, four DNA shape features, and ten histone modification patterns were calculated for flanking regions and used to distinguish binding sites from non-binding sites. The quantitative modeling revealed protein family-specific mechanisms used by different transcription factors.
Rao et al. Systematic prediction of DNA shape changes due to CpG methylation explains epigenetic effects on protein-DNA binding.
CpG methylation induces a DNA shape change that explains its effect on Pbx-Hox binding (upper left). Cytosine methylation at offsets 6/7 and 10/11 reduces binding, whereas methylation at offset 9/10 enhances binding. Scatter plots represent relative binding affinities of methylated vs. unmethylated sequences (upper right). Green, magenta, and blue points correspond to methylation at offsets 6/7, 9/10, and 10/11. Positive and negative shifts reflect reduced and enhanced binding due to methylation (lower left). Analysis of the methylation-induced change in minor groove width (MGW) plausibly explains the observed reduced binding due to methylation at CpG offsets 6/7 and 10/11 (lower right).
R. Li et al. Quantum annealing versus classical machine learning applied to a simplified computational biology problem.
npj Quantum Information 4, 11 (2018)
J. Li et al. Expanding the repertoire of DNA shape features for genome-scale studies of transcription factor binding.
Nucleic Acids Res. 45, 12877-12887 (2017)
DNAshape+ extends our high-throughput prediction methods for DNA structural features. We derived and validated 9 additional DNA shape features beyond our original set of 4 features, resulting in an expanded repertoire of 13 distinct DNA shape features, including six intra-base pair and six inter-base pair parameters and minor groove width (Figure). We also compared prediction accuracies of models based on DNA shape features extracted from Monte Carlo (MC) simulations to accuracies of models incorporating DNA shape information extracted from X-ray crystallography (XRC) data or Molecular Dynamics (MD) simulations.
Chiu et al. Genome-wide prediction of minor-groove electrostatic potential enables biophysical modeling of protein-DNA binding.
Sagendorf et al. DNAproDB: an interactive tool for structural analysis of DNA-protein complexes.
RECOMB/ISCB Top-10 Paper in Regulatory and Systems Genomics in 2016/17
Yang et al. Transcription factor family-specific DNA shape readout revealed by quantitative specificity models.
Mol. Syst. Biol. 13, 910 (2017)
We resequenced data from HT-SELEX experiments for 410 different transcription factors (TFs) from mouse and human, the most extensive mammalian TF-DNA binding data available to date, and demonstrated the contributions of DNA shape readout across diverse TF families and its importance in core motif flanking regions. Statistical machine-learning models combined with feature-selection techniques helped to reveal the nucleotide position-dependent DNA shape readout in TF-binding sites and the TF family-specific position dependence. Based on these results, we proposed novel DNA shape logos (Figure) to visualize the DNA shape preferences of TFs. This work suggests a way of obtaining mechanistic insights into TF-DNA binding without relying on experimentally solved all-atom structures.
Mathelier et al. DNA shape features improve transcription factor binding site predictions in vivo.
Cell Syst. 3, 278-286 (2016)
Interactions of transcription factors (TFs) with DNA comprise a complex interplay between base-specific amino acid contacts and readout of DNA structure. Recent studies have highlighted the complementarity of DNA sequence and shape in modeling TF binding in vitro. Here, we have provided a comprehensive evaluation of in vivo datasets to assess the predictive power obtained by augmenting various DNA sequence-based models of TF binding sites (TFBSs) with DNA shape features. Results from 400 human ChIP-seq datasets for 76 TFs show that combining DNA shape features with position-specific scoring matrix (PSSM) scores improves TFBS predictions. Improvement has also been observed using TF flexible models and a machine-learning approach using a binary encoding of nucleotides in lieu of PSSMs.
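To make the idea of combining a PSSM score with DNA shape features concrete, the following is a minimal sketch, not the pipeline used in the paper. It assumes a log-odds PSSM built from aligned sites with a uniform background and a pseudocount; the function names (`build_pssm`, `pssm_score`, `augment_with_shape`) and the example shape values are hypothetical placeholders.

```python
import math

BASES = "ACGT"

def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PSSM (one dict per position) from aligned, equal-length sites."""
    pssm = []
    for i in range(len(sites[0])):
        # Start every base at the pseudocount so unseen bases get a finite score.
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        pssm.append({b: math.log2((counts[b] / total) / background) for b in BASES})
    return pssm

def pssm_score(pssm, seq):
    """Sum the per-position log-odds scores for a sequence of the same length."""
    return sum(column[base] for column, base in zip(pssm, seq))

def augment_with_shape(pssm, seq, shape_values):
    """Feature vector: the PSSM score followed by per-position shape values.

    shape_values is assumed to come from a separate shape predictor; the
    numbers passed in the example below are made up for illustration.
    """
    return [pssm_score(pssm, seq)] + list(shape_values)

sites = ["ACGT", "ACGT", "ACGA"]
pssm = build_pssm(sites)
features = augment_with_shape(pssm, "ACGT", [5.1, 4.8, 4.9, 5.3])
```

A vector like `features` could then be passed to any standard classifier, with and without the shape part, to compare the two models.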
Chiu et al. DNAshapeR: an R/Bioconductor package for DNA shape prediction and feature encoding.
DNAshapeR is a software package implemented in the statistical programming language R that predicts DNA shape features in an ultra-fast, high-throughput manner from genomic sequencing data. The package takes either nucleotide sequence or genomic coordinates as input, and generates various graphical representations for visualization and further analysis. DNAshapeR further encodes DNA sequence and shape features as user-defined combinations of k-mer and DNA shape features. The resulting feature matrices can be readily used as input of various machine learning software packages for further modeling studies.
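The kind of feature encoding described above can be illustrated with a short Python sketch. This is not the DNAshapeR API (the package itself is written in R); the functions `encode_kmer` and `encode_seq_shape` are hypothetical names, and the shape tracks are assumed to be supplied by a separate predictor.

```python
from itertools import product

def encode_kmer(seq, k=1):
    """One-hot encode every k-mer window of seq and return a flat 0/1 list."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    vector = []
    for i in range(len(seq) - k + 1):
        block = [0] * len(kmers)
        block[index[seq[i:i + k]]] = 1  # mark the k-mer observed at this window
        vector.extend(block)
    return vector

def encode_seq_shape(seq, shape_tracks, k=1):
    """Concatenate the k-mer encoding with any number of per-position shape tracks."""
    vector = encode_kmer(seq, k)
    for track in shape_tracks:
        vector.extend(track)
    return vector

# One row of a feature matrix: 1-mer encoding plus one made-up shape track.
row = encode_seq_shape("AC", [[5.0, 4.7]], k=1)
```

Stacking one such row per sequence gives a feature matrix in the form that most machine learning packages expect.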
RECOMB/ISCB Top-10 Paper in Regulatory and Systems Genomics in 2015/16
Dror et al. A widespread role of the motif environment on transcription factor binding across diverse protein families.
Genome Res. 25, 1268-1280 (2015)
RECOMB/ISCB Top-10 Paper in Regulatory and Systems Genomics in 2014/15
Abe et al. Deconvolving the recognition of DNA sequence from shape.
Cell 161, 307-318 (2015)
RECOMB/ISCB Top-10 Paper in Regulatory and Systems Genomics in 2014/15
Zhou et al. Quantitative modeling of transcription factor binding specificities using DNA shape.
Proc. Natl. Acad. Sci. USA 112, 4654-4659 (2015)
Genomes provide an abundance of putative binding sites for each TF. However, only small subsets of these potential targets are functional. TFs of the same protein family bind to target sites that are very similar but not identical. This distinction allows closely related TFs to regulate different genes and thus execute distinct functions. Since the nucleotide sequence of the core motif is often not sufficient for identifying a genomic target, we refined the description of TF binding sites by introducing a combination of DNA sequence and shape features (Figure), which consistently improves the modeling of in vitro TF-DNA binding specificities. In addition, shape-augmented models reveal binding specificity mechanisms that are not apparent from sequence alone.
Chiu et al. GBshape: a genome browser database for DNA shape annotations.
Dantas Machado et al. Evolving insights on how cytosine methylation affects protein-DNA binding.
Brief. Funct. Genomics 14, 61-73 (2014)
Slattery et al. Absence of a simple code: how transcription factors read the genome.
Trends Biochem. Sci. 39, 381-399 (2014)
Transcription factors (TFs) play a key role in the central dogma of molecular biology by interpreting the language of DNA to control transcription. However, it has become clear that the “code” they read does not comprise DNA sequence alone. We discuss in this Feature Review the recent work that has used structural, computational, in vitro and in vivo approaches to move toward understanding the transcription factor code. We highlight the many variables that influence TF-DNA binding, including cofactors, cooperativity, and chromatin. The cover shows the IFN-β enhanceosome (Figure), an example of cooperativity through TF-TF interactions.
NAR Breakthrough Article
RECOMB/ISCB Top-10 Paper in Regulatory and Systems Genomics in 2013/14
Yang et al. TFBSshape: a motif database for DNA shape features of transcription factor binding sites.
Nucleic Acids Res. 42, D148-D155 (2014)
Our new TFBSshape database disentangles the complex relationships between DNA sequence, its 3D structure, and protein-DNA binding specificity. This task is like solving a Rubik's cube (Figure; top face: DNA sequences with transcription factor binding sites (TFBS); left face: 3D structure of a protein-DNA complex; front face: heat map representing minor groove width patterns selected by a transcription factor (TF) in a high-throughput experiment). The TFBSshape database augments nucleotide sequence motifs with heat maps and quantitative predictions of DNA shape features for 739 TF datasets from 23 different species.
Dror et al. Covariation between homeodomain transcription factors and the shape of their DNA binding sites.
Nucleic Acids Res. 42, 430-441 (2014)
Using our new method for high-throughput prediction of DNA shape, we analyzed DNA binding sites of 168 mouse and 84 Drosophila homeodomains to determine a general DNA shape recognition code (Figure) for this family of transcription factors. We predicted DNA shape features for almost 25,000 DNA targets derived from protein binding microarray (PBM) and bacterial one-hybrid (B1H) experiments and found distinct homeodomain regions that were more correlated with either the nucleotide sequence or the DNA shape of their preferred binding sites.
Zhou et al. DNAshape: a method for the high-throughput prediction of DNA structural features on a genomic scale.
Nucleic Acids Res. 41, W56-W62 (2013)
We developed a new method for predicting DNA shape in a high-throughput manner on a genome-wide scale. This approach predicts structural features (several helical parameters and minor groove width) for the entire yeast genome in less than one minute on a regular laptop. The prediction can be visualized as genome browser tracks and compared to other properties of the genome such as sequence conservation.
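One way a per-nucleotide shape prediction can be made this fast is a sliding-window table lookup, sketched below. This is an illustrative assumption and not a description taken from the summary above: the window size of five, the function name `predict_shape`, and the values in `TOY_MGW` are all hypothetical.

```python
def predict_shape(seq, table, k=5, default=None):
    """Assign each position the table value of the k-mer centered on it.

    Positions too close to either end to have a full window keep `default`,
    as do k-mers that are missing from the table.
    """
    half = k // 2
    values = [default] * len(seq)
    for i in range(half, len(seq) - half):
        window = seq[i - half:i + half + 1]
        values[i] = table.get(window, default)
    return values

# Made-up minor groove width values (in angstroms) for two pentamers.
TOY_MGW = {"AAAAA": 3.4, "AAAAC": 4.1}

track = predict_shape("AAAAAC", TOY_MGW)
```

Because each position costs a single dictionary lookup, the running time grows linearly with sequence length, which is the property that makes genome-scale tracks practical.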
RECOMB/ISCB Top-10 Paper in Regulatory and Systems Genomics in 2012/13
Gordân et al. Genomic regions flanking E-box binding sites influence DNA binding specificity of bHLH transcription factors through DNA shape.
Cell Rep. 3, 1093-1104 (2013)
RECOMB/ISCB Top-10 Paper in Regulatory and Systems Genomics in 2012/13
Lazarovici et al. Probing DNA shape and methylation state on a genomic scale with DNase I.
Proc. Natl. Acad. Sci. USA 110, 6376-6381 (2013)
Chang et al. Mechanism of origin DNA recognition and assembly of an initiator-helicase complex by SV40 large tumor antigen.
Cell Rep. 3, 1117-1127 (2013)
The first essential step in activating genomic DNA replication is the site-specific assembly of initiator proteins on origin (ori) DNA, a process that is not well characterized. In collaboration with Xiaojiang Chen's lab, we report a major step toward understanding this process by determining the long-sought cocrystal structure of the SV40 initiator/helicase, large tumor antigen (LTag), in complex with its ori DNA. The structure shows that multidomain LTag assembles on ori DNA differently from what one would expect from previous studies. The structure also reveals an intrinsic DNA shape readout mechanism using histidines.
RECOMB/ISCB Top-10 Paper in Regulatory and Systems Genomics in 2011
Slattery et al. Cofactor binding evokes latent differences in DNA binding specificity between Hox proteins.
Cell 147, 1270-1282 (2011)
In vivo transcription factor-DNA recognition is much more specific than in vitro binding. The eight Drosophila Hox proteins bind to very similar target sites but execute distinct in vivo functions. The figure illustrates that the cofactor Exd (yellow) unlocks a wide range in specificity of Hox proteins (cyan) for recognizing DNA target sites (metallic). Based on SELEX-seq experiments, we present specificity fingerprints of Hox proteins and reveal that DNA shape is a determining factor in achieving specificity. This is the first study in which a preliminary version of our new approach for high-throughput DNA shape prediction was applied to thousands of sequences, showing that anterior and posterior Hox proteins recognize different DNA shapes. Moreover, DNA shape indicates how Hox genes have differentiated in evolution.
Rohs et al. The role of DNA shape in protein-DNA recognition.
The figure illustrates the molecular shape of nucleosomal DNA when wrapped around the histone core. The narrow minor groove is color-coded in dark grey. The red mesh shows an isopotential surface with negative electrostatic potential. The shape of narrow minor groove regions induces an enhanced negative electrostatic potential, which attracts histone arginines. Such interactions between the protein and DNA contribute to the stabilization of the nucleosome core particle.
Rohs et al. Origin of specificity in protein-DNA recognition
Annu. Rev. Biochem. 79, 233-269 (2010)
To carry out their unique biological functions, proteins need to recognize their DNA binding sites in a highly specific manner. Specificity in protein-DNA binding is achieved through the recognition of both linear sequence and three-dimensional structure. Therefore, the nucleotide sequence of a binding site is only one part of the story, and the three-dimensional structures of both the DNA and the protein must be taken into account to fully understand recognition on a molecular basis. DNA shape is specifically recognized by a variety of protein families, and we have identified different ways of modulating DNA shape. The figure shows the shape of the molecular surface (top) of ideal A-DNA (left), B-DNA (center), and Z-DNA (right), and the resulting specific variations in electrostatic potential (bottom).