Using single cell atlas data to reconstruct regulatory networks
Song, Qi; Ruffalo, Matthew; Bar-Joseph, Ziv
doi: 10.1093/nar/gkad053
pmid: 36762475
Inference of global gene regulatory networks from omics data is a long-term goal of systems biology. Most methods developed for inferring transcription factor (TF)–gene interactions either relied on small datasets or used snapshot data, which are poorly suited to inferring an inherently temporal process. Here, we developed a new computational method that combines neural networks and multi-task learning to predict RNA velocity rather than gene expression values. This allows our method to overcome many of the problems faced by prior methods, leading to a more accurate and more comprehensive set of identified regulatory interactions. Application of our method to atlas-scale single cell data from six HuBMAP tissues led to several validated and novel predictions and greatly improved on prior methods proposed for this task.
Fitness functions for RNA structure design
Ward, Max; Courtney, Eliot; Rivas, Elena
doi: 10.1093/nar/gkad097
pmid: 36869673
An RNA design algorithm takes a target RNA structure and finds a sequence that folds into that structure. This is fundamentally important for engineering RNA-based therapeutics. Computational RNA design algorithms are guided by fitness functions, but little research has examined the merits of these functions. We survey current RNA design approaches with a particular focus on the fitness functions they use. We experimentally compare the most widely used fitness functions in RNA design algorithms on both synthetic and natural sequences. Almost 20 years have passed since the last such comparison was published; our results largely agree with it, with one major new finding: maximizing probability outperforms minimizing ensemble defect. The probability is the likelihood of the target structure at equilibrium, and the ensemble defect is the weighted average number of incorrectly paired positions in the ensemble. We find that maximizing probability leads to better results on synthetic RNA design puzzles and agrees more often than other fitness functions with natural sequences and structures, which were designed by evolution. We also observe that many recently published approaches minimize the structure distance to the minimum free energy (MFE) prediction, which we find to be a poor fitness function.
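To make the three fitness functions concrete, here is a toy sketch, assuming a hypothetical ensemble that has been fully enumerated with made-up free energies (real tools obtain these quantities from a partition function over all structures rather than by enumeration). It computes the target's equilibrium probability, the ensemble defect as the Boltzmann-weighted average number of positions whose pairing partner differs from the target, and the distance between the MFE structure and the target. The function names and energies are illustrative only and are not taken from any of the surveyed tools.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def pair_table(db: str) -> list[int]:
    """Map each position of a dot-bracket string to its partner (-1 if unpaired)."""
    stack, table = [], [-1] * len(db)
    for i, c in enumerate(db):
        if c == "(":
            stack.append(i)
        elif c == ")":
            j = stack.pop()
            table[i], table[j] = j, i
    return table


def distance(structure: str, target: str) -> int:
    """Number of positions whose pairing partner differs from the target."""
    return sum(a != b for a, b in zip(pair_table(structure), pair_table(target)))


def boltzmann_probs(energies: dict[str, float], temp_k: float = 310.15) -> dict[str, float]:
    """Equilibrium probability of each enumerated structure."""
    kt = R_KCAL * temp_k
    weights = {s: math.exp(-e / kt) for s, e in energies.items()}
    z = sum(weights.values())  # partition function of the toy ensemble
    return {s: w / z for s, w in weights.items()}


def fitness(energies: dict[str, float], target: str) -> tuple[float, float, int]:
    probs = boltzmann_probs(energies)
    p_target = probs.get(target, 0.0)  # fitness to maximize
    ens_defect = sum(p * distance(s, target) for s, p in probs.items())  # to minimize
    mfe = min(energies, key=energies.get)
    return p_target, ens_defect, distance(mfe, target)  # last: MFE structure distance


# Hypothetical ensemble for a 7-nt sequence (free energies in kcal/mol)
toy = {"((...))": -2.0, ".(...).": -1.5, ".......": 0.0}
print(fitness(toy, "((...))"))
```

In this toy ensemble the MFE structure already matches the target, so the MFE distance is 0 even though roughly a third of the equilibrium population adopts other structures; the probability and the ensemble defect both register that, which gives one intuition for why MFE distance can be a weak fitness function.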
SMAP design: a multiplex PCR amplicon and gRNA design tool to screen for natural and CRISPR-induced genetic variation
Develtere, Ward; Waegneer, Evelien; Debray, Kevin; De Saeger, Jonas; Van Glabeke, Sabine; Maere, Steven; Ruttink, Tom; Jacobs, Thomas B
doi: 10.1093/nar/gkad036
pmid: 36718951
Multiplex amplicon sequencing is a versatile method to identify genetic variation in natural or mutagenized populations through eco-tilling or multiplex CRISPR screens. Such genotyping screens require reliable and specific primer designs, combined with simultaneous gRNA design for CRISPR screens. Unfortunately, current tools cannot combine multiplex gRNA and primer design in a high-throughput, easy-to-use manner with high design flexibility. Here, we report the development of a bioinformatics tool called SMAP design to overcome these limitations. We tested SMAP design on several plant and non-plant genomes and obtained designs for 80–90% or more of the target genes, depending on the genome and gene family. We validated the designs with Illumina multiplex amplicon sequencing and Sanger sequencing in Arabidopsis, soybean, and maize. We also used SMAP design to perform eco-tilling by tiling PCR amplicons across nine candidate genes putatively associated with haploid induction in Cichorium intybus. We screened 60 accessions of chicory and witloof and identified thirteen knockout haplotypes and their carriers. SMAP design is an easy-to-use command-line tool that generates highly specific gRNA and/or primer designs for any number of loci for CRISPR or natural variation screens and is compatible with other SMAP modules for seamless downstream analysis.
Development of a selection assay for small guide RNAs that drive efficient site-directed RNA editing
Diaz Quiroz, Juan Felipe; Ojha, Namrata; Shayhidin, Elnur E; De Silva, Dasuni; Dabney, Jesse; Lancaster, Amy; Coull, James; Milstein, Stuart; Fraley, Andrew W; Brown, Christopher R; Rosenthal, Joshua J C
doi: 10.1093/nar/gkad098
pmid: 36840708
A major challenge confronting the clinical application of site-directed RNA editing (SDRE) is the design of small guide RNAs (gRNAs) that can drive efficient editing. Although many gRNA designs have effectively recruited endogenous Adenosine Deaminases that Act on RNA (ADARs), most exceed the size of currently FDA-approved antisense oligonucleotides. We developed an unbiased in vitro selection assay to identify short gRNAs that promote superior editing of a premature termination codon. The assay relies on hairpin substrates in which the target sequence is linked to a partially randomized gRNA in the same molecule, so that gRNA sequences that promote editing can be identified by sequencing. These RNA substrates were incubated in vitro with ADAR2; the edited products were selected using amplification refractory mutation system (ARMS) PCR and used to regenerate the substrates for the next round of selection. After nine rounds, hairpins that drove superior editing were identified. When the gRNAs from these hairpins were delivered in trans, eight of the top ten short gRNAs drove superior editing both in vitro and in cellula. These results show that efficient small gRNAs can be selected using our approach, an important advance for the clinical application of SDRE.
Single-molecule analysis of DNA-binding proteins from nuclear extracts (SMADNE)
Schaich, Matthew A; Schnable, Brittani L; Kumar, Namrata; Roginskaya, Vera; Jakielski, Rachel C; Urban, Roman; Zhong, Zhou; Kad, Neil M; Van Houten, Bennett
doi: 10.1093/nar/gkad095
pmid: 36861323
Single-molecule characterization of protein–DNA dynamics provides unprecedented mechanistic details about numerous nuclear processes. Here, we describe a new method that rapidly generates single-molecule information with fluorescently tagged proteins isolated from nuclear extracts of human cells. We demonstrated the wide applicability of this novel technique on undamaged DNA and three forms of DNA damage using seven native DNA repair proteins and two structural variants, including poly(ADP-ribose) polymerase (PARP1), heterodimeric ultraviolet-damaged DNA-binding protein (UV-DDB) and 8-oxoguanine glycosylase 1 (OGG1). We found that PARP1 binding to DNA nicks is altered by tension, and that UV-DDB did not act as an obligate heterodimer of DDB1 and DDB2 on UV-irradiated DNA. UV-DDB bound to UV photoproducts with an average lifetime of 39 s (corrected for photobleaching, τc), whereas its binding lifetimes on 8-oxoG adducts were < 1 s. The catalytically inactive OGG1 variant K249Q bound oxidative damage 23-fold longer than WT OGG1 (47 s versus 2.0 s). By measuring three fluorescent colors simultaneously, we also characterized the assembly and disassembly kinetics of UV-DDB and OGG1 complexes on DNA. Hence, the SMADNE technique represents a novel, scalable and universal method to obtain single-molecule mechanistic insights into key protein–DNA interactions in an environment containing physiologically relevant nuclear proteins.
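Photobleaching shortens apparent binding lifetimes, because a fluorescent spot can disappear either when the protein dissociates or when its fluorophore bleaches. A common correction, assumed here since the abstract does not state the exact procedure, treats both as independent first-order processes, so the observed rate is k_obs = k_off + k_bleach and τc = 1 / (1/τobs − 1/τbleach). The function name and the numbers in this sketch are hypothetical, not SMADNE measurements.

```python
def corrected_lifetime(tau_obs: float, tau_bleach: float) -> float:
    """Photobleaching-corrected binding lifetime, in seconds (hypothetical helper).

    Assumes dissociation and photobleaching are independent first-order
    processes, so 1/tau_obs = 1/tau_off + 1/tau_bleach.
    """
    k_off = 1.0 / tau_obs - 1.0 / tau_bleach
    if k_off <= 0:
        raise ValueError("observed lifetime must be shorter than the photobleaching lifetime")
    return 1.0 / k_off


# Hypothetical values: 20 s observed lifetime, 60 s photobleaching lifetime
print(corrected_lifetime(20.0, 60.0))
```

The correction matters most when τobs approaches τbleach; for a fluorophore that practically never bleaches, τc converges to τobs.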
SpliceTools, a suite of downstream RNA splicing analysis tools to investigate mechanisms and impact of alternative splicing
Flemington, Erik K; Flemington, Samuel A; O’Grady, Tina M; Baddoo, Melody; Nguyen, Trang; Dong, Yan; Ungerleider, Nathan A
doi: 10.1093/nar/gkad111
pmid: 36864749
Because alternative splicing (AS) is a fundamental aspect of normal cell signaling and disease states, there is great interest in determining AS changes in physiologic, pathologic and pharmacologic settings. High-throughput RNA sequencing and specialized software to detect AS have greatly enhanced our ability to determine transcriptome-wide splicing changes. Despite the richness of these data, deriving meaning from what can be thousands of AS events is a substantial bottleneck for most investigators. We present SpliceTools, a suite of data processing modules that lets investigators quickly generate summary statistics and assess the mechanisms and functional significance of AS changes, through the command line or an online user interface. Using RNA-seq datasets for 186 RNA binding protein knockdowns, nonsense-mediated RNA decay inhibition and pharmacologic splicing inhibition, we illustrate the utility of SpliceTools in several ways: we distinguish splicing disruption from regulated transcript isoform changes, show the broad transcriptome footprint of the pharmacologic splicing inhibitor indisulam, uncover mechanistic underpinnings of splicing inhibition, identify predicted neo-epitopes arising from pharmacologic splicing inhibition, and show the impact of indisulam-induced splicing alterations on cell cycle progression. Together, SpliceTools puts rapid and easy downstream analysis at the fingertips of any investigator studying AS.
DeepBIO: an automated and interpretable deep-learning platform for high-throughput biological sequence prediction, functional annotation and visualization analysis
Wang, Ruheng; Jiang, Yi; Jin, Junru; Yin, Chenglin; Yu, Haoqing; Wang, Fengsheng; Feng, Jiuxin; Su, Ran; Nakai, Kenta; Zou, Quan; Wei, Leyi
doi: 10.1093/nar/gkad055
pmid: 36796796
Here, we present DeepBIO, the first-of-its-kind automated and interpretable deep-learning platform for high-throughput biological sequence functional analysis. DeepBIO is a one-stop-shop web service that enables researchers to develop new deep-learning architectures to answer any biological question. Specifically, given any biological sequence data, DeepBIO supports a total of 42 state-of-the-art deep-learning algorithms for model training, comparison, optimization and evaluation in a fully automated pipeline. DeepBIO provides comprehensive visualization and analysis of results for predictive models, covering aspects such as model interpretability, feature analysis and discovery of functional sequence regions. Additionally, DeepBIO supports nine base-level functional annotation tasks using deep-learning architectures, with comprehensive interpretations and graphical visualizations to validate the reliability of annotated sites. Running on high-performance computers, DeepBIO can make predictions for up to millions of sequences within a few hours, demonstrating its usability in real application scenarios. Case study results show that DeepBIO provides accurate, robust and interpretable predictions, demonstrating the power of deep learning in biological sequence functional analysis. Overall, we expect DeepBIO to ensure the reproducibility of deep-learning biological sequence analysis, lessen the programming and hardware burden for biologists and provide meaningful functional insights at both the sequence level and the base level from biological sequences alone. DeepBIO is publicly available at https://inner.wei-group.net/DeepBIO.
Weak tension accelerates hybridization and dehybridization of short oligonucleotides
Hart, Derek J; Jeong, Jiyoun; Gumbart, James C; Kim, Harold D
doi: 10.1093/nar/gkad118
pmid: 36869666
The hybridization and dehybridization of DNA subject to tension is relevant to fundamental genetic processes and to the design of DNA-based mechanobiology assays. While strong tension accelerates DNA melting and decelerates DNA annealing, the effects of tension weaker than 5 pN are less clear. In this study, we developed a DNA bow assay, which uses the bending rigidity of double-stranded DNA (dsDNA) to exert weak tension of 2–6 pN on a single-stranded DNA (ssDNA) target. Combining this assay with single-molecule FRET, we measured the hybridization and dehybridization kinetics between a 15 nt ssDNA under tension and an 8–9 nt oligonucleotide, and found that both the hybridization and dehybridization rates increase monotonically with tension for the various nucleotide sequences tested. These findings suggest that the nucleated duplex in its transition state is more extended than either the pure dsDNA or the pure ssDNA state. Based on coarse-grained oxDNA simulations, we propose that this increased extension of the transition state is due to steric repulsion between the unpaired ssDNA segments in close proximity to one another. Using linear force-extension relations verified by simulations of short DNA segments, we derived analytical equations for force-to-rate conversion that are in good agreement with our measurements.
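The inference that both rates can rise with tension follows from a transition-state picture. As a simplified sketch, assume Bell-type kinetics with force-independent extensions, a cruder approximation than the linear force-extension relations used in the study: each rate scales as exp(F·Δx/kBT), where Δx is the extension gained on going from the starting state to the transition state. If the transition state is longer than both the unhybridized and the duplex state, both exponents are positive and both rates increase with F. All extensions and zero-force rates below are hypothetical.

```python
import math

KBT_PN_NM = 4.11  # thermal energy near 298 K, in pN*nm


def bell_rate(k0: float, force_pn: float, dx_nm: float) -> float:
    """Bell-type rate: k(F) = k0 * exp(F * dx / kBT)."""
    return k0 * math.exp(force_pn * dx_nm / KBT_PN_NM)


# Hypothetical extensions of the tensioned target strand (nm)
X_UNHYBRIDIZED = 5.0  # ssDNA target, oligo unbound
X_DUPLEX = 5.4        # target hybridized to the oligo
X_TRANSITION = 5.8    # nucleated duplex; assumed longer than both end states

K_ON0, K_OFF0 = 1.0, 0.1  # hypothetical zero-force (pseudo-first-order) rates, 1/s

for f in (2.0, 4.0, 6.0):
    k_on = bell_rate(K_ON0, f, X_TRANSITION - X_UNHYBRIDIZED)
    k_off = bell_rate(K_OFF0, f, X_TRANSITION - X_DUPLEX)
    print(f"F = {f:.0f} pN: k_hyb = {k_on:.3f} /s, k_dehyb = {k_off:.3f} /s")
```

Setting X_TRANSITION below X_DUPLEX would instead make dehybridization slow down with force, so the sign of each Δx is what encodes the conclusion that the transition state is the most extended of the three.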
Unusual enantioselective cytoplasm-to-nucleus translocation and photosensitization of the chiral Ru(II) cationic complex via simple ion-pairing with lipophilic weak acid counter-anions
Chao, Xi-Juan; Huang, Chun-Hua; Tang, Miao; Yan, Zhu-Ying; Huang, Rong; Li, Yan; Zhu, Ben-Zhan
doi: 10.1093/nar/gkad155
pmid: 36938880
Targeted and enantioselective delivery of chiral diagnostic probes and therapeutics into specific compartments inside cells is of utmost importance for improving disease detection and treatment. The classical DNA ‘light-switch’ ruthenium(II)-polypyridyl complex [Ru(DIP)2(dppz)]Cl2 (DIP = 4,7-diphenyl-1,10-phenanthroline, dppz = dipyridophenazine) has been shown to accumulate only in the cytoplasm and membranes and to be excluded from the nucleus, where its intended DNA target resides. In this study, the cationic [Ru(DIP)2(dppz)]2+ is found to be redirected into the nucleus of live cells in the presence of lipophilic 3,5-dichlorophenolate or flufenamate counter-anions via an ion-pairing mechanism, while maintaining its original DNA recognition characteristics. Interestingly and unexpectedly, further studies show that only the Δ-enantiomer is selectively translocated into the nucleus while the Λ-enantiomer remains trapped in the cytoplasm, mainly because of their different binding affinities for cytoplasmic proteins and nuclear DNA. More importantly, only the nucleus-relocalized Δ-enantiomer induces obvious DNA damage and cell apoptosis upon prolonged visible-light irradiation. Thus, using the Δ-enantiomer can significantly reduce the dosage needed for maximal treatment effect. This is the first report of enantioselective targeting and photosensitization of a classical Ru(II) complex via simple ion-pairing with suitable weak acid counter-anions, which opens new opportunities for more effective enantioselective cancer treatment.
Tissue-specific regulation of gene expression via unproductive splicing
Mironov, Alexei; Petrova, Marina; Margasyuk, Sergey; Vlasenok, Maria; Mironov, Andrey A; Skvortsov, Dmitry; Pervouchine, Dmitri D
doi: 10.1093/nar/gkad161
pmid: 36912101
Eukaryotic gene expression is regulated post-transcriptionally by a mechanism called unproductive splicing, in which regulated alternative splicing (AS) produces mRNA isoforms that are degraded by the nonsense-mediated decay (NMD) pathway. Only a few dozen unproductive splicing events (USEs) are currently documented, and many more remain to be identified. Here, we analyzed RNA-seq experiments from the Genotype-Tissue Expression (GTEx) Consortium to identify USEs in which an increase in the splicing rate of the NMD isoform is accompanied by tissue-specific down-regulation of the host gene. To characterize the RNA-binding proteins (RBPs) that regulate USEs, we intersected these results with RBP footprinting data and with experiments measuring the transcriptome's response to perturbing the expression of a large panel of RBPs. Concordant tissue-specific changes in RBP expression and USE splicing rates revealed a high-confidence regulatory network including 27 tissue-specific USEs with strong evidence of RBP binding. Among them, we found previously unknown PTBP1-controlled events in the DCLK2 and IQGAP1 genes, for which we confirmed the regulatory effect using small interfering RNA (siRNA) knockdown experiments in the A549 cell line. In sum, we present a transcriptomic pipeline for identifying tissue-specific USEs, potentially many more than were reported here using stringent filters.
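The core filter, an NMD isoform whose splicing rate rises in tissues where its host gene is down-regulated, can be sketched as a rank correlation across tissues. This is a simplified illustration, assuming per-tissue summaries of NMD-isoform inclusion (e.g. percent spliced in) and host-gene expression; the threshold, helper names and values are hypothetical, and the published pipeline applies additional, more stringent filters.

```python
import math


def ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values assigned their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("constant input has no rank correlation")
    return cov / math.sqrt(var_x * var_y)


def is_candidate_use(nmd_psi: list[float], host_expr: list[float],
                     max_rho: float = -0.5) -> tuple[bool, float]:
    """Flag an event whose NMD-isoform inclusion anti-correlates with host expression."""
    rho = spearman(nmd_psi, host_expr)
    return rho <= max_rho, rho


# Hypothetical per-tissue values for one event across five tissues
psi = [0.05, 0.10, 0.30, 0.45, 0.60]
tpm = [80.0, 75.0, 40.0, 22.0, 10.0]
print(is_candidate_use(psi, tpm))
```

Rank correlation is used here so that a few tissues with extreme expression values do not dominate the call; this is a design choice for the sketch, not necessarily the statistic used in the study.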