Publications

You can also find my articles on my Google Scholar profile.

Training deep learning models on personalized genomic sequences improves variant effect prediction

Published in bioRxiv, 2024

Sequence-to-function models have broad applications in interpreting the molecular impact of genetic variation, yet have been criticized for poor performance in this task. Here we show that training models on functional genomic data with matched personal genomes improves their performance at variant effect prediction. Variant effect representations are retained even when fine-tuning models to unseen cellular contexts and experimental readouts. Our results have implications for interpreting trait-associated genetic variation.

Recommended citation: A. Y. He, N. P. Palamuttam, and C. G. Danko. Training deep learning models on personalized genomic sequences improves variant effect prediction. bioRxiv. doi:10.1101/2024.10.15.618510.
Download Paper

Dissection of core promoter syntax through single nucleotide resolution modeling of transcription initiation

Published in bioRxiv, 2024

Our understanding of how the DNA sequences of cis-regulatory elements encode transcription initiation patterns remains limited. Here we introduce CLIPNET, a deep learning model trained on population-scale PRO-cap data that accurately predicts the position and quantity of transcription initiation with single nucleotide resolution from DNA sequence. Interpretation of CLIPNET revealed a complex regulatory syntax consisting of DNA-protein interactions in five major positions between −200 and +50 bp relative to the transcription start site, as well as more subtle positional preferences among different transcriptional activators. Transcriptional activator and core promoter motifs occupy different positions and play distinct roles in regulating initiation, with the former driving initiation quantity and the latter initiation position. We identified core promoter motifs that explain initiation patterns in the majority of promoters and enhancers, including DPR motifs and AT-rich TBP binding sequences in TATA-less promoters. Our results provide insights into the sequence architecture governing transcription initiation.

Recommended citation: A. Y. He and C. G. Danko. Dissection of core promoter syntax through single nucleotide resolution modeling of transcription initiation. bioRxiv. doi:10.1101/2024.03.13.583868.
Download Paper

Topological descriptions of protein folding

Published in Proceedings of the National Academy of Sciences of the United States of America, 2019

How knotted proteins fold has remained controversial since the identification of deeply knotted proteins nearly two decades ago. Both computational and experimental approaches have been used to investigate protein knot formation. Motivated by the computer simulations of Bölinger et al. [Bölinger D, et al. (2010) PLoS Comput Biol 6:e1000731] for the folding of the 6₁-knotted α-haloacid dehalogenase (DehI) protein, we introduce a topological description of knot folding that could describe pathways for the formation of all currently known protein knot types and predicts knot types that might be identified in the future. We analyze fingerprint data from crystal structures of protein knots as evidence that particular protein knots may fold according to specific pathways from our theory. Our results confirm Taylor’s twisted hairpin theory of knot folding for the 3₁-knotted proteins and the 4₁-knotted ketol-acid reductoisomerases and present alternative folding mechanisms for the 4₁-knotted phytochromes and the 5₂- and 6₁-knotted proteins.

Recommended citation: E. Flapan, A. Y. He, and H. Wong. Topological descriptions of protein folding. Proc. Natl. Acad. Sci. U.S.A. doi:10.1073/pnas.1808312116.
Download Paper

Bcl-2 overexpression improves survival and efficacy of neural stem cell-mediated enzyme prodrug therapy

Published in Stem Cells International, 2018

Tumor-tropic neural stem cells (NSCs) can be engineered to localize gene therapies to invasive brain tumors. However, like other stem cell-based therapies, survival of therapeutic NSCs after transplantation is currently suboptimal. One approach to prolonging cell survival is to transiently overexpress an antiapoptotic protein within the cells prior to transplantation. Here, we investigate the utility and safety of this approach using a clinically tested, v-myc immortalized, human NSC line engineered to contain the suicide gene, cytosine deaminase (CD-NSCs). We demonstrate that both adenoviral- and minicircle-driven expression of the antiapoptotic protein Bcl-2 can partially rescue CD-NSCs from transplant-associated insults. We further demonstrate that the improved CD-NSC survival afforded by transient Bcl-2 overexpression results in decreased tumor burden in an orthotopic xenograft glioma mouse model following administration of intracerebral CD-NSCs and systemic prodrug. Importantly, no evidence of CD-NSC transformation was observed upon transient overexpression of Bcl-2. This research highlights a critical need to develop clinically relevant strategies to improve survival of therapeutic stem cells after transplantation. We demonstrate for the first time in this disease setting that improving CD-NSC survival using Bcl-2 overexpression can significantly improve therapeutic outcomes.

Recommended citation: R. Mooney, A. A. Majid, D. Mota, S. Aramburo, A. Y. He, J. Covello-Batalla, D. Machado, J. Gonzaga, L. Flores, and K. S. Aboody. Bcl-2 overexpression improves survival and efficacy of neural stem cell-mediated enzyme prodrug therapy. Stem Cells Int. doi:10.1155/2018/7047496.
Download Paper