publications | Guojie Zhong

An up-to-date list is available on Google Scholar.

2024

PreMode predicts mode of action of missense variants by deep graph representation learning of protein sequence and structural context

Guojie Zhong, Yige Zhao, Demi Zhuang, Wendy K Chung, and Yufeng Shen

bioRxiv 2024

Accurate prediction of the functional impact of missense variants is important for disease gene discovery, clinical genetic diagnostics, therapeutic strategies, and protein engineering. Previous efforts have focused on predicting a binary pathogenicity classification, but the functional impact of missense variants is multi-dimensional. Pathogenic missense variants in the same gene may act through different modes of action (i.e., gain/loss-of-function) by affecting different aspects of protein function. They may result in distinct clinical conditions that require different treatments. We developed a new method, PreMode, to perform gene-specific mode-of-action predictions. PreMode models effects of coding sequence variants using SE(3)-equivariant graph neural networks on protein sequences and structures. Using the largest-to-date set of missense variants with known modes of action, we showed that PreMode reached state-of-the-art performance in multiple types of mode-of-action predictions by efficient transfer-learning. Additionally, PreMode’s prediction of G/LoF variants in a kinase is validated with inactive-active conformation transition energy changes. Finally, we show that PreMode enables efficient study design of deep mutational scans and optimization in protein engineering.

2023

A probabilistic graphical model for estimating selection coefficient of missense variants from human population sequence data

Yige Zhao, Guojie Zhong, Jake Hagen, Hongbing Pan, Wendy K. Chung, and Yufeng Shen

medRxiv 2023

Accurately predicting the effect of missense variants is a central problem in interpretation of genomic variation. Commonly used computational methods do not capture the quantitative impact on fitness in populations. We developed MisFit to estimate missense fitness effect using biobank-scale human population genome data. MisFit jointly models the effect at molecular level (d) and population level (selection coefficient, s), assuming that in the same gene, missense variants with similar d have similar s. MisFit is a probabilistic graphical model that integrates deep neural network components and population genetics models efficiently with inductive bias based on biological causality of variant effect. We trained it by maximizing probability of observed allele counts in 236,017 European individuals. We show that s is informative in predicting frequency across ancestries and consistent with the fraction of de novo mutations given s. Finally, MisFit outperforms previous methods in prioritizing missense variants in individuals with neurodevelopmental disorders.
@article{MisFit,
  author = {Zhao, Yige and Zhong, Guojie and Hagen, Jake and Pan, Hongbing and Chung, Wendy K. and Shen, Yufeng},
  title = {A probabilistic graphical model for estimating selection coefficient of missense variants from human population sequence data},
  pages = {2023.12.11.23299809},
  year = {2023},
  doi = {10.1101/2023.12.11.23299809},
  publisher = {Cold Spring Harbor Laboratory Press},
  url = {https://www.medrxiv.org/content/early/2023/12/22/2023.12.11.23299809},
  journal = {medRxiv},
}

VBASS enables integration of single cell gene expression data in Bayesian association analysis of rare variants

G. Zhong, Y. A. Choi, and Y. Shen

2023

Rare or de novo variants contribute substantially to human diseases, but the statistical power to identify risk genes by rare variants is generally low due to rarity of genotype data. Previous studies have shown that risk genes usually have high expression in relevant cell types, although for many conditions the identity of these cell types is largely unknown. Recent efforts in single cell atlas in human and model organisms produced large amounts of gene expression data. Here we present VBASS, a Bayesian method that integrates single-cell expression and de novo variant (DNV) data to improve power of disease risk gene discovery. VBASS models disease risk prior as a function of expression profiles, approximated by deep neural networks. It learns the weights of neural networks and parameters of Gamma-Poisson likelihood models of DNV counts jointly from expression and genetics data. On simulated data, VBASS shows proper error rate control and better power than state-of-the-art methods. We applied VBASS to published datasets and identified more candidate risk genes with support from the literature or data from independent cohorts. VBASS can be generalized to integrate other types of functional genomics data in statistical genetics analysis.
@article{VBASS,
  author = {Zhong, G. and Choi, Y. A. and Shen, Y.},
  title = {VBASS enables integration of single cell gene expression data in Bayesian association analysis of rare variants},
  journal = {Commun Biol},
  volume = {6},
  number = {1},
  pages = {774},
  note = {Commun Biol. 2023 Jul 25;6(1):774. doi: 10.1038/s42003-023-05155-9.},
  issn = {2399-3642 (Electronic) 2399-3642 (Linking)},
  doi = {10.1038/s42003-023-05155-9},
  url = {https://www.ncbi.nlm.nih.gov/pubmed/37491581},
  year = {2023},
  type = {Journal Article},
}

2022

MLSB 2022
Representation of missense variants for predicting modes of action

G. Zhong and Y. Shen

Machine Learning in Structural Biology, Workshop at the 36th Conference on Neural Information Processing Systems (NeurIPS), 2022

Accurate prediction of functional impact for missense variants is fundamental for genetic analysis and clinical applications. Current methods focused on generating an overall pathogenicity prediction score while overlooking the fact that variant effect should be multi-dimensional via different modes of action, such as gain or loss of function, and loss of folding stability or enzymatic activity. Recent breakthrough of high-capacity language models enabled ab initio prediction of protein structures as well as self-supervised representation learning of protein sequence and functions. Here we present RESCVE, a method to learn universal representation of sequence variation from protein context. We demonstrated the utility of the method predicting a range of modes of action for missense variants through transfer learning.
@article{RN10,
  author = {Zhong, G. and Shen, Y.},
  title = {Representation of missense variants for predicting modes of action},
  booktitle = {Machine Learning in Structural Biology, Workshop at the 36th Conference on Neural Information Processing Systems},
  %type = {Conference Proceedings},
  journal = {Machine Learning in Structural Biology, Workshop at the 36th Conference on Neural Information Processing Systems (NeurIPS),},
  year = {2022},
}

Statistical models of the genetic etiology of congenital heart disease

G. Zhong and Y. Shen

Curr Opin Genet Dev, 2022

Congenital heart disease (CHD) is a collection of anatomically and clinically heterogeneous structure anomalies of heart at birth. Finding genetic causes of CHD can not only shed light on developmental biology of heart, but also provide basis for improving clinical care and interventions. The optimal study design and analytical approaches to identify genetic causes depend on the underlying genetic architecture. A few well-known syndromes with CHD as core conditions, such as Noonan and CHARGE, have known monogenic causes. The genetic causes of most of CHD patients, however, are unknown and likely to be complex. In this review, we highlight recent studies that assume a complex genetic architecture of CHD with two main approaches. One is genomic sequencing studies aiming for identifying rare or de novo risk variants with large genetic effect. The other is genome-wide association studies optimized for common variants with moderate genetic effect.
@article{RN9,
  author = {Zhong, G. and Shen, Y.},
  title = {Statistical models of the genetic etiology of congenital heart disease},
  journal = {Curr Opin Genet Dev,},
  volume = {76},
  pages = {101967},
  issn = {1879-0380 (Electronic) 0959-437X (Linking)},
  doi = {10.1016/j.gde.2022.101967},
  url = {https://www.ncbi.nlm.nih.gov/pubmed/35939966},
  year = {2022},
  %type = {Journal Article},
}

Identification and validation of candidate risk genes in endocytic vesicular trafficking associated with esophageal atresia and tracheoesophageal fistulas

G. Zhong*, P. Ahimaz*, N. A. Edwards*, J. J. Hagen, C. Faure, and 13 more authors

HGG Adv, 2022

Esophageal atresias/tracheoesophageal fistulas (EA/TEF) are rare congenital anomalies caused by aberrant development of the foregut. Previous studies indicate that rare or de novo genetic variants significantly contribute to EA/TEF risk, and most individuals with EA/TEF do not have pathogenic genetic variants in established risk genes. To identify novel genetic contributions to EA/TEF, we performed whole genome sequencing of 185 trios (probands and parents) with EA/TEF, including 59 isolated and 126 complex cases with additional congenital anomalies and/or neurodevelopmental disorders. There was a significant burden of protein altering de novo coding variants in complex cases (p=3.3e-4), especially in genes that are intolerant of loss of function variants in the population. We performed simulation analysis of pathway enrichment based on background mutation rate and identified a number of pathways related to endocytosis and intracellular trafficking that as a group have a significant burden of protein altering de novo variants. We assessed 18 variants for disease causality using CRISPR-Cas9 mutagenesis in Xenopus and confirmed 13 with tracheoesophageal phenotypes. Our results implicate disruption of endosome-mediated epithelial remodeling as a potential mechanism of foregut developmental defects. This research may have implications for the mechanisms of other rare congenital anomalies.
@article{RN7,
  author = {Zhong*, G. and Ahimaz*, P. and Edwards*, N. A. and Hagen, J. J. and Faure, C. and Lu, Q. and Kingma, P. and Middlesworth, W. and Khlevner, J. and El Fiky, M. and Schindel, D. and Fialkowski, E. and Kashyap, A. and Forlenza, S. and Kenny, A. P. and Zorn, A. M. and Shen, Y. and Chung, W. K.},
  title = {Identification and validation of candidate risk genes in endocytic vesicular trafficking associated with esophageal atresia and tracheoesophageal fistulas},
  journal = {HGG Adv,},
  volume = {3},
  number = {3},
  pages = {100107},
  issn = {2666-2477 (Electronic) 2666-2477 (Linking)},
  doi = {10.1016/j.xhgg.2022.100107},
  url = {https://www.ncbi.nlm.nih.gov/pubmed/35519826},
  year = {2022},
  %type = {Journal Article},
}

Discovering the Developmental Basis of Trachea-Esophageal Birth Defects: Evidence for Endosome-opathies

N. Edwards, G. Zhong, P. Ahimaz, A. Kenny, P. Kingma, and 4 more authors

The FASEB Journal, 2022

The trachea and esophagus (TE) arise from a common foregut tube during embryonic development. Disruptions in TE morphogenesis cause congenital trachea-esophageal defects (TEDs) such as esophageal atresia, tracheoesophageal fistula and tracheoesophageal clefts. TEDs occur in approximately 1 in 3500 births, but their etiology is poorly understood. We have established the www.CLEARconsortium.org, a multidisciplinary team of clinicians, geneticists, bioinformaticians, stem cell and developmental biologists using patient genome sequencing, animal models and iPSC-derived human organoids to discover the genetic and developmental basis of trachea-esophageal birth defects. Using the complementary advantages of Xenopus and mouse models we have defined the conserved molecular and cellular mechanisms that regulate normal TE morphogenesis. We show that downstream of Hedgehog/Gli signaling endosome-mediated epithelial remodeling regulates TE morphogenesis which when disrupted results in tracheoesophageal clefts similar to human Pallister Hall syndrome patients. Proband-parent trio genome sequencing identified an enrichment of potential damaging de novo variants in genes encoding membrane/vesicular-trafficking proteins, suggesting a common “endosome-opathy” pathway. Ongoing CRISPR mutagenesis screens in Xenopus tropicalis assessing candidate causative variants from patients confirm that the endosome protein Itsn1 is essential for TE morphogenesis, suggesting that the ITSN1 variant is likely pathogenic in the patient. Finally, leveraging results from animal models we have generated multi-lineage human esophageal organoids from iPSCs with patient mutations to identify how mutations impact human esophageal differentiation. Together these results significantly advance our understanding of TEDs with the goal of revealing phenotype-genotype associations that will inform prognosis and clinical treatment.
@article{RN6,
  author = {Edwards, N. and Zhong, G. and Ahimaz, P. and Kenny, A. and Kingma, P. and Wells, J. and Shen, Y. and Chung, W. K. and Zorn, A.},
  title = {Discovering the Developmental Basis of Trachea-Esophageal Birth Defects: Evidence for Endosome-opathies},
  journal = {The FASEB Journal,},
  volume = {36},
  number = {S1},
  issn = {0892-6638},
  doi = {10.1096/fasebj.2022.36.S1.0R569},
  url = {https://faseb.onlinelibrary.wiley.com/doi/abs/10.1096/fasebj.2022.36.S1.0R569},
  year = {2022},
  %type = {Journal Article},
}

2021

Towards better understanding of developmental disorders from integration of spatial single-cell transcriptomics and epigenomics

G. Zhong*, J. Wang*, S. He*, and X. Fu*

The 2021 ICML Workshop on Computational Biology, 2021

The recently emerging techniques of single cell spatial RNA seq make it possible to profile the transcriptomics data at single cell resolution without loss of the spatial information. However, it is still a challenge to measure epigenomics profiles at spatial levels. In this project, we developed an autoencoder based multi-omics integration method and applied it to spatial mouse fetal brain data to reconstruct the spatial epigenomics profiles. We compared our method with LIGER and showed its better performance on a public dataset measured by latent mixing metrics. We further developed a CNN model to predict autism risk genes based on the spatial RNA seq data. Our model is able to prioritize autism risk genes at the whole genome level. Code of our project can be found at https://github.com/explorerwjy/ML_genomics
@article{RN5,
  author = {Zhong*, G. and Wang*, J. and He*, S. and Fu*, X.},
  title = {Towards better understanding of developmental disorders from integration of spatial single-cell transcriptomics and epigenomics},
  booktitle = {The 2021 ICML Workshop on Computational Biology},
  journal = {The 2021 ICML Workshop on Computational Biology,},
  %type = {Conference Proceedings},
  year = {2021},
}

mRNA Delivery of a Bispecific Single-Domain Antibody to Polarize Tumor-Associated Macrophages and Synergize Immunotherapy against Liver Malignancies

Y. Wang, K. Tiruthani, S. Li, M. Hu, G. Zhong, and 6 more authors

Adv Mater, 2021

Liver malignancies are among the tumor types that are resistant to immune checkpoint inhibition therapy. Tumor-associated macrophages (TAMs) are highly enriched and play a major role in inducing immunosuppression in liver malignancies. Herein, CCL2 and CCL5 are screened as two major chemokines responsible for attracting TAM infiltration and inducing their polarization toward cancer-promoting M2-phenotype. To reverse this immunosuppressive process, an innovative single-domain antibody that bispecifically binds and neutralizes CCL2 and CCL5 (BisCCL2/5i) with high potency and specificity is directly evolved. mRNA encoding BisCCL2/5i is encapsulated in a clinically approved lipid nanoparticle platform, resulting in a liver-homing biomaterial that allows transient yet efficient expression of BisCCL2/5i in the diseased organ in a multiple dosage manner. This BisCCL2/5i mRNA nanoplatform significantly induces the polarization of TAMs toward the antitumoral M1 phenotype and reduces immunosuppression in the tumor microenvironment. The combination of BisCCL2/5i with PD-1 ligand inhibitor (PD-Li) achieves long-term survival in mouse models of primary liver cancer and liver metastasis of colorectal and pancreatic cancers. The work provides an effective bispecific targeting strategy that could broaden the PD-Li therapy to multiple types of malignancies in the human liver.
@article{RN4,
  author = {Wang, Y. and Tiruthani, K. and Li, S. and Hu, M. and Zhong, G. and Tang, Y. and Roy, S. and Zhang, L. and Tan, J. and Liao, C. and Liu, R.},
  title = {mRNA Delivery of a Bispecific Single-Domain Antibody to Polarize Tumor-Associated Macrophages and Synergize Immunotherapy against Liver Malignancies},
  journal = {Adv Mater,},
  volume = {33},
  number = {23},
  pages = {e2007603},
  issn = {1521-4095 (Electronic) 0935-9648 (Linking)},
  doi = {10.1002/adma.202007603},
  url = {https://www.ncbi.nlm.nih.gov/pubmed/33945178},
  year = {2021},
  %type = {Journal Article},
}

Author Corrections: Reconstruction of cell spatial organization from single-cell RNA sequencing data based on ligand-receptor mediated self-assembly

X. Ren*, G. Zhong*, Q. Zhang, L. Zhang, Y. Sun, and Z. Zhang

Cell Res, 2021

@article{RN2,
  author = {Ren*, X. and Zhong*, G. and Zhang, Q. and Zhang, L. and Sun, Y. and Zhang, Z.},
  title = {Author Corrections: Reconstruction of cell spatial organization from single-cell RNA sequencing data based on ligand-receptor mediated self-assembly},
  journal = {Cell Res,},
  volume = {31},
  number = {12},
  pages = {1319-1320},
  issn = {1748-7838 (Electronic) 1001-0602 (Linking)},
  doi = {10.1038/s41422-021-00550-5},
  url = {https://www.ncbi.nlm.nih.gov/pubmed/34381185},
  year = {2021},
  %type = {Journal Article},
}

2020

Reconstruction of cell spatial organization from single-cell RNA sequencing data based on ligand-receptor mediated self-assembly

X. Ren*, G. Zhong*, Q. Zhang, L. Zhang, Y. Sun, and Z. Zhang

Cell Res, 2020

Single-cell RNA sequencing (scRNA-seq) has revolutionized transcriptomic studies by providing unprecedented cellular and molecular throughputs, but spatial information of individual cells is lost during tissue dissociation. While imaging-based technologies such as in situ sequencing show great promise, technical difficulties currently limit their wide usage. Here we hypothesize that cellular spatial organization is inherently encoded by cell identity and can be reconstructed, at least in part, by ligand-receptor interactions, and we present CSOmap, a computational tool to infer cellular interaction de novo from scRNA-seq. We show that CSOmap can successfully recapitulate the spatial organization of multiple organs of human and mouse including tumor microenvironments for multiple cancers in pseudo-space, and reveal molecular determinants of cellular interactions. Further, CSOmap readily simulates perturbation of genes or cell types to gain novel biological insights, especially into how immune cells interact in the tumor microenvironment. CSOmap can be a widely applicable tool to interrogate cellular organizations based on scRNA-seq data for various tissues in diverse systems.
@article{RN3,
  author = {Ren*, X. and Zhong*, G. and Zhang, Q. and Zhang, L. and Sun, Y. and Zhang, Z.},
  title = {Reconstruction of cell spatial organization from single-cell RNA sequencing data based on ligand-receptor mediated self-assembly},
  journal = {Cell Res,},
  volume = {30},
  number = {9},
  pages = {763-778},
  issn = {1748-7838 (Electronic) 1001-0602 (Linking)},
  doi = {10.1038/s41422-020-0353-2},
  url = {https://www.ncbi.nlm.nih.gov/pubmed/32541867},
  year = {2020},
  %type = {Journal Article},
}

2019

Landscape and Dynamics of Single Immune Cells in Hepatocellular Carcinoma

Q. Zhang, Y. He, N. Luo, S. J. Patel, Y. Han, and 20 more authors

Cell, 2019

The immune microenvironment of hepatocellular carcinoma (HCC) is poorly characterized. Combining two single-cell RNA sequencing technologies, we produced transcriptomes of CD45+ immune cells for HCC patients from five immune-relevant sites: tumor, adjacent liver, hepatic lymph node (LN), blood, and ascites. A cluster of LAMP3+ dendritic cells (DCs) appeared to be the mature form of conventional DCs and possessed the potential to migrate from tumors to LNs. LAMP3+ DCs also expressed diverse immune-relevant ligands and exhibited potential to regulate multiple subtypes of lymphocytes. Of the macrophages in tumors that exhibited distinct transcriptional states, tumor-associated macrophages (TAMs) were associated with poor prognosis, and we established the inflammatory role of SLC40A1 and GPNMB in these cells. Further, myeloid and lymphoid cells in ascites were predominantly linked to tumor and blood origins, respectively. The dynamic properties of diverse CD45+ cell types revealed by this study add new dimensions to the immune landscape of HCC.
@article{RN1,
  author = {Zhang, Q. and He, Y. and Luo, N. and Patel, S. J. and Han, Y. and Gao, R. and Modak, M. and Carotta, S. and Haslinger, C. and Kind, D. and Peet, G. W. and Zhong, G. and Lu, S. and Zhu, W. and Mao, Y. and Xiao, M. and Bergmann, M. and Hu, X. and Kerkar, S. P. and Vogt, A. B. and Pflanz, S. and Liu, K. and Peng, J. and Ren, X. and Zhang, Z.},
  title = {Landscape and Dynamics of Single Immune Cells in Hepatocellular Carcinoma},
  journal = {Cell,},
  volume = {179},
  number = {4},
  pages = {829-845 e20},
  issn = {1097-4172 (Electronic) 0092-8674 (Linking)},
  doi = {10.1016/j.cell.2019.10.003},
  url = {https://www.ncbi.nlm.nih.gov/pubmed/31675496},
  year = {2019},
  %type = {Journal Article},
}