Determining the structure and function of protein and RNA molecules is a cornerstone of modern biology and medicine. The Yang Zhang Lab develops novel bioinformatics algorithms, built on cutting-edge artificial intelligence (AI) techniques and physics-based atomic force fields, to accurately model and redesign the structure and function of proteins and RNAs. While one goal of these studies is to reveal the fundamental relationship between genome sequence and protein structure and function, the Zhang Lab is particularly interested in extending AI-based computational approaches to drug design and discovery, with a focus on cancer therapeutics and treatment.



Senior Principal Investigator, Cancer Science Institute of Singapore
Professor of Computer Science, School of Computing, NUS
Professor of Biochemistry, Yong Loo Lin School of Medicine, NUS


Research in Prof. Yang Zhang's lab focuses on protein folding and structure prediction, protein design and engineering, and structure-based protein function annotation and drug discovery (Fig 1). Several methods developed in the Zhang Lab have been recognized as the world’s best. Among them, I-TASSER (as ‘Zhang-Server’ and ‘UM-TBM’) was ranked as the most accurate method for automated protein structure prediction in the last nine CASP experiments (CASP7-15, 2006-2022), the biennial community-wide blind contests designed to benchmark the state of the art in the field. Accordingly, the I-TASSER server (https://zhanggroup.org/I-TASSER/) has become one of the most widely used platforms for protein structure and function prediction, serving more than 170,000 registered users from 160 countries.



Fig 1. One principal goal of the research in the Zhang Lab is to reveal the fundamental relationship between protein sequence, structure, and function.


In addition to I-TASSER, many other tools developed in the Zhang Lab are widely used by the global community and have had a significant impact on the field. For instance, the TM-score (https://en.wikipedia.org/wiki/Template_modeling_score) and TM-align (https://zhanggroup.org/TM-align) programs are used as standard tools for protein structure analysis and have been selected as default structure comparison methods by various systems, including the Protein Data Bank (PDB) and the Debian Unix/Linux system. The idea of template profile-based protein structure prediction was later extended to protein design, where a novel evolution-based method, EvoDesign, was proposed and successfully used to design new protein domains and peptides that regulate the cancer cell apoptosis pathway and block the association of the SARS-CoV-2 virus with human host cells, respectively.


Most recently, the Zhang Lab has been developing new artificial intelligence (AI) and deep learning methods to improve the scope and accuracy of protein folding and protein design, which represent two fundamentally important and mutually inverse biological processes (Fig 2). The Zhang Lab was among the first few laboratories to initiate studies on deep machine-learning-based protein and RNA structure prediction, and AI techniques have since had a revolutionary impact on structural biology and the life sciences. The Zhang Lab is particularly interested in applying AI-based computational algorithms to effective cancer therapeutics and treatment, with a focus on anti-cancer peptide-drug conjugates (PDCs), T-cell receptor (TCR)-based adoptive therapy, and antibody discovery and optimization.


Fig 2. The integration of AI and deep neural-network learning techniques significantly improves the power and accuracy of the two inverse processes of protein folding and protein design.

Selected Publications

1. R Pearce, X Huang, GS Omenn, Y Zhang. De novo protein fold design through sequence-independent fragment assembly simulations. PNAS, 120: e2208275120 (2023). 

2. C Zhang, M Shine, AM Pyle, Y Zhang. US-align: Universal structure alignment of proteins, nucleic acids and macromolecular complexes. Nature Methods, 19: 1109-1115 (2022).

3. X Zhou, W Zheng, Y Li, R Pearce, C Zhang, EW Bell, G Zhang, Y Zhang. I-TASSER-MTD: A deep-learning based platform for multi-domain protein structure and function prediction. Nature Protocols, 17: 2326-2353 (2022).

4. X Zhou, Y Li, C Zhang, W Zheng, G Zhang, Y Zhang. Progressive assembly of multi-domain protein structures from cryo-EM density maps. Nature Computational Science, 2: 265-275 (2022).

5. X Zhang, B Zhang, PL Freddolino, Y Zhang. CR-I-TASSER: Assemble protein structures from cryo-EM density maps using deep convolutional neural networks. Nature Methods, 19: 195-204 (2022).

6. SM Mortuza, W Zheng, C Zhang, Y Li, R Pearce, Y Zhang. Improving fragment-based ab initio protein structure assembly using low-accuracy contact-map predictions. Nature Communications, 12: 5011 (2021).

7. P Yang, W Zheng, K Ning, Y Zhang. Decoding the link of microbiome niches with homologous sequences enables accurately targeted protein structure prediction. PNAS, 118: e2110828118 (2021).

8. W Zheng, C Zhang, Y Li, R Pearce, EW Bell, Y Zhang. Folding non-homologous proteins by coupling deep-learning contact maps with I-TASSER assembly simulations. Cell Reports Methods, 1: 100014 (2021).

9. Y Wang, Q Shi, P Yang, C Zhang, SM Mortuza, Z Xue, K Ning, Y Zhang. Fueling ab initio folding with marine metagenomics enables structure and function predictions of new protein families. Genome Biology, 20: 229 (2019).

10. X Zhou, J Hu, C Zhang, G Zhang, Y Zhang. Assembling multidomain protein structures through analogous global structural alignments. PNAS, 116: 15930-15938 (2019).