N in a protein sequence meaning

9/28/2023

Computational protein design has emerged as a powerful tool for rational protein design, enabling significant achievements in the engineering of therapeutics 1, 2, 3, biosensors 4, 5, 6, enzymes 7, 8, and more 9, 10, 11. Key to such successes are robust sequence design methods that minimize the folded-state energy of a pre-specified backbone conformation, which can either be derived from existing structures or generated de novo. This difficult task 12 is often described as the inverse of protein folding: given a protein backbone, design a sequence that folds into that conformation. Current approaches for fixed-backbone design commonly involve specifying an energy function and sampling sequence space to find a minimum-energy configuration 13, 14, 15, and enormous effort has gone into the development of carefully modeled and parameterized energy functions to guide design, which continue to be iteratively refined 16, 17.

With the emergence of deep learning systems and their ability to learn patterns from high-dimensional data, it is now possible to build models that learn complex functions of protein sequence and structure, including models for protein backbone generation 18, 19, 20 and protein structure prediction 21, 22. As a result, we asked whether an entirely learned method could design protein sequences on par with energy function methods.

Recent experimentally validated efforts for machine learning-based sequence generation have focused on sequence representation learning without structural information, requiring fitting to data from experiments or from known protein families to produce functional designs 23, 24. Structure-based machine learning methods for design thus far have focused on mutation prediction 25, 26, 27, 28, 29, 30, rotamer repacking of native sequences 31, or amino acid sequence design without modeling side-chain conformers 32, 33, 34, 35, with some experimental validation including circular dichroism data 33 and fluorescence 26. We sought to build a model that could generalize to unseen backbones with no homologous sequence data included in the training, and to validate that designs fold into target structures with the designed side-chain conformations. The functional design of enzymes, ligand binding sites, and interfaces requires fine-grained control over side-chain types and conformations. As such, we explored a method in which the neural network not only designs the sequence but also explicitly builds rotamers and evaluates full-atom structural models, an approach not reported to date.

Conventional energy functions used in sequence design calculations are often composed of pairwise terms that model inter-atomic interactions. Given the expressivity of deep neural networks, or their ability to approximate a rich class of functions, we predicted that a model conditioned on chemical context could learn higher-order (multi-body) interactions relevant for sequence design (e.g., hydrogen bonding networks). Furthermore, most energy functions are highly sensitive to specific atom placement, and as a result, designed sequences can converge on a narrow set of solutions for a given starting backbone conformation. For most native proteins, however, the existence of many structural homologs with low sequence identity suggests that there is a distribution of viable sequences that can adopt a target fold, but discovering these sequences given a fixed-backbone reference structure is difficult. We hypothesized that a learned model could operate as a soft potential that implicitly captures backbone flexibility, producing diverse sequences for a fixed protein backbone. We further hypothesized that by training a model that conditions on local backbone structure and chemical environment, the network might learn residue-level patterns that allow it to generalize without fine-tuning to new backbones with topologies outside of the training distribution, opening up the possibility of generating de novo designed sequences with novel structures and functions.

In this study, we explore an approach for sequence design guided only by a neural network that explicitly models side-chain conformers in a structure-based context (Fig. 1A), and we assess its generalization to unseen native topologies and to a de novo TIM-barrel protein backbone.
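
The conventional fixed-backbone strategy described above, specifying an energy function and sampling sequence space for a minimum-energy configuration, can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the learned method this study proposes: it assumes a pairwise-decomposable energy E(s) = Σ_i E_i(s_i) + Σ_{i<j} E_ij(s_i, s_j) built from one-body and pairwise terms, and it uses Metropolis Monte Carlo simulated annealing as one common sampler. The three-letter alphabet, the energy tables, and the names `total_energy` and `anneal` are hypothetical, and real design tools also sample side-chain rotamers, which this sketch omits.

```python
import math
import random

# Hypothetical 3-letter alphabet standing in for the 20 amino acids.
ALPHABET = "AVD"

# Hypothetical one-body terms: ONE_BODY[i][aa] is the energy of residue aa at position i.
ONE_BODY = [
    {"A": 0.0, "V": -0.5, "D": 0.5},
    {"A": 0.0, "V": 0.2, "D": -0.3},
    {"A": -0.2, "V": 0.0, "D": 0.1},
    {"A": 0.3, "V": -0.4, "D": 0.0},
]

# Hypothetical pairwise terms: TWO_BODY[(i, j)][(aa_i, aa_j)]; missing entries count as 0.
TWO_BODY = {
    (0, 1): {("V", "D"): -1.0},  # favorable contact
    (1, 2): {("D", "D"): 0.8},   # unfavorable contact
    (2, 3): {("A", "V"): -0.6},  # favorable contact
}


def total_energy(seq, one_body, two_body):
    """Pairwise-decomposable energy: one-body terms plus every i < j pair term."""
    energy = sum(one_body[i][aa] for i, aa in enumerate(seq))
    for i in range(len(seq)):
        for j in range(i + 1, len(seq)):
            energy += two_body.get((i, j), {}).get((seq[i], seq[j]), 0.0)
    return energy


def anneal(n, one_body, two_body, steps=5000, t_start=2.0, t_end=0.05, seed=0):
    """Metropolis Monte Carlo over sequences with a geometric cooling schedule.

    Returns the lowest-energy sequence visited and its energy.
    """
    rng = random.Random(seed)
    seq = [rng.choice(ALPHABET) for _ in range(n)]
    energy = total_energy(seq, one_body, two_body)
    best_seq, best_energy = list(seq), energy
    for step in range(steps):
        # Temperature decays geometrically from t_start to t_end.
        t = t_start * (t_end / t_start) ** (step / (steps - 1))
        # Propose a single-position mutation.
        i = rng.randrange(n)
        old = seq[i]
        seq[i] = rng.choice(ALPHABET)
        new_energy = total_energy(seq, one_body, two_body)
        delta = new_energy - energy
        # Metropolis criterion: always accept downhill, sometimes accept uphill.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            energy = new_energy
            if energy < best_energy:
                best_seq, best_energy = list(seq), energy
        else:
            seq[i] = old  # reject: undo the mutation
    return "".join(best_seq), best_energy


if __name__ == "__main__":
    print(anneal(len(ONE_BODY), ONE_BODY, TWO_BODY))
```

Running `anneal` with different seeds on this toy landscape tends to return the same minimum, a small-scale picture of the convergence issue raised in the text. Because the energy contains only one- and two-body tables, it also cannot represent the higher-order (multi-body) interactions that a learned model might capture.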
