Protein Structure Prediction

Example of predicted transmembrane segment topologies for three proteins.

Active from 1997 to 1999

Research rationale

Membrane proteins are involved in a wide range of essential functions, including communication between cells and the transport of nutrients, ions, and waste products across biological membranes. These proteins, which are estimated to constitute 25% of proteins at a genomic scale, play key roles in an equally wide range of diseases such as diabetes, hypertension, depression, arthritis and cancer. They are also common drug targets (for over 75% of pharmaceuticals in use today). Determining membrane protein structures is essential for understanding how drugs interfere with cellular communication and regulation. However, current knowledge about the detailed 3D structures of membrane proteins is limited, because such structures are difficult to study by traditional experimental methods.

The idea is to use computational techniques to enhance our knowledge about membrane proteins. However, developing algorithms capable of predicting the three-dimensional structure of proteins at atomic detail is a very difficult task. Instead of tertiary structure determination, we focused our research on two complementary aspects: the structural classification of proteins, which makes it possible to identify potential membrane proteins, and the prediction of transmembrane alpha-helices in membrane proteins.

Results

Structural classification of proteins

Periodic patterns and tandem repeats of residues are often found in DNA and protein sequences. In proteins, their presence helps in understanding the molecular structure of a fibrous/structural protein through the principle of conformational equivalence, and it may suggest modes of ultramolecular assembly for the formation of higher-order structure. Characteristic examples are the periodicities found in a number of sequences of fibrous proteins (e.g. tropomyosin, myosin, keratins and collagen). We used Fourier analysis to highlight hidden periodicities in protein sequences and developed FT, a tool accessible to biologists through the Internet (Pasquier et al. 1998a, 1998b).
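The Fourier approach can be illustrated with a minimal sketch. It is not the FT tool's implementation: the function names, the binary encoding of the sequence (1 where a residue belongs to a chosen class, 0 elsewhere) and the restriction to integer periods are all assumptions made for this example.

```python
import cmath
import math


def periodicity_spectrum(sequence, residues):
    """Spectral power of a residue-indicator signal at each integer period.

    Hypothetical helper: returns a dict mapping a period (in residues)
    to the power of the indicator signal at frequency 1/period.
    """
    n = len(sequence)
    if n < 4:
        raise ValueError("sequence too short for a periodicity analysis")
    # 1.0 where the residue belongs to the class of interest, else 0.0
    signal = [1.0 if aa in residues else 0.0 for aa in sequence]
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # remove the constant component
    spectrum = {}
    for period in range(2, n // 2 + 1):
        omega = 2.0 * math.pi / period
        coeff = sum(x * cmath.exp(-1j * omega * k)
                    for k, x in enumerate(centered))
        spectrum[period] = abs(coeff) ** 2 / n
    return spectrum


def dominant_period(sequence, residues):
    """Integer period with the highest spectral power."""
    spectrum = periodicity_spectrum(sequence, residues)
    return max(spectrum, key=spectrum.get)


# Toy input (not real data): a leucine every seventh residue
toy = "LAAAAAA" * 10
print(dominant_period(toy, {"L"}))  # prints 7
```

On a real sequence the residue class would typically group several amino acids (for instance all hydrophobic ones), and the whole spectrum, rather than only its maximum, would be inspected.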

Continuing this work, we explored the use of hierarchical artificial neural networks for the generalized classification of proteins into several distinct classes (transmembrane, fibrous, globular, and mixed) using only the information encoded in their amino acid sequences (Pasquier et al. 2001, 1999a; Promponas et al. 2001). Applying our implementations (PRED-TMR2 and PRED-CLASS) to various test sets and to the complete proteomes of several organisms showed that such methods could serve as a valuable tool for annotating genomic open reading frames with no functional assignment, or as a preliminary step in fold recognition and ab initio structure prediction methods (Iliopoulos et al. 2003; Liakopoulos et al. 2001a, 2000; Pasquier and Hamodrakas 1999; Promponas et al. 1998, 1999).
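The cascading control flow of such a hierarchical classifier can be sketched as follows. This only illustrates the decision structure: the three predicates are toy composition rules standing in for trained neural networks, and their thresholds, their names and the order of the stages are assumptions, not a description of PRED-CLASS.

```python
HYDROPHOBIC = set("AILMFVWC")


def longest_hydrophobic_run(sequence):
    """Length of the longest uninterrupted stretch of hydrophobic residues."""
    best = run = 0
    for aa in sequence:
        run = run + 1 if aa in HYDROPHOBIC else 0
        best = max(best, run)
    return best


def looks_transmembrane(sequence):
    # Toy stand-in for a first network: one long hydrophobic stretch
    return longest_hydrophobic_run(sequence) >= 18


def looks_fibrous(sequence):
    # Toy stand-in for a second network: collagen-like Gly/Pro richness
    if not sequence:
        return False
    return sum(aa in "GP" for aa in sequence) / len(sequence) >= 0.5


def looks_globular(sequence):
    # Toy stand-in for a third network: diverse amino acid composition
    return len(set(sequence)) >= 10


# Each stage only sees the sequences rejected by the previous stages
CASCADE = [
    ("transmembrane", looks_transmembrane),
    ("fibrous", looks_fibrous),
    ("globular", looks_globular),
]


def classify(sequence, cascade=CASCADE, fallback="mixed"):
    """Return the label of the first stage that accepts the sequence."""
    for label, accepts in cascade:
        if accepts(sequence):
            return label
    return fallback


print(classify("MK" + "L" * 20 + "RR"))  # prints transmembrane
```

The point of the cascade is that each stage solves a simpler binary problem than a single four-class classifier would, and a sequence accepted by no stage falls into the remaining class.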

Prediction of transmembrane alpha-helices in membrane proteins

Successfully locating transmembrane segments, their secondary structure and the packing modes of secondary structure elements is important because these features define the architecture of a transmembrane protein. Equally important is the determination of topology, which defines the polarity of integral membrane proteins.

Researchers have identified several characteristics that are common to a large proportion of transmembrane segments. They observed, for example, that transmembrane segments are mainly composed of hydrophobic residues, that the propensity of positively charged residues is higher in the non-transmembrane segments on the inner side of the cell, and that a high propensity of tyrosine and tryptophan indicates the outer side of the cell. To extend this knowledge, we performed several statistical analyses of known transmembrane segments to find other characteristics of transmembrane parts. We determined, among other things, the distribution of transmembrane segment lengths, the propensity for each amino acid to be in a transmembrane region and the precise profiles of potential termini ("edges", starts and ends) of transmembrane regions (Pasquier et al. 1999b). We combined this information with several scoring functions to predict the precise position of transmembrane segments and their topology (Liakopoulos et al. 1999, 2001b). The accuracy of our method compares well with that of other popular existing methods. This work led to the implementation of several tools freely available on the Internet: PRED-TMR, OrienTM, CoPreTHi and DAM-BIO.
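Two of the ideas above can be sketched in a few lines: a per-residue transmembrane propensity estimated from annotated sequences, and an orientation guess based on where positively charged residues accumulate. This is a simplified illustration, not the scoring used by PRED-TMR or OrienTM. The function names, the 0-based end-exclusive segment coordinates and the definition of propensity as a ratio of frequencies are assumptions of the example.

```python
from collections import Counter


def tm_propensities(annotated):
    """Propensity of each amino acid to occur in a transmembrane region.

    annotated: iterable of (sequence, [(start, end), ...]) pairs, where the
    segments are 0-based and end-exclusive (an assumption of this sketch).
    Propensity = frequency inside TM segments / frequency over all residues.
    """
    inside = Counter()
    overall = Counter()
    for sequence, segments in annotated:
        overall.update(sequence)
        for start, end in segments:
            inside.update(sequence[start:end])
    total_inside = sum(inside.values())
    total_overall = sum(overall.values())
    if total_inside == 0:
        raise ValueError("no transmembrane residues in the annotated set")
    return {
        aa: (inside[aa] / total_inside) / (overall[aa] / total_overall)
        for aa in overall
    }


def n_terminus_side(sequence, segments, positive="KR"):
    """Guess whether the N-terminus is inside ("in") or outside ("out").

    Loops alternate sides of the membrane, so even-numbered loops share the
    side of the N-terminus. The side holding more Lys/Arg is taken as the
    inner one; ties are arbitrarily resolved as "in".
    """
    loops = []
    previous_end = 0
    for start, end in sorted(segments):
        loops.append(sequence[previous_end:start])
        previous_end = end
    loops.append(sequence[previous_end:])
    charge = [0, 0]
    for index, loop in enumerate(loops):
        charge[index % 2] += sum(aa in positive for aa in loop)
    return "in" if charge[0] >= charge[1] else "out"


# Toy data (not taken from SwissProt)
print(tm_propensities([("KKLLLLKK", [(2, 6)])])["L"])  # prints 2.0
print(n_terminus_side("KKRR" + "L" * 20 + "DDEE", [(4, 24)]))  # prints in
```

A propensity above 1 means the residue is over-represented in transmembrane regions relative to its overall frequency, and a value below 1 means it is depleted there.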

Funding

Program: Training and Mobility of Researchers (TMR)
Years: 1997-1999
Funder: European Economic Community (EEC)
Grant name: EEC-TMR "GENEQUIZ", Integrated Software System for Molecular Biologists
Grant ID: ERBFMRXCT960019
Project coordinator: Chris Sander

Software

  • COPRETHI: Ensemble learning to predict transmembrane segments in proteins
  • DAM-BIO: Integrated environment designed to support protein sequence and structure analysis on the Web
  • DB-NTMR: Database of non-transmembrane regions automatically extracted from the SwissProt database
  • DB-TMR: Database of transmembrane regions automatically extracted from the SwissProt database
  • FT: Analysis of periodic patterns in amino acid or DNA sequences by Fourier transform
  • ORIENTM: Topology prediction of transmembrane proteins and segments
  • PRED-CLASS: System of cascading neural networks that classifies any protein into one of four possible classes: membrane, globular, fibrous, mixed
  • PRED-TMR: Prediction of transmembrane domains in proteins
  • PRED-TMR2: Identification of transmembrane proteins and prediction of their transmembrane domains
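As an illustration of the ensemble idea behind the COPRETHI entry, the sketch below combines per-residue predictions from several methods by majority vote. The vote rule, the mask encoding ("M" for membrane, "-" otherwise) and the function names are assumptions made for the example. The tool's actual combination scheme is not described here.

```python
def consensus_mask(predictions, min_votes=None):
    """Combine equal-length per-residue masks into a consensus mask.

    predictions: list of strings where "M" marks a predicted membrane
    residue and any other character marks a non-membrane residue.
    min_votes defaults to a strict majority of the methods.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")
    if min_votes is None:
        min_votes = len(predictions) // 2 + 1
    return "".join(
        "M" if sum(p[i] == "M" for p in predictions) >= min_votes else "-"
        for i in range(length)
    )


def mask_to_segments(mask):
    """Convert a mask into 0-based, end-exclusive (start, end) segments."""
    segments = []
    start = None
    for i, flag in enumerate(mask):
        if flag == "M" and start is None:
            start = i
        elif flag != "M" and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(mask)))
    return segments


# Three hypothetical methods disagreeing on the segment boundaries
methods = ["--MMMM--", "---MMMM-", "--MMM---"]
mask = consensus_mask(methods)
print(mask)                    # prints --MMMM--
print(mask_to_segments(mask))  # prints [(2, 6)]
```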

Iliopoulos, I., Tsoka, S., Andrade, M. A., Enright, A. J., Carroll, M., Poullet, P., Promponas, V., Liakopoulos, T., Palaios, G., Pasquier, C., Hamodrakas, S., Tamames, J., Yagnik, A. T., Tramontano, A., Devos, D., Blaschke, C., Valencia, A., Brett, D., Martin, D., Leroy, C., Rigoutsos, I., Sander, C., and Ouzounis, C. A. (2003), “Evaluation of annotation strategies using an entire genome sequence,” Bioinformatics (Oxford, England), Oxford Academic, 19, 717–26. https://doi.org/10.1093/bioinformatics/btg077.

Liakopoulos, T., Harkiolakis, N., Promponas, V., Pasquier, C., Hamodrakas, I., Papandreou, N., Iconomidou, V., Papandreou, N., Tzafestas, E., Tzafestas, S., Eliopoulos, E., and Hamodrakas, S. (2001a), “DAM-BIO: Bioinformatics internet workbench for protein analysis. New modules and applications to biological problems,” in 23rd conference of the hellenic society for biological sciences, Chios island.

Liakopoulos, T., Palaios, G., Promponas, V., Hamodrakas, I., Pasquier, C., and Hamodrakas, S. (2000), “A workbench for computational analysis of protein sequence and structure on the Internet,” in 22nd conference of the hellenic society for biological sciences, Skiathos island.

Liakopoulos, T., Pasquier, C., and Hamodrakas, S. (1999), “OrienTM: A novel method to predict transmembrane protein topology,” in 21st conference of the hellenic society for biological sciences, Galissas, Syros island.

Liakopoulos, T., Pasquier, C., and Hamodrakas, S. (2001b), “A novel tool for the prediction of transmembrane protein topology based on a statistical analysis of the SwissProt database: the OrienTM algorithm,” Protein Engineering Design and Selection, Oxford Academic, 14, 387–390. https://doi.org/10.1093/protein/14.6.387.

Pasquier, C., and Hamodrakas, S. (1999), “An hierarchical artificial neural network system for the classification of transmembrane proteins,” Protein Engineering Design and Selection, Oxford Academic, 12, 631–634. https://doi.org/10.1093/protein/12.8.631.

Pasquier, C., Promponas, V., and Hamodrakas, S. (2001), “PRED-CLASS: cascading neural networks for generalized protein classification and genome-wide applications,” Proteins: Structure, Function, and Bioinformatics, Wiley, 44, 361–9. https://doi.org/10.1002/prot.1101.

Pasquier, C., Promponas, V., Palaios, G., Hamodrakas, I., and Hamodrakas, S. (1999a), “PRED-TMR2: An hierarchical neural network to classify proteins as transmembrane and a novel method to predict transmembrane segments,” in 21st conference of the hellenic society for biological sciences, Galissas, Syros island.

Pasquier, C., Promponas, V., Palaios, G., Hamodrakas, I., and Hamodrakas, S. (1999b), “A novel method for predicting transmembrane segments in proteins based on a statistical analysis of the SwissProt database: the PRED-TMR algorithm,” Protein Engineering Design and Selection, Oxford Academic, 12, 381–385. https://doi.org/10.1093/protein/12.5.381.

Pasquier, C., Promponas, V., Varvayannis, N., and Hamodrakas, S. (1998a), “A web interface for FT: a tool dedicated to the study of periodicities in sequences,” in 20th conference of the hellenic society for biological sciences, Samos Island.

Pasquier, C., Promponas, V., Varvayannis, N., and Hamodrakas, S. (1998b), “A web server to locate periodicities in a sequence,” Bioinformatics, Oxford Academic, 14, 749–50. https://doi.org/10.1093/oxfordjournals.bioinformatics.a011054.

Promponas, V., Palaios, G., Pasquier, C., Hamodrakas, I., and Hamodrakas, S. (1998), “CoPreTHi: a program to combine the results of transmembrane protein segment prediction methods,” in 20th conference of the hellenic society for biological sciences, Samos Island.

Promponas, V., Palaios, G., Pasquier, C., Hamodrakas, I., and Hamodrakas, S. (1999), “CoPreTHi: a Web tool which combines transmembrane protein segment prediction methods,” In silico biology, IOS Press, 1, 159–62.

Promponas, V., Pasquier, C., and Hamodrakas, S. (2001), “PRED-CLASS: Bioinformatics software for generalized protein classification and genome-wide applications,” in 23rd conference of the hellenic society for biological sciences, Chios island.

Claude Pasquier
Researcher in Computer Science / Computational Biology

Université Côte d’Azur, CNRS, I3S Laboratory, Sophia Antipolis
