Computational classifiers label missense substitutions as pathogenic or neutral based in part on inferences from evolutionary conservation, drawn from protein multiple sequence alignments (PMSAs) of the gene of interest, alongside other features. In this review, Tavtigian et al. offer suggestions on the aspects of PMSA construction that matter most for making alignments informative for classification, among other topics.
When evolutionary conservation is used to predict the pathogenicity of missense substitutions, the general assumption is that substitutions falling at evolutionarily constrained positions in the protein are often pathogenic, whereas those falling at unconstrained positions are often neutral or have minimal impact. To apply this logic appropriately, however, Tavtigian et al. suggest that one must be able to answer three questions. (1) Is a particular PMSA reasonably informative, i.e., has it sampled enough sequences at sufficient alignment depth to contribute to missense substitution analysis with reasonable sensitivity and specificity? (2) How does one use a PMSA to distinguish between positions that are functionally constrained and those that are not? (3) Do different substitutions at the same position have different effects, and can we distinguish them based on the variation observed in a PMSA? Anyone interpreting the output of in silico algorithms must therefore assess whether the underlying PMSA is appropriate.
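To make the three questions concrete, the following minimal sketch summarizes each column of a small alignment. It is an illustration only, not the method of Tavtigian et al. or of any published classifier: Shannon entropy is used here as a generic stand-in for a conservation measure, and the function names (`column_entropy`, `summarize_alignment`) and the four-sequence toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the residues in one column, ignoring gaps ('-')."""
    residues = [aa for aa in column if aa != "-"]
    if not residues:
        return float("nan")  # a column of gaps only carries no information
    total = len(residues)
    counts = Counter(residues)
    # Sum of p * log2(1/p); an invariant column gives exactly 0.0.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def summarize_alignment(sequences):
    """Return per-position depth, observed residues, and entropy for a PMSA.

    `sequences` is a list of equal-length aligned strings (an assumption of
    this sketch; real alignments would normally be parsed from a file).
    """
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    summary = []
    for position, column in enumerate(zip(*sequences), start=1):
        residues = [aa for aa in column if aa != "-"]
        summary.append({
            "position": position,
            "depth": len(residues),            # question 1: how many sequences inform this site
            "observed": sorted(set(residues)),  # question 3: which residues are tolerated here
            "entropy": column_entropy(column),  # question 2: low entropy suggests constraint
        })
    return summary


# Hypothetical toy alignment: four aligned sequences, four positions.
toy_alignment = ["MKLV", "MKIV", "MRLV", "MKAV"]
summary = summarize_alignment(toy_alignment)
for row in summary:
    print(row)
```

In this toy case, positions 1 and 4 are invariant, whereas position 3 tolerates three different residues. With only four sequences, however, invariance says little about constraint, which is precisely the concern behind the first question: apparent conservation in a shallow alignment may simply reflect insufficient sampling.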
Tavtigian S, Greenblatt M, Lesueur F, Byrnes G, & IARC Unclassified Genetic Variants Working Group. (2008). In silico analysis of missense substitutions using sequence-alignment based methods. Hum Mutat, 29, 1327–1336.