Unraveling the Molecular Impact of Missense Variants: Insights into Protein Structure and Disease Associations

Student thesis: Master's Thesis


One of the primary challenges in clinical genetics is interpreting the numerous genetic variants identified through sequencing. Assessing the impact of missense variants, in which a single amino acid is substituted, is particularly difficult. In this study, we examined the structural characteristics of the amino acids affected by 26,690 pathogenic missense variants and compared them with those affected by 11,302 common variants found in the general population. The analysis covered 6,747 protein structures. Residues were annotated using 7 protein features comprising a total of 35 feature subtypes, and we assessed the burden of common and pathogenic missense variants across these features. We also carried out separate analyses by protein function (variants grouped into 24 protein functional classes) and by disease (variants grouped into 86 diseases). Across the entire dataset, we identified 25 pathogenic features that play a crucial role in the overall fitness and stability of proteins. The individual analyses of the 24 protein functional classes revealed specific features relevant to each function. The disease analysis identified 3 main clusters. Type I diseases primarily result from mutations in ordered regions and are mainly affected by charge loss; this cluster is dominated by the transporter protein class and includes diseases linked to the X chromosome. Type II diseases involve hydrolases and are characterized by an enrichment of variants in the protein core, resulting in protein destabilization. Type III diseases involve extracellular matrix proteins (mainly collagen); their variants are predominantly found in disordered regions and are affected by charge gain and the introduction of polar residues. Gly variants are particularly relevant in this cluster, as collagen proteins require Gly at every third residue of the collagen triple helix.
Considering structural aspects when interpreting disease-associated mutations offers valuable insights into their underlying mechanisms. Our work can serve as a resource for delineating and understanding variant pathogenicity by mapping a genetic variant onto its structural context.
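The abstract does not state which statistic was used to assess variant burden. As a minimal sketch of one common way to compare pathogenic and common variants for a single structural feature, the snippet below computes an odds ratio with an approximate 95% confidence interval from a 2x2 table. The function name `feature_odds_ratio` and the per-feature counts in the usage example are hypothetical; only the two dataset totals come from the abstract.

```python
from math import exp, log, sqrt


def feature_odds_ratio(path_in, path_total, common_in, common_total):
    """Odds ratio (with approximate 95% CI) of pathogenic vs. common
    variants falling inside a given structural feature.

    Illustrative only: the thesis abstract does not specify its burden test.
    """
    # 2x2 contingency table: rows = pathogenic/common, cols = in/out of feature
    a = path_in
    b = path_total - path_in
    c = common_in
    d = common_total - common_in

    # Haldane-Anscombe correction: add 0.5 to each cell to avoid division by zero
    a, b, c, d = (x + 0.5 for x in (a, b, c, d))

    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) (Woolf's method)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - 1.96 * se)
    upper = exp(log(odds_ratio) + 1.96 * se)
    return odds_ratio, lower, upper


# Dataset totals taken from the abstract
PATHOGENIC_TOTAL = 26_690
COMMON_TOTAL = 11_302

# Hypothetical counts for one feature (e.g. "protein core"); NOT from the thesis
core_pathogenic = 9_000
core_common = 1_500

or_, lo, hi = feature_odds_ratio(
    core_pathogenic, PATHOGENIC_TOTAL, core_common, COMMON_TOTAL
)
print(f"OR = {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

An odds ratio above 1 with a confidence interval that excludes 1 would suggest that pathogenic variants are enriched in the feature relative to common variants. A real analysis over 35 feature subtypes would also need a correction for multiple testing.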
Date of Award: Jul 2023
Original language: English (US)
Awarding Institution
  • Biological, Environmental Sciences and Engineering
Supervisor: Stefan Arold


  • missense variants
  • protein structure
  • protein class
