EnsembleMHC prediction workflow. (A). The EnsembleMHC score algorithm was parameterized using MHC-I peptides observed in mass spectrometry datasets and 100 randomly generated, length- and protein-matched decoy peptides. The observed false detection rate (FDR) distribution at 50% recall is shown for each algorithm with respect to each HLA allele. (B). The distribution of observed FDRs for each algorithm across all alleles. (C). The correlation between individual peptide scores for each algorithm across all alleles. (D). The EnsembleMHC workflow for the prediction of SARS-CoV-2 peptides (SI A1).

Prediction of SARS-CoV-2 peptides across 52 common MHC-I alleles. The EnsembleMHC workflow was used to predict 8-14mer MHC-I peptides for 52 alleles from the entire SARS-CoV-2 proteome (A) or specifically from SARS-CoV-2 structural proteins (B). (C). Both distributions were individually standardized, and the relative change in the binding capacity of each allele was calculated by taking the absolute difference between the Z-scores of allele binding capacity for all SARS-CoV-2 proteins and for SARS-CoV-2 structural proteins. Alleles showing a change in binding capacity of more than 1 standard deviation, whether an increase or a decrease, are highlighted in color.
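The standardization step in panel (C) can be illustrated with a minimal sketch. It assumes that binding capacity is the number of predicted peptides per allele, that the sample standard deviation is used, and that the 1 standard deviation cutoff applies to the difference in Z-scores; none of these details is stated in the caption. The peptide counts below are invented for illustration, and the function names are hypothetical.

```python
from statistics import mean, stdev

def z_scores(counts):
    """Standardize a dict of allele -> peptide count to Z-scores."""
    values = list(counts.values())
    mu, sigma = mean(values), stdev(values)  # sample SD (assumption)
    return {allele: (n - mu) / sigma for allele, n in counts.items()}

def binding_capacity_shift(all_counts, structural_counts, threshold=1.0):
    """Return the signed Z-score change per allele and the flagged alleles.

    The sign is kept so that increases and decreases can be told apart;
    the cutoff is applied to the absolute difference.
    """
    z_all = z_scores(all_counts)
    z_struct = z_scores(structural_counts)
    shift = {a: z_struct[a] - z_all[a] for a in all_counts}
    flagged = {a: d for a, d in shift.items() if abs(d) > threshold}
    return shift, flagged

# Invented peptide counts for four alleles (illustrative only)
all_proteins = {"HLA-A*02:01": 30, "HLA-B*07:02": 10,
                "HLA-C*03:04": 20, "HLA-A*24:02": 20}
structural = {"HLA-A*02:01": 2, "HLA-B*07:02": 8,
              "HLA-C*03:04": 5, "HLA-A*24:02": 5}

shift, flagged = binding_capacity_shift(all_proteins, structural)
for allele, delta in sorted(flagged.items()):
    print(f"{allele}: Z-score change {delta:+.2f}")
```

In this toy input, one allele ranks high for the full proteome but low for structural proteins, and another shows the opposite pattern, so both exceed the cutoff while the two average alleles do not.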

Predicted total epitope load within a population inversely correlates with mortality.

(A). The correlation between the EnsembleMHC population score, with respect to all SARS-CoV-2 proteins (left panel) or structural proteins only, and deaths per million was calculated at each day, starting from the day a country passed a particular death milestone ranging from 1 reported death to 100 reported deaths (line color). The days from each start point were normalized, and statistically significant correlations are marked with a red point. (B-C). The correlations between the EnsembleMHC score based on structural proteins and death rate are shown for countries meeting the 50 confirmed death threshold. (B). The correlation between deaths per million rank (min rank = fewest deaths, max rank = most deaths) and EnsembleMHC population score rank (min rank = lowest score, max rank = highest score) at days 1, 5, 10, and 15. Correlation coefficients and p values were assigned using Spearman's rank correlation. (C). The countries at each time point were partitioned into an upper or lower half based on the observed EnsembleMHC population score. P values were determined by the Mann-Whitney U test.
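The two tests named in panels (B) and (C) can be sketched in plain Python as follows. This is a minimal illustration, not the authors' analysis code: the scores and death rates are invented, the helper names are hypothetical, the split at the median score is an assumption about how the halves were formed, and only the test statistics are computed. For p values, `scipy.stats.spearmanr` and `scipy.stats.mannwhitneyu` would be the usual choice.

```python
from statistics import mean, median

def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

def median_split(scores, deaths):
    """Split death rates by whether the country's score is above the median."""
    cut = median(scores)
    upper = [d for s, d in zip(scores, deaths) if s > cut]
    lower = [d for s, d in zip(scores, deaths) if s <= cut]
    return upper, lower

def mann_whitney_u(group_a, group_b):
    """U statistic for group_a: pairs with a > b count 1, ties count 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in group_a for b in group_b)

# Invented values for six countries at one time point (illustrative only)
scores = [0.9, 0.7, 0.5, 0.4, 0.2, 0.1]     # population score
deaths = [3.0, 8.0, 5.0, 20.0, 35.0, 50.0]  # deaths per million

rho = spearman_rho(scores, deaths)
upper, lower = median_split(scores, deaths)
u = mann_whitney_u(upper, lower)
print(f"rho = {rho:.3f}, U = {u}")
```

A negative rho here corresponds to the inverse relationship described in the caption: countries with higher scores have lower death rates. A U statistic of 0 for the upper half means every country in that half has a lower death rate than every country in the lower half.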

Protein origin of predicted SARS-CoV-2 peptides.

The localization of predicted MHC-I peptides derived from SARS-CoV-2 structural proteins was determined by mapping the peptides back to the reference sequence. (A). The frequency with which each amino acid of the four SARS-CoV-2 structural proteins appears in one of the 160 predicted peptides. (B). The number of polymorphisms appearing at each position in the structural protein sequences, determined from the alignment of 104 reported SARS-CoV-2 sequences. (C-F). The predicted peptides were mapped onto the solved structures for the envelope (C) and spike (F) proteins, and the predicted structures for the nucleocapsid (D) and membrane (E) proteins. Red regions indicate an enrichment of predicted peptides and blue regions indicate a depletion of predicted peptides.

Abstract

Polymorphism in MHC-I protein sequences across human populations significantly impacts their binding to viral peptides and alters T cell immunity to infection. Prioritization of MHC-I restricted viral epitopes remains a fundamental challenge for understanding adaptive immunity to SARS-CoV-2. Here, we present a consensus MHC-I binding prediction model, EnsembleMHC, based on the biochemical and structural basis of peptide presentation, to aid the discovery of SARS-CoV-2 MHC-I peptides. We performed immunopeptidome predictions of SARS-CoV-2 proteins across 52 common MHC-I alleles, identifying 658 high confidence peptides. Analysis of the resulting peptide-allele assignment distribution demonstrated significant variation across the allele panel, of up to an order of magnitude. Using MHC-I population-based allele frequencies, we estimated the average SARS-CoV-2 peptide population binding capacity across 21 individual countries. We discovered a strong inverse association between the predicted population SARS-CoV-2 peptide binding capacity and overall mortality. Furthermore, we found that considering only structural proteins produced a stronger association with the observed death rate, highlighting their importance in protein-targeted immune responses. The 108 predicted SARS-CoV-2 structural protein peptides were shown to be derived from enriched regions in the originating protein and to present minimal risk of disruption by mutation. These results suggest that the immunologic fitness of both individuals and populations to generate class I-restricted T cell immunity to SARS-CoV-2 infection may impact clinical outcome from viral infection.
