Conceived and designed the experiments: MR. Performed the experiments: GL NJ MR WD. Analyzed the data: CB DN JM GL NF DH MR. Contributed reagents/materials/analysis tools: DN DH WD VJ. Wrote the paper: BW CB JM NF MR.
The authors have declared that no competing interests exist.
While human immunodeficiency virus type 1 (HIV-1)-specific cytotoxic T lymphocytes preferentially target specific regions of the viral proteome, HIV-1 features that contribute to immune recognition are not well understood. One hypothesis is that similarities between HIV and human proteins influence the host immune response, i.e., resemblance between viral and host peptides could preclude reactivity against certain HIV epitopes.
We analyzed the extent of similarity between HIV-1 and the human proteome. Proteins from the 2001 HIV-1 B consensus sequence were dissected into overlapping k-mers, which were then probed against a non-redundant database of the human proteome in order to identify segments of high similarity. We tested the relationship between HIV-1 similarity to host-encoded peptides and immune recognition in HIV-infected individuals, and found that HIV immunogenicity could be partially modulated by sequence similarity to the host proteome. ELISpot responses to peptides spanning the entire viral proteome, evaluated in 314 individuals, showed a trend toward an inverse relationship between similarity to the host proteome and frequency of recognition. In addition, analysis of responses by a group of 30 HIV-infected individuals against 944 overlapping peptides representing a broad range of individual HIV-1B Nef variants confirmed that the degree of similarity to the host was significantly lower for peptides with reactive epitopes than for those that were not recognized.
Our results suggest that antigenic motifs that are scarcely represented in human proteins might represent more immunogenic CTL targets not selected against in the host. This observation could provide guidance in the design of more effective HIV immunogens, as sequences devoid of host-like features might afford superior immune reactivity.
HIV-1-specific CD8+ T cell responses play a pivotal role in controlling HIV-1 replication, although they are rarely, if ever, able to fully contain it.
Considering the high number of potential epitopes, anti-HIV CTL responses tend to target a relatively limited set of epitopes in a distinct hierarchical pattern.
As it is vital for the immune system to tolerate autologous structures for its proper regulation, any model of immunodominance should consider the potential impact of self/non-self discrimination. A breakdown in self-tolerance can lead to the onset of autoimmune diseases. It has been proposed that self-tolerance might be breached in autoimmunity by the activation of T cells directed against cryptic self-determinants.
We hypothesize that dominance profiles in the HIV-specific CTL responses reflect an inverse relationship between the similarity of an HIV epitope to the host proteome and its immunogenicity. The underlying rationale is that the immune system preferentially responds to antigenic sequences that are never or only sporadically encountered in the repertoire of self-antigens. To better understand the reported dominance patterns among responses to HIV-1 antigens, we first analyzed the extent of similarity between HIV-1 and the human proteome. Then, we present the first comprehensive investigation of the relationship between similarity to host and the profile of immune responses elicited against the whole HIV proteome in 314 HIV-infected individuals and against 944 overlapping peptides representing a broad range of individual HIV-1B Nef variants in 30 individuals. In doing so, we validated our hypothesis that HIV immunogenicity could be partially regulated by similarities to the host proteome.
We first identified HIV-1 segments with sequence identity (4-, 5- and 6-mers) or high similarity (9-mers) to human proteins by comparing all possible 4-, 5-, 6- or 9-mers derived from HIV-1 B consensus 2001 to the human proteome. Sequential overlapping k-mers were scanned using BLASTP.
Besides previously described similarities with MHC molecules
We next focused our search on nonamers, the typical length of epitopes presented by HLA class I molecules, in order to identify the longest “human-like” motifs. We identified 16 HIV oligomers considered similar to human proteins since they differed by 3 standard deviations from the similarity expected with randomized nonamers (see the table below).
Significant 9-mers consensus 2001 | Human Protein sequence | Human protein | HIV database seq. corresponding to human prot. seq. | HIV-1 B consensus 2001 position | Identities (Positives) |
LHPVHAGPI | L | PTPRE | Gag 215–223 | 9/10 (10/10) | |
LHP | T cell leukemia Homeobox 1 | LHP | 8/10 | ||
EPFRDYVDRF | COPINE 5 | AAV71076 | Gag 291–300 | 8/9 | |
PFRDYVDR | AAV53244 | 8/9 | |||
PDCKTILKA | HPDCKTI | DELTEX 4 homolog | AAV49378 | Gag 328–336 | 7/7 |
NTPVFAIKKK | SP | HERV | P | Pol 209–221 | 11/13 |
MTKILEPFR | MTKILEP | Piwi-like 2 | AAM74596 | Pol 319–327 | 7/8 |
SON DNA Binding P. | AAK35877 | 6/9 (7/9) | |||
FKNLKTGKY | FKN | Str Spe Recognition P. | FKN | Pol 501–508 | 6/7 |
VNIVTDSQYA | VNI | HERV | VNI | Pol 648–657 | 9/10 |
VN | Thrombospondin 3 precursor | VNI | 7/10 | ||
YIEAEVIPA | GYIEA | unnamed P. | Pol 798–806 | 10/11 | |
EIEAEV | p53 inducible P. | AAV49469 | 6/6 | ||
NWRSELYKY | peroxisomal acyl-CoA thioesterase | CAA64162 | Env 469–477 | 7/8 | |
AAG…TVW | 35/37 | Cancer associated RAK | Env 516–572 | 35/37 | |
DQGPQREPY | qRP–DQGPQR | carcinoma ass. P. | DQGPQR | Vpr 7–15 | 9/13 (10/13) |
DQGPQR | proline rich | 7/8 | |||
PKTACTNCY | SQPK | Zn finger P. | SQPK | Tat 18–26 | 9/11 (10/11) |
LAIVALVVA | ALVALVVA | Trypsin-domain P. | ALVALVVA | Vpu 7–15 | 9/10 |
LAIVAL | small inducible cytokine 28 | 7/8 | |||
EELLKTVRL | EELLK | nucleolar RNA associated P. | EELLK | Rev 10–18 | 8/9 |
LLKTVRL | LLKTVRL | 10/11 | |||
TVRERMRRA | TTVR | Rhomboid P. | Nef 15–23 | 9/10 | |
IYSQKRQDI | TYSQK | Ig heavy chain variable region | TYSQK | Nef 101–109 | 8/9 |
The level of similarity to host proteins for these HIV nonamers differed by 3 standard deviations from the level of similarity found with randomized nonamers. Amino acid changes are in bold italics and lower case.
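The 3-standard-deviation criterion used to call a nonamer “human-like” can be sketched as a simple z-score against scores obtained for randomized nonamers. This is only an illustration: the function names, the use of the sample standard deviation, and the one-sided reading of the threshold (at least 3 standard deviations above the randomized mean) are assumptions, since the text does not specify how the randomized nonamers were generated or scored.

```python
import statistics


def similarity_z_score(observed, randomized_scores):
    """Z-score of an observed similarity score relative to randomized nonamers.

    `observed` is the similarity score of the real HIV nonamer;
    `randomized_scores` are scores obtained the same way for randomized nonamers.
    Assumption: the sample standard deviation is used.
    """
    mean = statistics.mean(randomized_scores)
    sd = statistics.stdev(randomized_scores)
    if sd == 0:
        raise ValueError("randomized scores have zero variance")
    return (observed - mean) / sd


def is_host_like(observed, randomized_scores, threshold=3.0):
    """Flag a nonamer whose score lies at least `threshold` SDs above the randomized mean."""
    return similarity_z_score(observed, randomized_scores) >= threshold
```

For instance, with hypothetical randomized scores of 1 to 5 and an observed score of 10, the nonamer would be flagged, whereas an observed score of 4 would not.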
Finally, a spectrum of 182 HIV-1 B Env sequences dating from 1983 to 2005 was scanned for similarities to human proteins in order to identify whether HIV was adapting to become more or less host-like over time. Despite marked differences among individual HIV-1 sequences, there was no evidence of change in the degree of similarity to host proteins since the beginning of the widespread epidemic (data not shown).
To investigate whether the similarity of HIV to its host participates in shaping the antiviral CD8+ T cell responses, we compared the similarity to human proteins of 410 HIV-1 peptides spanning the whole viral proteome to their frequency of recognition measured by IFN-γ ELISpot assays in 314 HIV-infected individuals.
410 peptides spanning the HIV-1 B consensus 2001 proteome were tested by ELISpot using PBMC from a cohort of 314 HIV-1 infected individuals. The vertical axis corresponds to the percentage of individuals who recognized a given peptide. The horizontal axis corresponds to the number of matches to the human proteome for each peptide. Matches were derived by dissecting the 410 peptides into overlapping 6-mers offset by one residue, and then scored against the RefSeq protein sequence database. Matches were normalized to account for the length of the starting peptide (ranging from 15–20 AA in length).
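The match score described in this legend (overlapping 6-mers offset by one residue, counted against a protein database, then normalized for peptide length) can be sketched as follows. This is a minimal illustration under stated assumptions: the host database is represented by a small in-memory list of sequences rather than RefSeq, exact string matching stands in for the actual search, and normalization is taken to mean division by peptide length, which the text does not state explicitly. All function names are hypothetical.

```python
def overlapping_kmers(sequence, k):
    """Return all k-mers of `sequence`, each offset by one residue from the previous."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def index_host_kmers(host_sequences, k=6):
    """Count how often each k-mer occurs across the host sequences."""
    counts = {}
    for seq in host_sequences:
        for kmer in overlapping_kmers(seq, k):
            counts[kmer] = counts.get(kmer, 0) + 1
    return counts


def normalized_match_score(peptide, host_kmer_counts, k=6):
    """Total host occurrences of the peptide's k-mers, divided by peptide length.

    Assumption: length normalization is a plain division by the number of residues.
    """
    total = sum(host_kmer_counts.get(kmer, 0)
                for kmer in overlapping_kmers(peptide, k))
    return total / len(peptide)


# Toy usage with made-up host sequences (not real database entries).
host_counts = index_host_kmers(["MTKILEPFRAAA", "XXMTKILEPX"], k=6)
score = normalized_match_score("MTKILEPFR", host_counts, k=6)
```

Precomputing the host k-mer index keeps the per-peptide lookup cheap, which matters when hundreds of peptides are scored against a full proteome.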
To resolve the uncertainty in epitope specificity that arises from using 18-mer peptides, 944 10-mer peptides encompassing Nef were screened in 30 HIV-infected individuals using IFN-γ ELISpot assays. The 944 Nef 10-mers, similar to or slightly longer than the typical HLA class I epitope, overlapped by 9 residues and represented a broad array of HIV-1 variants found in the population (Frahm et al., in preparation). Of these, 346 Nef peptides elicited a response in at least one patient while 598 did not. Peptides that were recognized had a significantly lower degree of similarity to human proteins than the 10-mers that were not reactive. Considering 9-mer matches (allowing 2 mismatches), the mean number of matches was 1.40 for the peptides eliciting a response compared to 3.16 for those that did not elicit any response (p = 0.0002). Similarly, for 5- and 6-mer exact matches, the mean numbers of matches were 55.75 and 3.09, respectively, for the peptides eliciting a response, compared to 107.51 and 6.02, respectively, for those that failed to elicit any response (p = 0.0002 and <0.0001, respectively). Because the peptides overlap, we performed a cross-validation analysis to verify that neither overweighting the degree of similarity to humans nor re-counting responses skewed our results. We partitioned our data into 10 non-overlapping sets of peptides (each including from 88 to 102 peptides) and compared the similarity to human proteins with the ELISpot reactivity for each set. Although the p-values were affected (11 out of 30 were >0.10), ELISpot-reactive peptides showed less similarity to human proteins than non-reactive ones.
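The notion of a 9-mer match allowing 2 mismatches can be made concrete with a brute-force sketch based on Hamming distance. This is not the search procedure used in the study; it is an assumed stand-in that scans every 9-residue window of a small list of host sequences, and the function names and toy sequences are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def count_near_matches(peptide, host_sequences, k=9, max_mismatches=2):
    """Count host k-mer windows within `max_mismatches` of any k-mer of `peptide`."""
    hits = 0
    for i in range(len(peptide) - k + 1):
        query = peptide[i:i + k]
        for seq in host_sequences:
            for j in range(len(seq) - k + 1):
                if hamming(query, seq[j:j + k]) <= max_mismatches:
                    hits += 1
    return hits


# Toy usage: one host window differs from the query at a single position.
toy_hits = count_near_matches("LAIVALVVA", ["GGLAIVGLVVAGG"])
```

Averaging such counts separately over the reactive and non-reactive peptides would give the kind of group means compared in the text.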
Last, we analyzed the magnitude of ELISpot responses elicited by the Nef peptides, stratified according to their similarity to their closest human peptides. The mean magnitudes (number of spot-forming cells) of the ELISpot responses were higher when the Nef peptides were more distant from their closest human peptides. When the Nef peptides had 1, 2, 3 or 4 mismatches with their closest human peptides, the mean magnitudes were 0, 220.00, 237.99 and 654.18, respectively (Spearman correlation coefficient rho = 0.1911, p = 3.2622e-09).
The vertical axis corresponds to the mean number of spot-forming cells counted in ELISpot assays. The horizontal axis corresponds to Nef peptides that have 2, 3 or 4 differences with their closest human peptides.
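The rank correlation between the number of mismatches and the response magnitude can be illustrated with a small, dependency-free Spearman implementation. This is a sketch only: ties receive average ranks, the p-value is not computed, and the example values are invented for illustration rather than taken from the study.

```python
import math


def average_ranks(values):
    """Assign 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: mismatches to the closest human peptide vs. spot-forming cells.
mismatches = [2, 2, 3, 3, 4, 4]
magnitudes = [100, 300, 200, 250, 600, 700]
rho = spearman_rho(mismatches, magnitudes)
```

A rank-based measure is a natural choice here because the number of mismatches takes only a few discrete values and ELISpot magnitudes are typically skewed.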
The hypothesis that immunodominance is related to low similarity to self is appealing, but other factors undoubtedly contribute. Frahm et al. (2004) previously analyzed different parameters affecting a peptide's ability to elicit an immune response. ELISpot reactivity was favored by low peptide variability, low representation of forbidden AA (i.e., AA not generally found at the C-terminus of CD8+ T cell epitopes) and high proteasome cleavage likelihood scores. In addition, HIV regions enriched in CTL epitopes were shown to be more hydrophobic.
Lastly, we asked whether disorder/order of the peptides played a role in HIV immunogenicity. Disordered regions have low sequence complexity, with many repetitive elements, and a biased amino acid content: they are depleted in bulky hydrophobic amino acids (which typically form the cores of folded globular proteins) and enriched in alanine, arginine, glycine, glutamine, serine, proline, glutamate and lysine, which results in highly charged surfaces.
The vertical axis corresponds to the percentage of HIV-1 infected individuals that recognized each of the 410 peptides spanning the HIV-1 B consensus 2001 proteome. The horizontal axis corresponds to the disorder prediction score for each peptide, calculated using predictions of order/disorder made with the VSL1 predictor (PONDR®).
Our data highlight a likely influence of the similarity to human sequences in shaping the host's immune response to HIV-1. We found that HIV-1 infected individuals seldom recognized the most “human-like” peptides, while the peptides that were frequently recognized showed a low similarity to the host. The relationship between HIV-1B similarity to host proteins and its immunogenicity was shown in two ways, by analyzing CTL responses against i) 410 consensus peptides representing the whole HIV-1 B proteome in a cohort of 314 individuals and ii) 944 variant Nef peptides in 30 individuals. Furthermore, the more a Nef peptide differed from its closest human peptide, the stronger the immune response it elicited. These results, stemming from consensus and circulating HIV variants, support HIV's degree of similarity to host as a mechanism contributing to the intrinsic hierarchical profile of HIV-specific CD8+ T cell responses seen across different cohorts and host ethnicities.
Although the extent of similarity between HIV and human proteins is relatively limited, a variety of segments are shared between HIV and host proteins, particularly proteins involved in host immunity, underlining a potential role of HIV-1's similarity to host in the interference with effectors of the immune system. For example, it could be expected that HIV/HERV mimics would have reduced immunogenicity, as negative selection in the host should have largely eliminated reactive T cell populations. Indeed, HERV expression was detected in the thymus, where immune tolerance toward self is maintained via central tolerance; thymocytes with a high affinity for self-antigens are deleted, thus the remaining thymocytes (the future CD4+/CD8+ T cells) are likely to have a low affinity for HERV-like antigens. It should be noted that high similarity to HERV was found outside of the reverse transcriptase active site (where similarities between enzymes are expected). In addition, frequent similarities between HIV and autoantigens suggest the possibility of cross-reactive responses through molecular mimicry. This may help explain why HIV patients appear to be more prone to develop autoantibodies, in particular against cardiolipin, ribonucleoproteins, smooth muscle or platelets, or to develop cryoglobulinemia.
By showing a relationship between peptide similarity and immunodominance, we validated low similarity to the host proteome as a contributing factor in the modulation of the pool of epitopic sequences, with a potential role in discriminating immunodominant from cryptic peptides. Nonetheless, the molecular basis of immunogenicity is the outcome of numerous interacting factors, among which we studied structural disorder. Intrinsically disordered regions are protein segments that lack a fixed tertiary structure (i.e., they are partially or fully unfolded); they are involved in many biological functions including cell signaling, regulation, molecular recognition and other interactions with proteins and nucleic acids.
Since it was recognized that some conformational structures are favored for HLA binding
These findings illustrate how numerous factors can intersect to establish an immunodominance hierarchy and show that high similarity to the host proteome hampers peptide immune reactivity. Thus, removing host mimics from vaccine constructs could be a crucial step toward designing not only safer but also more efficacious HIV vaccines. Due to the lack of viral peptides that are simultaneously similar to the host and strongly immunogenic, we suggest design of HIV vaccine candidates using non-self discrimination as a molding force in generating peptide immunogenicity. Crafting more potent vaccine candidates hinges upon the accurate understanding of the molecular mechanisms involved in peptide immunogenicity, and particularly upon a deeper insight into immunodominance.
Analyses of the degree of similarity between HIV-1 and the human proteome were conducted using the HIV-1 subtype B consensus sequence of 2001 available at the HIV immunology database.
IFN-γ ELISpot assays were performed using overlapping peptides, as previously described.
Different parameters were analyzed for the 410 overlapping 18-mer peptides. The average Shannon entropy scores for all peptides are available at the HIV immunology database.
Statistical analyses were done using JMP® version 5.1.2. Correlations were considered statistically significant when p was <0.05.
Randomization analysis of the frequency of recognition of HIV-1 B consensus 2001 peptides as a function of their similarity to human proteins. The values on the x axis (normalized similarity to the human proteome) were randomized. The null distribution corresponds to the number of points for which both the x and y values were greater than their respective means plus 1 standard deviation, computed over 1000 randomizations. This number was compared to the observed number of such occurrences.
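The randomization procedure described in this legend can be sketched as a simple permutation test. The sketch makes several assumptions not stated in the text: the sample standard deviation is used for the cut-offs, the x values are shuffled with a seeded pseudo-random generator for reproducibility, and the summary statistic is the fraction of randomizations yielding as few or fewer joint-high points than observed (a lower-tail comparison, in line with the expectation that highly recognized peptides are rarely highly host-like). Function names are hypothetical.

```python
import random
import statistics


def count_joint_high(x, y):
    """Count points whose x and y values both exceed their mean + 1 SD."""
    x_cut = statistics.mean(x) + statistics.stdev(x)
    y_cut = statistics.mean(y) + statistics.stdev(y)
    return sum(1 for a, b in zip(x, y) if a > x_cut and b > y_cut)


def randomization_test(x, y, n_iter=1000, seed=0):
    """Shuffle x relative to y and build a null distribution of joint-high counts.

    Returns the observed count, the list of null counts, and the fraction of
    randomizations with a count less than or equal to the observed one.
    """
    rng = random.Random(seed)
    observed = count_joint_high(x, y)
    shuffled = list(x)
    null_counts = []
    for _ in range(n_iter):
        rng.shuffle(shuffled)
        null_counts.append(count_joint_high(shuffled, y))
    p_low = sum(c <= observed for c in null_counts) / n_iter
    return observed, null_counts, p_low
```

Here `x` would hold the normalized similarity of each peptide to the human proteome and `y` its frequency of recognition; shuffling only `x` preserves both marginal distributions while breaking any association between them.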