Assessment of software methods for estimating protein-protein relative binding affinities

PLoS ONE

Home Assessment of software methods for estimating protein-protein relative binding affinities

Tawny R. Gonzalez, Kyle P. Martin, Jonathan E. Barnes, Jagdish Suresh Patel, F. Marty Ytreberg

Competing Interests: The authors have declared that no competing interests exist.

https://doi.org/10.1371/journal.pone.0240573, Volume: 15, Issue: 12, Pages: 1-20

Article Type: Research Article Article History

Publisher: Public Library of Science

Altmetric

Table of Contents

Introduction
Methods
Results
Discussion
Conclusions
Supporting information

Abstract

A growing number of computational tools have been developed to accurately and rapidly predict the impact of amino acid mutations on protein-protein relative binding affinities. Such tools have many applications, for example, designing new drugs and studying evolutionary mechanisms. In the search for accuracy, many of these methods employ expensive yet rigorous molecular dynamics simulations. By contrast, non-rigorous methods use less exhaustive statistical mechanics, allowing for more efficient calculations. However, it is unclear if such methods retain enough accuracy to replace rigorous methods in binding affinity calculations. This trade-off between accuracy and computational expense makes it difficult to determine the best method for a particular system or study. Here, eight non-rigorous computational methods were assessed using eight antibody-antigen and eight non-antibody-antigen complexes for their ability to accurately predict relative binding affinities (ΔΔG) for 654 single mutations. In addition to assessing accuracy, we analyzed the CPU cost and performance for each method using a variety of physico-chemical structural features. This allowed us to posit scenarios in which each method may be best utilized. Most methods performed worse when applied to antibody-antigen complexes compared to non-antibody-antigen complexes. Rosetta-based JayZ and EasyE methods classified mutations as destabilizing (ΔΔG < -0.5 kcal/mol) with high (83–98%) accuracy and a relatively low computational cost for non-antibody-antigen complexes. Some of the most accurate results for antibody-antigen systems came from combining molecular dynamics with FoldX with a correlation coefficient (r) of 0.46, but this was also the most computationally expensive method. Overall, our results suggest these methods can be used to quickly and accurately predict stabilizing versus destabilizing mutations but are less accurate at predicting actual binding affinities. This study highlights the need for continued development of reliable, accessible, and reproducible methods for predicting binding affinities in antibody-antigen proteins and provides a recipe for using current methods.

Gonzalez,Martin,Barnes,Patel,Ytreberg,and Soares: Assessment of software methods for estimating protein-protein relative binding affinities

Introduction

Protein-protein binding is an essential physiological event that governs a large number of biological processes in the cell [1]. Amino acid mutations of these proteins can introduce diversity into genomes, and disrupt or modulate protein-protein interactions by changing the underlying binding free energy (ΔG, i.e. binding affinity), the amount of energy required to form protein complexes [2]. The binding free energy associated with a protein-protein complex determines the stability of the complex formation and the conditions for protein-protein association. Accurate prediction of binding free energies allows us to understand how these affinities can be modified, and leads to a more comprehensive understanding of protein interactions in living organisms [3].

Experimental biophysical methods can quantitatively measure change in the protein-protein binding free energy due to a mutation (i.e. relative binding affinity, ΔΔG), but these methods are typically costly, laborious, and time-consuming since all mutant proteins must be expressed and purified. Many researchers have developed and utilized computational methods to predict ΔΔG values for single- or multiple-amino acid mutations (see e.g. [4–6]). Historically, the most promising in terms of accuracy are rigorous methods based on statistical mechanics that use molecular dynamics (MD) simulations and thus automatically address conformational flexibility and entropic effects [7, 8]. However, these methods are computationally expensive since they employ rigorous sampling and utilize classical mechanics [9] or quantum mechanics [10] approximations of intermolecular interactions, and require a large number of calculations per time-step. Because of the expense, rigorous methods are not well-suited to studying large sets of mutations or large proteins thus necessitating less expensive, non-rigorous methods.

Non-rigorous high-throughput methods attempt to lower the computational cost, as compared to rigorous methods, while still providing accurate ΔΔG predictions. They accomplish this by including precalculated physico-chemical structural information in combination with predictive algorithms. The core mechanics that drive these methods fall under numerous classification umbrellas which have been covered by review articles [11, 12]. These review articles provide a broad overview but do not provide an unbiased, rigorous, comparative analysis outside of what the original developers provide. The developers of any given method tend to provide comparisons with other methods of the same general class to define where their method fits in the current landscape. BindProfX, for example, is available as a web server and standalone and utilizes structure-based interface profiles with pseudo counts. Upon release, it was most notably compared to FoldX (a semi-empirical trained method [13]) and DCOMPLEX (a physics-based method [14]) [15, 16]. iSEE, a statistically trained method based on 31 structure, evolution, and energy-based terms was tested against FoldX, BindProfX, and BeAtMuSiC (a machine learning-based approach [17]). Mutabind [18] and some other methods not explored in this work follow a similar testing methodology [19–21]. While these comparisons are beneficial in providing context for how a given model fits in the existing research landscape, they are not very robust, since only a narrow subset of methodologies are included. Conversely for folding stability, Kroncke et al. compared a large number of available software methods on a small dataset of transmembrane proteins providing a general overview of performance [6]. Despite the narrow dataset, this study provides a diverse, useful collection of evaluation metrics between multiple classes of methods. Our intent in this study is to provide a similar robust comparison of methods for non-rigorous binding affinity estimation.

In this work, we evaluate the ability of eight non-rigorous methods to predict relative binding affinities due to single amino acid mutations. We restrict our study to cases where both an experimental structure of the complex, and experimentally determined binding affinity values are available. To investigate the trade-off between speed and accuracy, we chose 16 protein-protein test complexes with empirical ΔΔG values for observed mutations. We calculated the ΔΔG values for each mutation using all eight methods and compared the results against empirical ΔΔG values. The goal of this study was to determine whether software methods that use (most costly) energy functions with a wider variety of physico-chemical structural features would provide more accurate binding affinity and interface destabilization predictions compared to those that rely on a single descriptive (less costly) energy function. We have determined scenarios in which some of these methods may be better or worse than traditional computational methods in predicting ΔΔG values.

Methods

Compilation of experimental ΔΔG values

To assess the performance of a range of protein-protein binding affinity prediction methods, we first assembled a dataset containing single amino acid mutations with known experimental ΔΔG values. This list was assembled from Structural Kinetic and Energetic database of Mutant Protein Interaction (SKEMPI) version 2.0 [22]. SKEMPI uses data from a variety of different biophysical measurement techniques; these are converted to ΔΔG values if not explicitly reported. Overall, the error associated with experimental ΔΔG values reported in the SKEMPI dataset is thought to range from 0.25 to 1 kcal/mol [22]. While generating this list, we considered four aspects: (i) type of protein-protein complex; (ii) availability of quality 3-D structural information; (iii) range of experimental ΔΔG values; and (iv) the type of mutations at differing sites on the complex. Our final dataset contained 654 mutations from 16 protein-protein complexes and their respective experimental ΔΔG values. We further categorized these 16 complexes as either non-antibody-antigen (non-Ab) or antibody-antigen (Ab). Table 1 shows the complexes in our dataset with their respective non-Ab and Ab categories and the number of mutations associated with each complex. The dataset contains a total of 401 non-Ab mutations and 253 Ab mutations.

Table 1

Dataset used in our study containing 16 protein complexes.

Non-Ab			Ab
PDB ID	# Mutations	# Residues	PDB ID	# Mutations	# Residues
1a4y [23]	32 [24–29]	583	1bj1 [30]	10 [31, 32]	547
1brs [33]	30 [34–37]	199	1jrh [38]	42 [39, 40]	540
1cbw [41]	31 [42, 43]	299	1mlc [44]	11 [45]	561
1iar [46]	36 [47]	336	1vfb [48]	48 [45, 49, 50]	352
1jtg [51]	37 [52–56]	428	1yy9 [57]	16 [45, 58]	1058
1lfd [59]	19 [60, 61]	254	2jel [62]	43 [63]	520
1ppf [64]	190 [65, 66]	274	3hfm [67]	71 [68–72]	558
2wpt [73]	26 [74–76]	220	4i77 [77]	12 [78]	549

For both non-Ab (left) and Ab (right) categories, columns show PDB IDs, total number of residues in a complex, and number of experimental mutants per complex.

Selection of protein-protein binding affinity methods

Binding affinity prediction methods were chosen to have both a distinct approach to binding affinity calculation that utilized 3-D structural information and had functional standalone software in September 2020, available either online or upon request to the author. Table 2 summarizes the methods selected in this study, their approaches, and their type of scoring functions. For simplicity, we categorized scoring functions (mathematical functions to calculate ΔΔG values) as semi-empirical, statistical, or physics-based. Semi-empirical methods replace as many calculations as possible with pre-calculated data and are trained using existing crystal structures and known binding affinity measurements for mutations [79]. Statistical methods use pre-calculated data and consider changes in coarse structural features such as the change in overall volume [80]. Physics-based methods use molecular mechanics based-energy functions to estimate enthalpic binding contributions [14]. In general, statistical or semi-empirical scoring functions involve a training step where existing datasets are leveraged to determine the weight of input parameters. MD, JayZ, and EasyE were not developed by training the methods against experimental data designed to improve predictive power while all other methods utilized this step.

Table 2

Methods used for comparison in study with a short summary of their approach and scoring function.

Name	Brief Description	Scoring Function	Runtime (CPU hours)
BindProfX [15, 16]	Interface profile score based on conservation of homologous interfaces	Semi-Empirical	1ppf = 0.57 CPUh
BindProfX [15, 16]		Semi-Empirical	1yy9 = 0.73 CPUh
BindProfX(BPX)+FoldX v4 [15, 16]	Profile score weighted and combined with FoldX energy potential	Semi-Empirical	1ppf = 0.62 CPUh
BindProfX(BPX)+FoldX v4 [15, 16]		Semi-Empirical	1yy9 = 0.71 CPUh
iSEE [81]	Random forest model using structural, evolutionary, and energy-based features	Statistical	1ppf < 0.01 CPUh
iSEE [81]		Statistical	1yy9 < 0.01 CPUh
DCOMPLEX v2 [14]	Structural ideal-gas reference state potential	Physics-Based	1ppf = 0.013 CPUh
DCOMPLEX v2 [14]	Structural ideal-gas reference state potential	Physics-Based	1yy9 = 0.001 CPUh
EasyE v1.0 [80, 82]	GMEC-based method utilizing the Rosetta [83, 84] energy function	Statistical	1ppf = 0.48 CPUh
EasyE v1.0 [80, 82]		Statistical	1yy9 = 0.09 CPUh
JayZ v1.0 [80, 82]	Partition-function method utilizing Rosetta energy function	Statistical	1ppf = 0.14 CPUh
JayZ v1.0 [80, 82]	Partition-function method utilizing Rosetta energy function	Statistical	1yy9 = 0.21 CPUh
FoldX v4 [13, 79]	Empirical energy score based on various energy parameters (e.g. van der Waals, solvation, electrostatics, hydrogen bonding)	Semi-Empirical	1ppf = 0.42 CPUh
FoldX v4 [13, 79]		Semi-Empirical	1yy9 = 0.16 CPUh
MD+FoldX v4 [85–87]	Molecular dynamics used to explore conformation space and generate snapshots; FoldX score calculated for each snapshot and averaged	Semi-Empirical	1ppf = 941 CPUh
MD+FoldX v4 [85–87]		Semi-Empirical	1yy9 = 4093 CPUh

Columns (left to right) indicate the method, a brief description of the method, the type of scoring function used, and runtimes. Runtimes are the amount of CPU hours for estimating the ΔΔG for a representative protein complex for Ab (1yy9, 1058 residues) and Non-Ab (1ppf, 274 residues) categories. Although 1yy9 is roughly four times bigger than 1ppf, the total runtime may or may not be affected depending on the method used.

Calculation and comparison of computational speed

The methods in Table 2 were used to predict ΔΔG values for each mutation on our experimental list shown in Table 1. Detailed protocols for predicting ΔΔG values using each selected method are provided in the (see S1 File). Runtimes were determined by using a representative protein complex from each category: 1ppf, a non-Ab complex with 274 total amino acids, and 1yy9, an Ab complex with 1058 total amino acids (see Table 2). These runtimes are estimates provided to give a point of comparison between the speeds of different methods. Specific runtimes will be determined by hardware specifications, method parameters, the number of mutations being computed, and overall protein size. For MD+FoldX, computational runtime was the length of time of the MD simulation plus the FoldX runtime for a single mutation. Reporting runtime in this fashion highlights the large CPUh requirement needed in order to add the sampling of MD into FoldX calculations. We note that, in contrast to the other methods tested here, the MD simulations that must be performed for MD+FoldX can be completed very quickly on modern GPUs, significantly offsetting the high initial cost of the MD+FoldX method. For all other methods, the algorithms rely either on various pre-calculated data or limited conformational sampling to calculate ΔΔG values rapidly.

Comparing experimental and predicted ΔΔG values

To carry out statistical analysis of our results we built an in-house Python script (see S2 File) that uses a combination of libraries including matplotlib, numpy, pandas, statistics, scipy, and sklearn. Using this script, we compared predicted values to experimental ΔΔG values for each method.

To evaluate the predictive ability of each method tested, we compared the following correlation coefficients using our script: concordance (⍴_c), Pearson (r), Kendall (τ), and Spearman (⍴) (see Table 3). We distinguish between methods that were trained to predict ΔΔG values from methods that compute metrics that are expected to linearly correlate with ΔΔG values. This distinction is important since for optimal performance we expect a regression line that passes through the coordinate origin and has a slope of 1, leading to a correlation coefficient equal to 1.

Table 3

Statistical measures used to test the performance of each method in predicting ΔΔG values.

Correlation	Brief Description	Type
Concordance	The concordance correlation coefficient (⍴_c) measures the degree to which the predicted ΔΔG value equals the actual experimental value (0 indicates no agreement and 1 perfect agreement).	Linear
Pearson	The Pearson correlation coefficient (r) measures the degree to which a uniform linear transformation of the predicted ΔΔG values (i.e., a shift and scale change) would yield the actual experimental values (0 indicates no agreement after transformation, 1 perfect agreement, and −1 perfect inverse agreement).	Linear
Kendall and Spearman	The rank correlation coefficient measures the degree to which the rank ordering of the predicted ΔΔG values matches the rank ordering of the actual experimental values (0 indicates no agreement after transformation, 1 perfect agreement, and −1 perfect inverse agreement). In a normal case, the Kendall correlation (τ) is considered more robust than the Spearman correlation (⍴) because of a smaller gross error sensitivity and more efficient due to a smaller asymptotic variance [88].	Rank
AUC and ROC	The receiver operating characteristic (ROC) curve tests several cutoff values for binning mutations as neutral or destabilizing between the most negative calculated ΔΔG value and the most positive calculated ΔΔG value, with true positive rates (sensitivity) calculated at each point. As the true positive rate is calculated, the classifier is moved to less extreme values; this yields the ROC curve. The area under curve (AUC) is a summary statistic that approximates how well the predictor actually discriminates between the two classifications.	N/A

To compare the discriminating power of the methods, we generated receiver operating characteristic (ROC) curves (see Table 3). These curves quantify the ability of a method to correctly classify point mutations as destabilizing (ΔΔG < −0.5 kcal/mol) or neutral/stabilizing (ΔΔG > −0.5 kcal/mol). ROC curves that are skewed toward a higher true positive rate (sensitivity) classify mutations more accurately, as quantified by area under curve (AUC, ranging between 1.0 and 0.5 for perfect and chance classification, respectively).

We also used our script to parse the results on the basis of several physico-chemical and structural features to allow us to evaluate the methods based on these characteristics: wild type amino acid type, mutant amino acid type, protein-protein interacting versus antibody-antigen, secondary structure classification of the mutation [89, 90], coordination number [91], Sneath index [92], mostly α-helical proteins versus mostly β-sheet proteins versus a mix of both α-helical and β-sheet proteins, percent exposure, location of the mutation, change in charge, change in polarity, change in volume, and whether or not the mutation location is predicted as an active or passive residue [93–95]. The script uses data from S3 File as an input and outputs scatter plots, correlation plots, receiver operating characteristic (ROC) curves, and box plots to visualize the data, as well as correlations and standard deviations for each method. All plots in this manuscript were generated using this script.

Results

The purpose of our study was to assess the ability of eight different relative binding affinity calculation methods (see Table 2) to estimate ΔΔG values. We selected 16 different protein complexes (eight Ab, eight non-Ab, see Table 1) with a total of 654 single amino acid mutations. Each method was then used to estimate ΔΔG values of 654 mutations and a variety of statistical measures were employed to assess their predictive ability. We also examined the computational speed of each method in the context of accuracy to determine its efficiency.

Non-antibody-antigen (non-Ab) results

Our dataset of eight non-Ab test protein complexes contains 401 total mutations and are mainly classified as protein-protein systems formed by inhibitors and receptors that range from 199 to 583 residues in size. The distribution and our classification of experimental ΔΔG values for all non-Ab test complexes is as follows: 13% of point mutations resulted in ΔΔG values less than -0.5 kcal/mol (classified as destabilizing); 31% between -0.5 and 0.5 kcal/mol (neutral); and 56% greater than 0.5 kcal/mol (stabilizing).

Figs 1 (blue data points and values) and 2 show various performance metrics for each method to assess their ability to predict the non-Ab ΔΔG values. Overall, EasyE has the highest correlation coefficient, r = 0.62, and iSEE has the lowest, r = 0.17 (see Figs 1 and 2). JayZ and EasyE, both of which utilize Rosetta’s conformational sampling algorithms, consistently have the best r values for non-Ab mutations.

Fig 1

Calculated ΔΔG values (x-axis) compared to experimental ΔΔG values (y-axis) for each method tested in this study.

Black, red, and blue lines are simple linear regressions from which r are derived. The red points are a scatter for Ab complexes and the blue points are for non-Ab complexes. The dashed line is the y = x line measuring perfect agreement between predicted and experimental ΔΔG values. The solid black, red, and blue lines indicate a linear relationship between calculated and experimental observations for all data points, Ab complexes, and non-Ab complexes respectively. The top values in black, red, and blue match the root-mean-square error and the bottom values indicate r for all values, Ab values, and non-Ab values respectively.

Fig 2

Performance of each method for non-Ab complexes (401 total mutations) in predicting true ΔΔG values (⍴_c), linearly correlated ΔΔG values (r), and rank order (⍴ and τ).

The error for each method is reported under the correlation points.

Fig 3 shows the ROC plot for all the tested methods. These ROC plots highlight how well a method can discriminate between stabilizing and destabilizing mutations. JayZ (0.84), EasyE (0.83), DCOMPLEX (0.82), FoldX (0.79), and MD+FoldX (0.76) have the highest AUC. Combined with the results from Figs 1 and 2, for the systems studied here, JayZ and EasyE methods are the best overall performers in terms of accuracy, discriminating stabilizing mutations from destabilizing, and ranking mutations based on their experimental ΔΔG values.

Fig 3

Receiver operating characteristic (ROC) curves for non-Ab complexes of the classification of variants as stabilizing (ΔΔG < -0.5 kcal/mol) or destabilizing (ΔΔG > 0.5 kcal/mol).

The values in the legend represent the area-under-curve (AUC). The higher the value, the better method is at discriminating between destabilizing and destabilizing mutations.

Table 2 reports CPUh required (i.e. runtimes) for each method to calculate ΔΔG for the entire list of mutations for a representative non-Ab protein complex. BindProfX, BindProfX(BPX)+FoldX, JayZ, and EasyE allow users to specify a list of mutations that the method is then able to calculate in one setting. This list can be optimized based on the available hardware to achieve efficiency. iSEE requires significant preparatory work (see S1 File) prior to calculation, but once completed, it calculates the ΔΔG values for the entire list of mutations nearly instantly. DCOMPLEX is not as flexible out of the box but can handle large numbers of mutations through an automated script. For MD+FoldX, 1yy9 (roughly four times larger than 1ppf) requires considerably more CPUh to calculate. All other methods calculate 1yy9 in a shorter time frame than 1ppf. This may seem counterintuitive. However, MD must statistically sample the conformational energy of the entire complex, while all other methods use algorithms that are likely impacted more by the number of residues involved in the interaction rather than the protein size. Overall, DCOMPLEX has a much faster runtime compared to other methods, and if the goal is to determine stabilizing and destabilizing non-Ab mutations, it offers similar discriminating power to JayZ and EasyE, at a fraction of the computational cost. JayZ estimates ΔΔG value of one mutation in ~2.7 s, EasyE in ~9.1 s, but DCOMPLEX requires just ~0.25 s. Overall, EasyE appears to be the best option for balancing accuracy and speed and DCOMPLEX is recommended for discriminating between stability and destabilizing mutations.

A method might not be a good overall performer in predicting ΔΔG values but could still perform well for mutations with certain physico-chemical and structural features. Therefore, we calculated various statistical measures to assess each method on unique subsets of mutations (see Table 4 and S1–S4 Figs). This table shows eight different data subsets with two r per method. EasyE has the highest r for non-Ab for five out of eight subsets (wild type non-gly or non-pro, alpha helix, beta sheet, surface exposure, and large volume changes). Where this method did not have the highest r, it had either the second or third highest r. JayZ mirrors the performance of EasyE in all the same categories and performs better than Easy in the neutral charge subset. These results further highlight the versatility of EasyE’s and JayZ’s performance in estimating the effects of non-Ab mutations compared to the other methods tested in this study. All methods apart from iSEE and BindProfX perform surprisingly well in the WT Gly or Pro subset. iSEE’s performance in this subset, while still mediocre compared to the other tested methods, is substantially better than in all other subsets.

Table 4

All methods r with respect to certain subsets.

Method	WT Gly or Pro	WT Non-Gly or Non-Pro	Alpha Helix	Beta Sheet	Surface Exposure	Neutral Charge	Hydrophobic to Polar	Large Vol Changes
BindProfX	Non-Ab: 0.08 Ab: -0.03	Non-Ab: 0.34 Ab: 0.33	Non-Ab: 0.29 Ab: 0.16	Non-Ab: 0.34 Ab: 0.48	Non-Ab: 0.22 Ab: 0.31	Non-Ab: 0.37 Ab: 0.45	Non-Ab: 0.33 Ab: 0.29	Non-Ab: 0.13 Ab: 0.38
BPX+FoldX	Non-Ab: 0.78 Ab: 0.09	Non-Ab: 0.46	Non-Ab: 0.43 Ab: 0.38	Non-Ab: 0.35	Non-Ab: 0.32	Non-Ab: 0.52	Non-Ab: 0.41	Non-Ab: 0.71
FoldX	Non-Ab: 0.83 Ab: -0.11	Non-Ab: 0.45 Ab: 0.25	Non-Ab: 0.39 Ab: 0.25	Non-Ab: -0.05 Ab: 0.31	Non-Ab: 0.50 Ab: 0.26	Non-Ab: 0.42 Ab: 0.41	Non-Ab: 0.41 Ab: 0.11	Non-Ab: 0.63 Ab: -0.32
MD+FoldX	Ab: 0.71	Non-Ab: 0.49 Ab: 0.42	Non-Ab: 0.44	Non-Ab: 0.08 Ab: 0.49	Non-Ab: 0.47 Ab: 0.35	Non-Ab: 0.46 Ab: 0.46	Ab: 0.31	Non-Ab: 0.71 Ab: 0.35
DCOMPLEX	Non-Ab: 0.65	Non-Ab: 0.34 Ab: 0.37	Non-Ab: 0.33 Ab: 0.31	Non-Ab: 0.22 Ab: 0.30	Non-Ab: 0.52 Ab: 0.27	Non-Ab: 0.36	Non-Ab: 0.38 Ab: 0.16	Non-Ab: 0.62 Ab: 0.28
JayZ	Non-Ab: 0.77 Ab: 0.54	Non-Ab: 0.49 Ab: 0.24	Non-Ab: 0.44 Ab: -0.06	Non-Ab: 0.30 Ab: 0.16	Non-Ab: 0.59 Ab: 0.36	Ab: 0.26	Non-Ab: 0.41 Ab: 0.01	Non-Ab: 0.83 Ab: 0.19
EasyE	Non-Ab: 0.78 Ab: 0.29	Ab: 0.22	Ab: 0.06	Ab: 0.03	Ab: 0.35	Non-Ab: 0.61 Ab: 0.23	Non-Ab: 0.45 Ab: 0.02	Ab: 0.18
iSEE	Non-Ab: 0.32 Ab: 0.43	Non-Ab: 0.29 Ab: -0.16	Non-Ab: 0.05 Ab: -0.04	Non-Ab: 0.14 Ab: -0.24	Non-Ab: 0.38 Ab: 0.11	Non-Ab: 0.15 Ab: -0.11	Non-Ab: 0.14 Ab: -0.05	Non-Ab: 0.24 Ab: -0.44

“WT Gly or Pro” are wild type amino acids that are either glycine or proline. “WT Non-Gly or Non-Pro” are wild type amino acids that are neither glycine nor proline. “Alpha Helix” are mutations that occur in a helix structure. “Beta Sheet” are mutations that occur in a beta structure. “Surface Exposure” are mutations that occur in an amino acid that have relative solvent accessibility values between 0 and 10%. “Neutral Charge” is a neutrally charged wild type amino acid mutating to a neutrally charged mutant amino acid. “Hydrophobic to Polar” is a hydrophobic or polar wild type amino acid mutating to a polar or hydrophobic mutant amino acid, respectively. “Larger Vol Changes” is a mutant amino acid that is greater than 40% larger than the wild type amino acid. Values that are bolded are the highest r for each method and protein type. Values that are red or blue are the highest r for each subset, blue for non-Ab and red for Ab.

Antibody-antigen (Ab) results

Our dataset of eight Ab test protein complexes contains 253 mutations and the proteins range in size from 352 to 1058 residues. The distribution and our classification of experimental ΔΔG values for all Ab test complexes is as follows: 5% of point mutations resulted in ΔΔG values less than -0.5 kcal/mol (classified as destabilizing); 40% between -0.5 and 0.5 kcal/mol (neutral); and 55% greater than 0.5 kcal/mol (stabilizing).

Figs 1 (data points and values in red), 4, and 5 show the performance of each method in predicting the ΔΔG values of Ab mutations. Overall, the highest correlation is for MD+FoldX with r = 0.39 and the lowest is iSEE with r = -0.09 (see Figs 1 and 4). An interesting trend is that the methods with the highest r values for non-Ab complexes do not have the highest r for Ab complexes.

Fig 4

Performance of each evaluated method for Ab complexes (253 total mutations) in predicting true ΔΔG values (⍴_c), linearly correlated ΔΔG values (r), and rank order (⍴ and τ).

The error for each method is reported under the correlation points.

Fig 5 shows the ROC plot for all the tested Ab methods. These ROC plots highlight how well a method is actually able to discriminate between stabilizing and destabilizing mutations. Compared to non-Ab complexes, all methods performed better for antibody-antigen complexes except for FoldX and DCOMPLEX which were marginally worse. JayZ (0.97), EasyE (0.98), FoldX (0.85), and MD+FoldX (0.82) had the highest AUC values. Combined with the results from Figs 1 and 4, at least for the systems studied here, it appears that the MD+FoldX method is the best overall performer in terms of accuracy, discriminating stabilizing mutations from destabilizing, and ranking mutations based on their experimental ΔΔG values.

Fig 5

Receiver operating characteristic curves of the classification of variants that are more destabilized or less destabilized than 0.5 kcal/mol.

The values in the legend represent the area-under-curve (AUC). The higher the value, the better the prediction capability of the method.

Compared to other methods, EasyE has a much faster runtime and is recommended if the goal is to discriminate between stabilizing and destabilizing (ΔΔG for one mutation takes ~21 s, see Table 2). By comparison, MD+FoldX cost ~941 CPUh for one mutation of 1yy9. DCOMPLEX provides a slightly lower r (0.31) and computational cost (~0.35 s) for one mutation of 1yy9. Overall, MD+FoldX appears to be the best option for accuracy and EasyE or JayZ are the best options for discriminating between destabilizing and stabilizing mutations.

Table 4 summarizes the ability of each method to predict ΔΔG values for subsets of Ab mutations. Most methods had mediocre r values less than 0.60. The exceptions to this are MD+FoldX and DCOMPLEX within the WT Gly or Pro subset with r = 0.71 and 0.89, respectively. BPX+FoldX has the highest r for Ab complexes for five of the eight subsets (WT nonGly or nonPro, beta sheet, surface exposure, neutral charge, hydrophobic to polar, and large volume changes) and performs equally well for the neutral charge subset as DCOMPLEX, which also has the highest r for WT Gly or Pro subset. For the beta sheet subset, MD+FoldX had the second highest r. In the surface exposure subset, JayZ and EasyE both had nearly identical r (0.36 and 0.35 respectively), the highest for this subset, but substantially worse than they did for non-Ab complexes.

Discussion

We assessed the performance of eight distinct protein-protein binding affinity calculation methods that use 3-D structural information. To test the performance of these methods, we selected 16 different protein complexes (see Table 1) with a total of 654 single amino acid mutations: eight antigen-antibody complexes (Ab, 253 mutations) and eight non-antigen-antibody (Non-Ab, 401 mutations) complexes. Each method was used to estimate ΔΔG values of the 654 mutations, a variety of statistical measures, CPU cost, and physico-chemical structural features to assess the performance. Our results suggest each method has both strengths and weaknesses depending on the properties of the protein system. Most methods did not perform well when applied to mutations in Ab complexes compared to non-Ab complexes. Rosetta-based JayZ and EasyE were able to classify mutations as destabilizing (ΔΔG < -0.5 kcal/mol) with high (83–98%) accuracy at relatively low computational cost. Some of the best results for Ab systems came from combining MD simulations with FoldX with a r coefficient of 0.39, but at the highest computational cost of all the tested methods.

Fig 1 summarizes the performance of each method in terms of its ability to estimate ΔΔG values for all (non-Ab + Ab) single mutations. None of the test methods show a very high r between experimental and predicted ΔΔG values. Two of the best performing methods, JayZ and EasyE, both have an r of 0.49 for all mutations, with a higher r of 0.61 and 0.62 respectively for non-Ab complexes. These results agree with published results from the authors of JayZ and EasyE. Our results agree moderately with published results from iSEE (they obtained r = 0.25, we obtained r = 0.17) and BindProfX (they used a much larger dataset). Published results for DCOMPLEX show a very good correlation of r = 0.87; much larger than what we obtained here. This difference is very likely due to the dataset size and compilation; DCOMPLEX was originally tested against 69 experimental data points, compared to the 654 values used here. MD+FoldX has an r of 0.39 for Ab complexes and appears to perform well for larger systems, which could indicate the importance of conformational sampling for antibody-antigen systems. Other methods used in this study have little to no conformational sampling which could explain their poor performance on Ab complexes. By contrast, these same methods perform well for non-Ab complexes, suggesting that conformational sampling is not the limiting factor to achieve accurate results for these protein complexes. For example, FoldX has a trained scoring function derived using a dataset of mostly non-Ab complexes and performs poorly for Ab complexes when using a single structure (see Table 2). However, when used with snapshots from an MD simulation, this same method outperforms all other methods selected in this study. This highlights the need for conformational sampling for reliable and efficient predictions of binding affinity for some systems. In our previous study, we combined coarse-grained forcefield with umbrella sampling to calculate ΔΔG values for eight mutations of 3hfm Ab complex (one of the test systems in this study) and obtained better predictions than FoldX and MD+FoldX [96]. This study further emphasizes the need for better conformational strategies for some systems. A rigorous endpoint free energy method could potentially be employed to overcome the conformational sampling problem. Endpoint methods typically use molecular mechanics force fields to generate conformational ensembles at the two states of interest. These ensembles are then evaluated with implicit solvent models such as molecular mechanics generalized Born surface area (MM/GBSA) and molecular mechanics Poisson–Boltzmann surface area (MM/PBSA) [97–99]. These methods are computationally less expensive than other rigorous approaches since simulations are only performed for two states, however their accuracy is system-dependent and sensitive to simulation protocols such as sampling strategy and entropy calculation. MM/PBSA and MM/GBSA have been successfully used by several groups to estimate ΔΔG values for a small number of protein complexes and recently reviewed by Wang E et al [98] and Wang C et al [99]. These studies obtained consistently higher overall correlation to experimental ΔΔG values, albeit for a small subset of mutations, compared to the methods tested in our study, but at the expense of significantly higher computational costs.

Statistical measures used to analyze performance are listed and defined in Table 3. For Ab, BPX+FoldX, MD+FoldX, and DCOMPLEX have the highest r values of the methods in our study (see Fig 4). MD+FoldX appears to be the most accurate method for Ab complexes. BindProfX, FoldX, JayZ, EasyE, and iSEE have low r and ⍴_c indicating that affinities estimated using these methods do not correlate well with experimental ΔΔG values using a linear transformation. Also, the τ and ⍴ were lower compared to MD+FoldX, indicating these methods do poorly at calculating a rank order that matches experimental data.

The ROC curves allow us to determine each method’s ability to classify mutations as either destabilizing or neutral/stabilizing (Figs 3 and 5). For non-Ab complexes, JayZ (0.84 AUC) and EasyE (0.83 AUC) have the best true positive rate followed by DCOMPLEX (0.82 AUC). For Ab complexes, JayZ (0.97 AUC) and EasyE (0.98 AUC) have better true positive rates than MD+FoldX, the method with the highest r value. If classification of destabilizing vs stabilizing is the primary need, then JayZ or EasyE are both recommended over the other methods tested here due to their high accuracy and fast runtime.

While accuracy is generally the main reason for choosing a particular method, computational efficiency is also an important consideration, especially when predicting the effects of a large number of mutations. Here, we discuss the performance of each method in terms of its trade-off between speed and accuracy for predicting ΔΔG values. For all single mutations and our non-Ab subset, EasyE and JayZ performed well; JayZ is the faster method of the two with EasyE at a similar speed to FoldX. DCOMPLEX is more accurate than FoldX for all single mutations and has similar accuracy as FoldX for non-Ab mutations, but at much lower cost. MD+FoldX has similar accuracy to DCOMPLEX for all single mutations and has similar accuracy to FoldX in non-Ab mutations but is by far the most computationally expensive method we tested. Although a synergistic combination of BPX+FoldX implements several structural and physico-chemical interaction terms in its algorithm, computation time was longer than all but MD+FoldX without a concomitant improvement in r. We note that this method is perhaps the most accessible of those tested, due to the easy-to-use online server interface and accuracy that is similar to FoldX for most systems. BindProfX utilizes the same scoring profile as BPX+FoldX without the FoldX calculations. In this case, accuracy decreased while calculation speed remained similar to BPX+FoldX. iSEE, the least correlating method, employs the widest variety of information to obtain relative binding affinity predictions and is the fastest of all methods (not including the non-trivial preparation time). For Ab complexes, MD+FoldX, the slowest of all the methods, had the highest accuracy, followed by DCOMPLEX. iSEE is again the fastest of all methods but also the least accurate. BindProfX utilizes several pre-calculated physico-chemical structural data in its scoring function while, JayZ and EasyE each layer an additional predictive calculating feature on top of Rosetta’s backbone sampling, adding complexity to the predictive algorithms. However, all three have similar r yet they do not achieve the accuracy of MD+FoldX. Overall, for non-Ab complexes, EasyE and JayZ appear to have the best balance between speed and accuracy of the methods we tested. For Ab complexes, DCOMPLEX appears to have the best balance.

We have demonstrated that all the tested methods have specific strengths and weaknesses; some perform better in specific contexts (Table 4), and some have longer runtimes to obtain similar predictive power to comparably faster methods. This highlights the complexity of the physico-chemical properties and structural features that drive, and limit, these predictive models. Moreover, our study highlights the need to separately evaluate the performance of future ΔΔG predictors for both Ab and non-Ab complexes. There is also a need for a much larger training dataset of experimentally measured binding affinities for both types of complexes. New binding affinity calculation approaches are also needed to properly account for the contribution of bridging water molecules that are often present at the protein-protein interface. Our results can be used to make informed decisions for methods that may be preferable for a particular study or system. Table 4 suggests that if the goal is to estimate only the order of magnitude or sign of relative binding affinities, then the preferred method will likely be very different than if the goal is to obtain the best possible accuracy for antibody-antigen systems. To improve accessibility, we have generated an in-house Python script (provided in the supplement with the full dataset used in this work) that can be used to parse any of the parameters used in this study and provide tailored information. This information in combination with the runtime and other details provided in this study can be used to inform users on methods that can provide the best accuracy and efficiency for a given protein-protein complex type, set of physico-chemical features or structural parameters, and set of mutations. Additionally, the script can be extended to other methods and feature-sets, potentially elucidating specific problems or areas of improvement to existing and future methods.

Conclusions

In this study, we have assessed the accuracy and efficiency of eight computational methods on predicting binding affinity changes due to single amino acid mutations. Methods were tested on 16 different protein complexes: eight antigen-antibody (Ab) and eight non-antigen-antibody (Non-Ab) complexes. While some methods perform consistently better than others, how well each performs depends on the physico-chemical and structural components of each complex. EasyE was the most accurate for non-Ab complexes, and MD+FoldX was most accurate for Ab complexes. JayZ and EasyE were better able to distinguish between destabilizing (ΔΔG > 0.5 kcal/mol) and stabilizing (ΔΔG < -0.5 kcal/mol) as compared to any other method. Future work could include more systems or different methods, including those that are solely web server-based in order to expand and better refine our conclusions on their predictive capability.

References

SJones, JMThornton. Principles of protein-protein interactions. Proceedings of the National Academy of Sciences. 1996;93(1):13–20. 10.1073/pnas.93.1.13

CMYates, MJESternberg. The Effects of Non-Synonymous Single Nucleotide Polymorphisms (nsSNPs) on Protein–Protein Interactions. Journal of Molecular Biology. 2013;425(21):3949–63. 10.1016/j.jmb.2013.07.012

MBaaden, SJMarrink. Coarse-grain modelling of protein-protein interactions. Curr Opin Struct Biol. 2013;23(6):878–86. 10.1016/j.sbi.2013.09.004 .

IEzkurdia, LBartoli, PFariselli, RCasadio, AValencia, MLTress. Progress and challenges in predicting protein-protein interaction sites. Briefings in bioinformatics. 2009;10(3):233–46. 10.1093/bib/bbp021 .

PLKastritis, AMBonvin. Are scoring functions in protein-protein docking ready to predict interactomes? Clues from a novel binding affinity benchmark. J Proteome Res. 2010;9(5):2216–25. 10.1021/pr9009854 .

BMKroncke, AMDuran, JLMendenhall, JMeiler, JDBlume, CRSanders. Documentation of an Imperative To Improve Methods for Predicting Membrane Protein Stability. Biochemistry. 2016;55(36):5002–9. 10.1021/acs.biochem.6b00537

RCBernardi, MCMelo, KSchulten. Enhanced sampling techniques in molecular dynamics simulations of biological systems. Biochim Biophys Acta. 2015;1850(5):872–7. 10.1016/j.bbagen.2014.10.019

VSpiwok, ZSucur, PHosek. Enhanced sampling techniques in biomolecular simulations. Biotechnol Adv. 2015;33(6 Pt 2):1130–40. 10.1016/j.biotechadv.2014.11.011 .

JCGumbart, BRoux, CChipot. Efficient Determination of Protein–Protein Standard Binding Free Energies from First Principles. J Chem Theory Comput. 2013;9(8):3789–98. 10.1021/ct400273t

PPokorná, HKruse, MKrepl, JŠponer. QM/MM Calculations on Protein–RNA Complexes: Understanding Limitations of Classical MD Simulations and Search for Reliable Cost-Effective QM Methods. J Chem Theory Comput. 2018;14(10):5419–33. 10.1021/acs.jctc.8b00670

CGeng, LCXue, JRoel-Touris, Bonvin AMJJ. Finding the ΔΔG spot: Are predictors of binding affinity changes upon mutations in protein–protein interactions ready for it? WIREs Computational Molecular Science. 2019;9(5):e1410.

MMGromiha, KYugandhar, SJemimah. Protein-protein interactions: scoring schemes and binding affinity. Curr Opin Struct Biol. 2016;44:31–8. 10.1016/j.sbi.2016.10.016 .

JSchymkowitz, JBorg, FStricher, RNys, FRousseau, LSerrano. The FoldX web server: an online force field. Nucleic Acids Res. 2005;33(Web Server issue):W382–8. 10.1093/nar/gki387

SLiu, CZhang, HZhou, YZhou. A physical reference state unifies the structure-derived potential of mean force for protein folding and binding. Proteins: Structure, Function, and Bioinformatics. 2004;56(1):93–101.

PXiong, CZhang, WZheng, YZhang. BindProfX: Assessing Mutation-Induced Binding Affinity Change by Protein Interface Profiles with Pseudo-Counts. J Mol Biol. 2017;429(3):426–34. 10.1016/j.jmb.2016.11.022

JRBrender, YZhang. Predicting the Effect of Mutations on Protein-Protein Binding Interactions through Structure-Based Interface Profiles. PLoS Comp Biol. 2015;11(10):e1004494 10.1371/journal.pcbi.1004494

YDehouck, JMKwasigroch, MRooman, DGilis. BeAtMuSiC: Prediction of changes in protein-protein binding affinity on mutations. Nucleic Acids Res. 2013;41(Web Server issue):W333–9. 10.1093/nar/gkt450

MLi, FLSimonetti, AGoncearenco, ARPanchenko. MutaBind estimates and interprets the effects of sequence variants on protein–protein interactions. Nucleic Acids Res. 2016;44(W1):W494–W501. 10.1093/nar/gkw374

TVreven, HHwang, BGPierce, ZWeng. Prediction of protein–protein binding free energies. Protein Sci. 2012;21(3):396–404. 10.1002/pro.2027

CHMRodrigues, YMyung, DEVPires, DBAscher. mCSM-PPI2: predicting the effects of mutations on protein–protein interactions. Nucleic Acids Res. 2019;47(W1):W338–W44. 10.1093/nar/gkz383

SJemimah, MSekijima, MMGromiha. ProAffiMuSeq: sequence-based method to predict the binding free energy change of protein–protein complexes upon mutation using functional classification. Bioinformatics. 2019;36(6):1725–30.

JJankauskaitė, BJiménez-García, JDapkūnas, JFernández-Recio, IHMoal. SKEMPI 2.0: an updated benchmark of changes in protein–protein binding energy, kinetics and thermodynamics upon mutation. Bioinformatics. 2018;35(3):462–9.

ACPapageorgiou, RShapiro, KRAcharya. Molecular recognition of human angiogenin by placental ribonuclease inhibitor—an X-ray crystallographic study at 2.0 Å resolution. The EMBO Journal. 1997;16(17):5162–77. 10.1093/emboj/16.17.5162

CZChen, RShapiro. Superadditive and subadditive effects of "hot spot" mutations within the interfaces of placental ribonuclease inhibitor with angiogenin and ribonuclease A. Biochemistry. 1999;38(29):9273–85. 10.1021/bi990762a .

RShapiro, MRuiz-Gutierrez, CZChen. Analysis of the interactions of human ribonuclease inhibitor with angiogenin and ribonuclease A by mutagenesis: importance of inhibitor residues inside versus outside the C-terminal "hot spot". J Mol Biol. 2000;302(2):497–519. 10.1006/jmbi.2000.4075 .

RShapiro, BLVallee. Identification of functional arginines in human angiogenin by site-directed mutagenesis. Biochemistry. 1992;31(49):12477–85. 10.1021/bi00164a026 .

RShapiro, BLVallee. Site-directed mutagenesis of histidine-13 and histidine-114 of human angiogenin. Alanine derivatives inhibit angiogenin-induced angiogenesis. Biochemistry. 1989;28(18):7401–8. 10.1021/bi00444a038 .

FSLee, BLVallee. Binding of placental ribonuclease inhibitor to the active site of angiogenin. Biochemistry. 1989;28(8):3556–61. 10.1021/bi00434a061 .

CZChen, RShapiro. Site-specific mutagenesis reveals differences in the structural bases for tight binding of RNase inhibitor to angiogenin and RNase A. Proc Natl Acad Sci U S A. 1997;94(5):1761–6. 10.1073/pnas.94.5.1761

YAMuller, YChen, HWChristinger, BLi, BCCunningham, HBLowman, et al VEGF and the Fab fragment of a humanized neutralizing antibody: crystal structure of the complex at 2.4 å resolution and mutational analysis of the interface. Structure. 1998;6(9):1153–67. 10.1016/s0969-2126(98)00116-6

YChen, CWiesmann, GFuh, BLi, HWChristinger, PMcKay, et al Selection and analysis of an optimized anti-VEGF antibody: crystal structure of an affinity-matured Fab in complex with antigen. J Mol Biol. 1999;293(4):865–81. 10.1006/jmbi.1999.3192 .

YAMuller, YChen, HWChristinger, BLi, BCCunningham, HBLowman, et al VEGF and the Fab fragment of a humanized neutralizing antibody: crystal structure of the complex at 2.4 A resolution and mutational analysis of the interface. Structure. 1998;6(9):1153–67. 10.1016/s0969-2126(98)00116-6 .

AMBuckle, GSchreiber, ARFersht. Protein-protein recognition: crystal structural analysis of a barnase-barstar complex at 2.0-A resolution. Biochemistry. 1994;33(30):8878–89. 10.1021/bi00196a004 .

RWHartley. Directed mutagenesis and barnase-barstar recognition. Biochemistry. 1993;32(23):5978–84. 10.1021/bi00074a008

GSchreiber, ARFersht. Energetics of protein-protein interactions: analysis of the barnase-barstar interface by single mutations and double mutant cycles. J Mol Biol. 1995;248(2):478–86. 10.1016/s0022-2836(95)80064-6 .

GSchreiber, ARFersht. Interaction of barnase with its polypeptide inhibitor barstar studied by protein engineering. Biochemistry. 1993;32(19):5145–50. 10.1021/bi00070a025 .

CFrisch, GSchreiber, CMJohnson, ARFersht. Thermodynamics of the interaction of barnase and barstar: changes in free energy versus changes in enthalpy on mutation. J Mol Biol. 1997;267(3):696–706. 10.1006/jmbi.1997.0892 .

SSogabe, FStuart, CHenke, ABridges, GWilliams, ABirch, et al Neutralizing epitopes on the extracellular interferon γ receptor (IFNγR) α-chain characterized by homolog scanning mutagenesis and X-ray crystal structure of the A6 Fab-IFNγR1-108 complex11Edited by R. Huber. J Mol Biol. 1997;273(4):882–97. 10.1006/jmbi.1997.1336

SLang, JXu, FStuart, RMThomas, JWVrijbloed, JARobinson. Analysis of antibody A6 binding to the extracellular interferon gamma receptor alpha-chain by alanine-scanning mutagenesis and random mutagenesis with phage display. Biochemistry. 2000;39(51):15674–85. 10.1021/bi000838z .

SSogabe, FStuart, CHenke, ABridges, GWilliams, ABirch, et al Neutralizing epitopes on the extracellular interferon gamma receptor (IFNgammaR) alpha-chain characterized by homolog scanning mutagenesis and X-ray crystal structure of the A6 fab-IFNgammaR1-108 complex. J Mol Biol. 1997;273(4):882–97. 10.1006/jmbi.1997.1336 .

AJScheidig, TRHynes, LAPelletier, JAWells, AAKossiakoff. Crystal structures of bovine chymotrypsin and trypsin complexed to the inhibitor domain of alzheimer's amyloid β-protein precursor (APPI) and basic pancreatic trypsin inhibitor (BPTI): Engineering of inhibitors with altered specificities. Protein Sci. 1997;6(9):1806–24. 10.1002/pro.5560060902

DKrowarsch, MDadlez, OBuczek, IKrokoszynska, AOSmalas, JOtlewski. Interscaffolding additivity: binding of P1 variants of bovine pancreatic trypsin inhibitor to four serine proteases. J Mol Biol. 1999;289(1):175–86. 10.1006/jmbi.1999.2757 .

MJCastro, SAnderson. Alanine point-mutations in the reactive region of bovine pancreatic trypsin inhibitor: effects on the kinetics and thermodynamics of binding to beta-trypsin and alpha-chymotrypsin. Biochemistry. 1996;35(35):11435–46. 10.1021/bi960515w .

BCBraden, HSouchon, J-LEiselé, GABentley, TNBhat, JNavaza, et al Three-dimensional structures of the free and the antigen-complexed Fab from monoclonal anti-lysozyme antibody D44.1. J Mol Biol. 1994;243(4):767–81. 10.1016/0022-2836(94)90046-9

SMLippow, KDWittrup, BTidor. Computational design of antibody-affinity improvement beyond in vivo maturation. Nat Biotechnol. 2007;25(10):1171–6. 10.1038/nbt1336

THage, WSebald, PReinemer. Crystal Structure of the Interleukin-4/Receptor α Chain Complex Reveals a Mosaic Binding Interface. Cell. 1999;97(2):271–81. 10.1016/s0092-8674(00)80736-9

YWang, BJShen, WSebald. A mixed-charge pair in human interleukin 4 dominates high-affinity interaction with the receptor alpha chain. Proc Natl Acad Sci U S A. 1997;94(5):1657–62. 10.1073/pnas.94.5.1657

TNBhat, GABentley, GBoulot, MIGreene, DTello, WDall'Acqua, et al Bound water molecules and conformational stabilization help mediate an antigen-antibody association. Proceedings of the National Academy of Sciences. 1994;91(3):1089–93. 10.1073/pnas.91.3.1089

WDall'Acqua, ERGoldman, EEisenstein, RAMariuzza. A mutational analysis of the binding of two different proteins to the same antibody. Biochemistry. 1996;35(30):9667–76. 10.1021/bi960819i .

WDall'Acqua, ERGoldman, WLin, CTeng, DTsuchiya, HLi, et al A mutational analysis of binding interactions in an antigen-antibody protein-protein complex. Biochemistry. 1998;37(22):7981–91. 10.1021/bi980148j .

DLim, HUPark, LDe Castro, SGKang, HSLee, SJensen, et al Crystal structure and kinetic analysis of β-lactamase inhibitor protein-II in complex with TEM-1 β-lactamase. Nat Struct Biol. 2001;8(10):848–52. 10.1038/nsb1001-848

SAlbeck, RUnger, GSchreiber. Evaluation of direct and cooperative contributions towards the strength of buried hydrogen bonds and salt bridges. J Mol Biol. 2000;298(3):503–20. 10.1006/jmbi.2000.3656 .

TSelzer, SAlbeck, GSchreiber. Rational design of faster associating and tighter binding protein complexes. Nat Struct Biol. 2000;7(7):537–41. 10.1038/76744 .

ZZhang, TPalzkill. Determinants of binding affinity and specificity for the interaction of TEM-1 and SME-1 beta-lactamase with beta-lactamase inhibitory protein. J Biol Chem. 2003;278(46):45706–12. 10.1074/jbc.M308572200 .

DReichmann, ORahat, SAlbeck, RMeged, ODym, GSchreiber. The modular architecture of protein-protein binding interfaces. Proc Natl Acad Sci U S A. 2005;102(1):57–62. 10.1073/pnas.0407280102

DReichmann, MCohen, RAbramovich, ODym, DLim, NCStrynadka, et al Binding hot spots in the TEM1-BLIP interface in light of its modular architecture. J Mol Biol. 2007;365(3):663–79. 10.1016/j.jmb.2006.09.076 .

SLi, KRSchmitz, PDJeffrey, JJWWiltzius, PKussie, KMFerguson. Structural basis for inhibition of the epidermal growth factor receptor by cetuximab. Cancer Cell. 2005;7(4):301–11. 10.1016/j.ccr.2005.03.003

JNHaidar, WZhu, JLypowy, BGPierce, ABari, KPersaud, et al Backbone flexibility of CDR3 and immune recognition of antigens. J Mol Biol. 2014;426(7):1583–99. 10.1016/j.jmb.2013.12.024 .

LHuang, FHofer, GSMartin, S-HKim. Structural basis for the interaction of Ras with RaIGDS. Nat Struct Biol. 1998;5(6):422–6. 10.1038/nsb0698-422

CKiel, TSelzer, YShaul, GSchreiber, CHerrmann. Electrostatically optimized Ras-binding Ral guanine dissociation stimulator mutants increase the rate of association by stabilizing the encounter complex. Proc Natl Acad Sci U S A. 2004;101(25):9223–8. 10.1073/pnas.0401160101

CKiel, LSerrano, CHerrmann. A detailed thermodynamic analysis of ras/effector complex interfaces. J Mol Biol. 2004;340(5):1039–58. 10.1016/j.jmb.2004.05.050 .

LPrasad, EBWaygood, JSLee, LTJDelbaere. The 2.5 Å resolution structure of the Jel42 Fab fragment/HPr complex11Edited by I. A. Wilson. J Mol Biol. 1998;280(5):829–45. 10.1006/jmbi.1998.1888

SSharma, FGeorges, LTDelbaere, JSLee, REKlevit, EBWaygood. Epitope mapping by mutagenesis distinguishes between the two tertiary structures of the histidine-containing protein HPr. Proc Natl Acad Sci U S A. 1991;88(11):4877–81. 10.1073/pnas.88.11.4877

WBode, AZWei, RHuber, EMeyer, JTravis, SNeumann. X-ray crystal structure of the complex of human leukocyte elastase (PMN elastase) and the third domain of the turkey ovomucoid inhibitor. EMBO J. 1986;5(10):2453–8.

SMLu, WLu, MAQasim, SAnderson, IApostol, WArdelt, et al Predicting the reactivity of proteins from their sequence alone: Kazal family of protein inhibitors of serine proteinases. Proc Natl Acad Sci U S A. 2001;98(4):1410–5. 10.1073/pnas.031581398

WLu, IApostol, MAQasim, NWarne, RWynn, WLZhang, et al Binding of amino acid side-chains to S1 cavities of serine proteinases. J Mol Biol. 1997;266(2):441–61. 10.1006/jmbi.1996.0781 .

EAPadlan, EWSilverton, SSheriff, GHCohen, SJSmith-Gill, DRDavies. Structure of an antibody-antigen complex: crystal structure of the HyHEL-10 Fab-lysozyme complex. Proc Natl Acad Sci U S A. 1989;86(15):5938–42. 10.1073/pnas.86.15.5938

JPons, ARajpal, JFKirsch. Energetic analysis of an antigen/antibody interface: alanine scanning mutagenesis and double mutant cycles on the HyHEL-10/lysozyme interaction. Protein Sci. 1999;8(5):958–68. 10.1110/ps.8.5.958

KTsumoto, KOgasahara, YUeda, KWatanabe, KYutani, IKumagai. Role of Tyr residues in the contact region of anti-lysozyme monoclonal antibody HyHEL10 for antigen binding. J Biol Chem. 1995;270(31):18551–7. 10.1074/jbc.270.31.18551 .

LNKam-Morgan, SJSmith-Gill, MGTaylor, LZhang, ACWilson, JFKirsch. High-resolution mapping of the HyHEL-10 epitope of chicken lysozyme by site-directed mutagenesis. Proc Natl Acad Sci U S A. 1993;90(9):3958–62. 10.1073/pnas.90.9.3958

MGTaylor, ARajpal, JFKirsch. Kinetic epitope mapping of the chicken lysozyme.HyHEL-10 Fab complex: delineation of docking trajectories. Protein Sci. 1998;7(9):1857–67. 10.1002/pro.5560070902

ARajpal, MGTaylor, JFKirsch. Quantitative evaluation of the chicken lysozyme epitope in the HyHEL-10 Fab complex: free energies and kinetics. Protein Sci. 1998;7(9):1868–74. 10.1002/pro.5560070903

NAGMeenan, ASharma, SJFleishman, CJMacDonald, BMorel, RBoetzel, et al The structural and energetic basis for high selectivity in a high-affinity protein-protein interaction. Proceedings of the National Academy of Sciences. 2010;107(22):10080–5.

AHKeeble, LAJoachimiak, MJMaté, NMeenan, NKirkpatrick, DBaker, et al Experimental and computational analyses of the energetic basis for dual recognition of immunity proteins by colicin endonucleases. J Mol Biol. 2008;379(4):745–59. 10.1016/j.jmb.2008.03.055 .

MSokolovski, JCveticanin, DHayoun, IKorobko, MSharon, AHorovitz. Measuring inter-protein pairwise interaction energies from a single native mass spectrum by double-mutant cycle analysis. Nature communications. 2017;8(1):212 10.1038/s41467-017-00285-1

WLi, SJHamill, AMHemmings, GRMoore, RJames, CKleanthous. Dual recognition and the role of specificity-determining residues in colicin E9 DNase-immunity protein interactions. Biochemistry. 1998;37(34):11771–9. 10.1021/bi9808621 .

MUltsch, JBevers, GNakamura, RVandlen, RFKelley, LCWu, et al Structural Basis of Signaling Blockade by Anti-IL-13 Antibody Lebrikizumab. J Mol Biol. 2013;425(8):1330–9. 10.1016/j.jmb.2013.01.024

MUltsch, JBevers, GNakamura, RVandlen, RFKelley, LCWu, et al Structural basis of signaling blockade by anti-IL-13 antibody Lebrikizumab. J Mol Biol. 2013;425(8):1330–9. 10.1016/j.jmb.2013.01.024 .

RGuerois, JENielsen, LSerrano. Predicting Changes in the Stability of Proteins and Protein Complexes: A Study of More Than 1000 Mutations. J Mol Biol. 2002;320(2):369–87. 10.1016/S0022-2836(02)00442-4

CViricel, Sde Givry, TSchiex, SBarbe. Cost function network-based design of protein–protein interactions: predicting changes in binding affinity. Bioinformatics. 2018;34(15):2581–9. 10.1093/bioinformatics/bty092

CGeng, AVangone, GEFolkers, LCXue, AMJJBonvin. iSEE: Interface structure, evolution, and energy-based machine learning predictor of binding affinity changes upon mutations. Proteins: Structure, Function, and Bioinformatics. 2019;87(2):110–9.

BHurley, BO’Sullivan, DAllouche, GKatsirelos, TSchiex, MZytnicki, et al Multi-language evaluation of exact solvers in graphical model discrete optimization. Constraints. 2016;21(3):413–34.

RFAlford, ALeaver-Fay, JRJeliazkov, MJO’Meara, FPDiMaio, HPark, et al The Rosetta All-Atom Energy Function for Macromolecular Modeling and Design. J Chem Theory Comput. 2017;13(6):3031–48. 10.1021/acs.jctc.7b00125

HPark, PBradley, PGreisen, YLiu, VKMulligan, DEKim, et al Simultaneous Optimization of Biomolecular Energy Functions on Features from Small Molecules and Macromolecules. J Chem Theory Comput. 2016;12(12):6201–12. 10.1021/acs.jctc.6b00819

CRMiller, ELJohnson, AZBurke, KPMartin, TAMiura, HAWichman, et al Initiating a watch list for Ebola virus antibody escape mutations. PeerJ. 2016;4:e1674 10.7717/peerj.1674

JSPatel, CJQuates, ELJohnson, FMYtreberg. Expanding the watch list for potential Ebola virus antibody escape mutations. PloS one. 2019;14(3):e0211093 10.1371/journal.pone.0211093

JYang, NNaik, JSPatel, CSWylie, WGu, JHuang, et al Predicting the viability of beta-lactamase: How folding and binding free energies correlate with beta-lactamase fitness. PloS one. 2020;15(5):e0233509 10.1371/journal.pone.0233509

CCroux, CDehon. Influence functions of the Spearman and Kendall correlation measures. Statistical Methods & Applications. 2010;19(4):497–515.

MZTien, AGMeyer, DKSydykova, SJSpielman, COWilke. Maximum Allowed Solvent Accessibilites of Residues in Proteins. PloS one. 2013;8(11):e80635 10.1371/journal.pone.0080635

WKabsch, CSander. Dictionary of protein secondary structure: Pattern recognition of hydrogen-bonded and geometrical features. Biopolymers. 1983;22(12):2577–637. 10.1002/bip.360221211

GATribello, MBonomi, DBranduardi, CCamilloni, GBussi. PLUMED 2: New feathers for an old bird. Comput Phys Commun. 2014;185(2):604–13.

PHASneath. Relations between chemical structure and biological activity in peptides. J Theor Biol. 1966;12(2):157–95. 10.1016/0022-5193(66)90112-3

GCPvan Zundert, JPGLMRodrigues, MTrellet, CSchmitz, PLKastritis, EKaraca, et al The HADDOCK2.2 Web Server: User-Friendly Integrative Modeling of Biomolecular Complexes. J Mol Biol. 2016;428(4):720–5. 10.1016/j.jmb.2015.09.014

CDominguez, RBoelens, AMJJBonvin. HADDOCK: A Protein−Protein Docking Approach Based on Biochemical or Biophysical Information. J Am Chem Soc. 2003;125(7):1731–7. 10.1021/ja026939x

TAWassenaar, Mvan Dijk, NLoureiro-Ferreira, Gvan der Schot, SJde Vries, CSchmitz, et al WeNMR: Structural Biology on the Grid. Journal of Grid Computing. 2012;10(4):743–67.

JSPatel, FMYtreberg. Fast Calculation of Protein–Protein Binding Free Energies Using Umbrella Sampling with a Coarse-Grained Model. J Chem Theory Comput. 2018;14(2):991–7. 10.1021/acs.jctc.7b00660

TSiebenmorgen, MZacharias. Computational prediction of protein–protein binding affinities. WIREs Computational Molecular Science. 2020;10(3):e1448.

EWang, HSun, JWang, ZWang, HLiu, JZHZhang, et al End-Point Binding Free Energy Calculation with MM/PBSA and MM/GBSA: Strategies and Applications in Drug Design. Chem Rev. 2019;119(16):9478–508. 10.1021/acs.chemrev.9b00055 .

CWang, DGreene, LXiao, RQi, RLuo. Recent Developments and Applications of the MMPBSA Method. Frontiers in molecular biosciences. 2017;4:87 10.3389/fmolb.2017.00087