
Principal Investigator/Program Director (Last, first, middle): Donald, Bruce R.
Duke University and Duke University Medical Center.

Computational Structure-Based Protein Design.

Project Summary. Computational structure-based protein design is a transformative field with exciting prospects for advancing both basic science and translational medical research. My laboratory has developed new protein design algorithms and used them to design new drugs for leukemia, redesign an enzyme to diversify current antibiotics, design protein-peptide interactions to treat cystic fibrosis, design probes to isolate broadly neutralizing HIV antibodies, and predict MRSA resistance to new antibiotics. Central to protein design methodology is the need to optimize the amino acid sequence, placement of side chains, and backbone conformations in protein structures. By developing advanced search and scoring algorithms for combinatorial optimization of protein and ligand structure and sequence, we showed that desired structure, affinity, and activity can be designed by (a) modeling improved molecular flexibility and (b) exploiting ensembles of structures for accurate predictions. Our suite of algorithms has mathematical guarantees on the solution quality (up to the accuracy of the input model, which includes the initial structures, molecular flexibility to be modeled, and an empirical molecular mechanics energy function). Specifically, our algorithms guarantee to compute the global minimum energy conformation (GMEC), a gap-free list of sequences and structures in order of predicted energy, and a provably-good approximation to the binding affinity by bounding partition functions over molecular ensembles. We tested our algorithms prospectively, and experimental validation included construction of mutant proteins, measurement of binding affinity, enzyme kinetics and stability, crystal structures, NMR structures, viral neutralization, and in-cell activity.

We propose to build on our foundation of protein design algorithms, called OSPREY, and apply them in areas of biochemical and pharmacological importance. We will (1) predict future resistance mutations in protein targets of novel drugs; (2) design inhibitors of protein:protein interactions to target today’s “undruggable” proteins; and (3) use our design methodology to discover and improve broadly neutralizing HIV-1 antibodies. To accomplish Aims 1-3, we will advance the state of the art in protein design through algorithmic and modeling improvements that increase the accuracy and scope of our algorithms. These include modeling more protein and ligand flexibility during design; new combinatorial optimization and energy-bounding methods to accelerate the design search; and design of affinity and specificity using novel positive and negative design algorithms that model thermodynamic molecular ensembles. We will test our design predictions prospectively by making novel predicted mutant proteins and performing biochemical, biological, and structural studies. We will also validate our algorithms retrospectively, using existing structures and data. All software we develop will be released open-source.


1 Specific Aims

Computational structure-based protein design is a transformative field that can advance both basic science and translational medical research. My laboratory has developed new protein design algorithms and used them to design new drugs for leukemia [54], to redesign an enzyme to diversify current antibiotics [1*], to design protein-peptide interactions to treat cystic fibrosis [3*], to design probes to isolate broadly neutralizing HIV antibodies [6*], and to predict MRSA resistance to new antibiotics [2*]. Central to protein design methodology is the need to optimize the amino acid sequence, placement of side chains, and backbone conformations in protein structures. By developing advanced search and scoring algorithms for combinatorial optimization of protein and ligand structure and sequence, we have shown that desired structure, affinity, and activity can be designed by modeling improved molecular flexibility and exploiting ensembles of structures for accurate predictions. Moreover, our suite of algorithms has mathematical guarantees on the solution quality (up to the accuracy of the input model, which includes the initial structures, molecular flexibility to be modeled, and an empirical molecular mechanics energy function). Specifically, our algorithms are guaranteed to compute the global minimum energy conformation (GMEC), a gap-free list of sequences and structures in order of predicted energy, and a provably-good ε-approximation to the binding affinity by bounding partition functions over molecular ensembles.
In the past grant period, we tested our algorithms prospectively by redesigning enzymes to catalyze reactions on novel substrates [1*], predicting resistance mutations to new antifolate inhibitors of MRSA DHFR [2*], designing probes to isolate broadly neutralizing antibodies [6*], and designing novel peptide inhibitors of protein:protein interactions (PPIs) that not only have greatly improved binding efficiency but also rescue chloride efflux in human epithelial airway cells containing the genetic ΔF508-CFTR cystic fibrosis (CF) defect [3*].

We propose to build upon our foundation of protein design algorithms and apply them in three areas of biochemical and pharmacological importance. We will (1) predict future resistance mutations in protein targets of novel drugs; (2) design PPI inhibitors that target today’s “undruggable” proteins; and (3) use our design methodology to discover and improve broadly neutralizing HIV antibodies. Improvements to our protein design algorithms will increase their accuracy and scope. We will advance the state of the art in protein design by making algorithmic and modeling improvements to enable the applications above. These include the modeling of more protein and ligand flexibility during design; novel combinatorial optimization and energy-bounding methods to accelerate the design search; and design of affinity and specificity using novel algorithms to model molecular ensembles. All of the software we develop will be released open-source. Our algorithms will be validated retrospectively using existing structures and data. We will also test our predictions prospectively by making novel predicted mutant proteins and performing biochemical, biological, and structural studies. Three aims that combine computational and experimental studies are proposed to accomplish these goals:

Aim 1: Drug resistance resulting from mutations to the target is a serious problem that limits the lifetime of many of the most successful drugs. Rather than investigating mutations only after clinical exposure, it would be powerful to incorporate strategies early in the development process that enable the drug designer to anticipate and overcome possible resistance mutations. We will develop novel algorithms and software to predict resistance mutations in protein targets before they arise in response to new drugs. Our algorithms will model backbone flexibility during both positive and negative design. We will validate them on a number of systems, including malaria, tuberculosis, cancer, influenza, and HIV. We will experimentally test our predictions for MRSA, Candida glabrata, and vancomycin-resistant Enterococcus (VRE) by measuring enzyme activity, K_D, and K_i, and by solving crystal structures of the predicted resistance mutants.

Aim 2: We will extend our algorithms and use them to design PPI inhibitors. To handle the large protein surface area that must be modeled during PPI design, we will improve the speed and efficiency of our algorithms. We will design selective inhibitors of PDZ-domain interactions that modulate trafficking of the CFTR protein, which is mutated and mistrafficked in cystic fibrosis patients. Designing inhibitor specificity is a challenging problem, since PDZ domains bind a variety of peptides. We will address it using a novel positive and negative design approach.

Aim 3: We will design the protein:protein interactions between broadly neutralizing antibodies, such as VRC07, and their target, the HIV-1 glycoprotein gp120. By designing gp120-like proteins, we will create probes that can distinguish between neutralizing and non-neutralizing antibodies. The interfaces of the discovered antibodies will be computationally designed to improve their potency and breadth. The improved antibodies will also serve as templates for therapeutic antibody derivatives called nanobodies. Using our design algorithms and experiments, we will optimize the nanobodies to neutralize a broader range of resistant HIV strains, which could lead to a passive immunization therapy for HIV. The nanobodies will (a) target a site of vulnerability on HIV envelope proteins defined by structural studies of human HIV antibodies, (b) be optimized for avidity, and (c) overcome resistance mutations.


2 Research Strategy

2.1 Significance

Protein Design. Technological advances in protein redesign could revolutionize therapeutic treatment. With these advances, proteins and other molecules can be designed to act on today’s undruggable proteins or tomorrow’s drug-resistant diseases. One of the most promising approaches in protein redesign is computational structure-based protein design (CSPD). CSPD algorithms model a protein’s three-dimensional structure and predict mutations to the native protein sequence that will have a desired effect on its biochemical properties and function, such as improving the affinity of a drug-like protein for a disease target. In this project we will develop OSPREY (Open Source Protein Redesign for You), a free, open-source CSPD program, and we will apply it to biomedically important targets.

We have recently published results in methodology [4*, 5*, 10*, 11*] and prospective experimental studies [1*, 2*, 19*, 3*, 6*]. We also broadened our research to perform negative design [2*, 6*] (see Secs. 3.1, 3.3.1 and Figs. 4, 8), model backbone flexibility [12*, 10*, 50] (Fig. 6), design protein cores [4*] (Fig. 5), resurface proteins (Fig. 9C-E), predict resistance mutations [2*] (Sec. 3.1), model flexible inhibitors in active sites [1*, 2*, 3*] (Secs. 3.1, 3.2, 3.3), design protein:protein and protein:peptide interactions [3*, 6*] (Secs. 3.3, 3.2.4B, and Figs. 7-9), design small-molecule allosteric inhibitors of protein:protein interactions [54] (Sec. 3.2.4A), and design distal mutations to stabilize redesigned enzymes [2*] (Sec. 3.2 and Fig. 3E). See http://www.cs.duke.edu/brd/papers/ for a list of these and other relevant publications.

In total, we report 20 refereed publications, including 2 in PNAS and 3 in PLoS Comp. Biol.; papers supported by the grant are “starred” as references [1*-20*] in our grant application. Our publications detail progress in protein design with continuous rotamers [4*], backbone flexibility [5*, 12*, 10*, 50], ensembles to predict binding affinity [2*, 3*, 11*], protein:protein interactions [3*, 6*, 13*], and negative design [2*, 6*]. Experimental validation included construction of mutant proteins [1*, 2*], measurement of binding affinity [3*], enzyme kinetics and stability [1*, 2*, 19*], crystal structures [2*], NMR structures [54], viral neutralization [6*], and in-cell activity [3*, 19*, 54].

Central to protein design methodology is the need to optimize the amino acid sequence, placement of side chains, and backbone conformations in protein structures. Protein design algorithms use simplified models of protein geometry, flexibility, and energetics in order to make the search over the vast combinatorial space of possible protein structures and sequences tractable. These models must be improved to evaluate protein:ligand interactions more accurately and to tackle more difficult protein design problems. Improvements in modeling proteins must be balanced by algorithmic advances that keep the search over more conformations tractable. This is particularly important when modeling more backbone, side-chain, and ligand flexibility, where the size of the conformation space grows dramatically. To design for affinity and specificity, a conformational ensemble of structures must be modeled to incorporate a measure of conformational entropy [109, 7*, 11*]. This is especially challenging when searching over a large combinatorial sequence space. Finally, protein design algorithms must be validated both by retrospective tests and by prospective studies with experimental confirmation. Our competing renewal application targets these goals, proposing the development of new algorithms, implementation and software, retrospective validation, and biomedically important experimental studies. OSPREY will be developed into a general, open-source software tool and used to perform empirical designs with the following significance:

1) Antimicrobial resistance is a serious threat to human health. Pathogens quickly develop resistance to evade even the most reserved antibiotics. Microbes, fungi, and viruses develop escape mutations not only to vitiate enzyme inhibitors, but also to thwart binding and neutralization by antibodies. The essential enzyme dihydrofolate reductase (DHFR) is a promising drug target to combat infections from methicillin-resistant Staphylococcus aureus (MRSA), vancomycin-resistant Enterococcus (VRE), and Candida glabrata. MRSA is a dangerous superbug responsible for an estimated 19,000 or more deaths per year in the U.S. alone [70]. VRE is an increasingly common nosocomial infection that is resistant to vancomycin, the antibiotic of “last resort”. The fungus C. glabrata is responsible for a significant number of bloodstream infections [90], due in part to its decreased susceptibility to azole compounds, especially fluconazole, as well as its resistance to Amphotericin B [89]. We will use positive and negative CSPD on the DHFR enzymes to prospectively predict resistance mutations to new inhibitors before clinical deployment. This should give drug design a substantial advantage by allowing pharmaceutical development to “look ahead” and anticipate the resistance mutations that the pathogen will deploy. To demonstrate the breadth and generality of our approach, we will also retrospectively predict and validate resistance mutations for many other drug/enzyme pairs from different organisms.

2) Protein:protein interaction (PPI) inhibitors. The majority of the drugs discovered over the last century target only a small fraction of the proteome, typically proteins with a catalytic or small-molecule binding site [110]. However, around 80% of proteins do not have a small-molecule binding site and interact mostly with other proteins [110]. These proteins have long been considered “undruggable” [110], and they are crucially involved in the most challenging diseases, including HIV [33], cancer [39], neurological diseases [108], and cystic fibrosis (CF) [56, 118].


PPIs are large, flexible, and have energetically shallow binding surfaces, which makes targeting them with a small-molecule inhibitor innately difficult. However, recent techniques [115, 49, 86, 110, 117] enable the design of peptide-like molecules that can resist proteolytic degradation. We will use OSPREY to design these peptide-like drugs to inhibit disease-related PPIs; specifically, we will design inhibitors of PPIs involved in CF. Currently, most approved treatments for CF merely treat the resulting disease symptoms. In contrast, the molecules we design [3*] (Sec. 3.3.6) will be lead compounds for drugs that address the underlying defects in CF patients (see Sec. 3.2.1) and promise to have an additive effect with other compounds under development [37, 3*].

3) HIV-1 has infected more than 60 million people and killed more than 20 million. HIV-1’s exceptional ability to evade the human immune system has made vaccine development a formidable challenge. However, some individuals elicit broadly neutralizing monoclonal antibodies (BNAbs) in response to natural infection. Given the existence of these BNAbs, the main challenge for the design of an effective vaccine is the elicitation of BNAbs in the general population through immunization [35]. Nevertheless, known BNAbs can provide valuable information about the mechanism of action of such antibodies against specific HIV strains and can enable further immunogen design efforts. Moreover, selected potent BNAbs can be used as templates for antibody derivatives to be used as therapeutics or for passive immunization. There is no vaccine in sight for a number of pathogens that have a significant impact on public health. For these pathogens, a strategy that involves passive immunization with monoclonal BNAbs can play an important role in disease prevention. We propose to develop novel methodology and algorithms for computational structure-based antibody redesign. We will apply this methodology to design pan-neutralizing antibodies for HIV-1 and other viruses. Two factors led us to select HIV-1 as the primary target for validating our efforts to design pan-neutralizing antibodies and antibody derivatives for therapeutics or passive immunization: the recent structural characterization of several HIV-1 BNAbs, and the ever-increasing socio-economic burden that HIV-1 places on our world.

OSPREY CSPD Software. To date, OSPREY has been used to design new drugs for leukemia [54], to redesign an enzyme to diversify current antibiotics [1*], to design protein-peptide interactions to treat cystic fibrosis [3*], to predict MRSA resistance to new antibiotics [2*], and to design probes to isolate anti-HIV BNAbs from donor sera of infected patients [6*]. To perform all of these biomedically important designs, OSPREY has several software modules that accurately model protein movements during redesign. These modules cover discrete side-chain flexibility (Rigid DEE), continuous side-chain flexibility (MinDEE and iMinDEE), local and global backbone flexibility (BRDEE and BD), and combined continuous backbone and side-chain flexibility (DEEPer) (Fig. 1). These modules can be used stand-alone to find the global minimum energy conformation (GMEC) for a design, or together with the K* module to generate provably-good ensemble-based binding affinity approximations (Fig. 2). To enable the challenging designs for this grant, OSPREY will be extended in multiple ways (Secs. 3.1.4, 3.2.3, 3.3.4). These extensions include improved backbone and loop modeling, modeling of explicit waters, incorporation of dynamics information, and speed and efficiency improvements. The power of protein design with OSPREY applied to biomedical systems opens up new opportunities in disease prevention, and will provide unique insights into today’s most challenging biomedical problems.
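The following is a minimal sketch of how a gap-free, energy-ordered list can be produced. It assumes a toy pairwise energy model made of per-position self energies plus pair energies. All numbers and function names here are hypothetical, and this is not OSPREY's code or API. A* expands partial assignments in order of an admissible lower bound. Complete conformations therefore leave the priority queue in nondecreasing energy order: the first one is the GMEC, and no conformation in between is skipped.

```python
import heapq
import itertools

def energy(conf, single, pair):
    """Exact energy of a (partial) conformation: self energies of the assigned
    positions plus pair energies among them. pair[i][j] is stored for i < j."""
    e = sum(single[i][r] for i, r in enumerate(conf))
    for i, j in itertools.combinations(range(len(conf)), 2):
        e += pair[i][j][conf[i]][conf[j]]
    return e

def lower_bound(partial, single, pair):
    """Admissible A* score. It is the exact energy of the assigned prefix plus,
    for each unassigned position j, the best case over its rotamers of
    (self energy + pairs with the prefix + optimistic pairs with later positions)."""
    n, k = len(single), len(partial)
    h = 0.0
    for j in range(k, n):
        h += min(
            single[j][r]
            + sum(pair[i][j][partial[i]][r] for i in range(k))
            + sum(min(pair[j][l][r]) for l in range(j + 1, n))
            for r in range(len(single[j]))
        )
    return energy(partial, single, pair) + h

def enumerate_gap_free(single, pair, limit):
    """Yield up to `limit` (energy, conformation) pairs in nondecreasing energy."""
    n = len(single)
    queue = [(lower_bound((), single, pair), ())]
    while queue and limit > 0:
        score, partial = heapq.heappop(queue)
        if len(partial) == n:          # the bound is exact for a full conformation
            yield score, partial
            limit -= 1
            continue
        j = len(partial)               # expand the next position in a fixed order
        for r in range(len(single[j])):
            child = partial + (r,)
            heapq.heappush(queue, (lower_bound(child, single, pair), child))

# Hypothetical 3-position model with 2 rotamers per position.
single = [[0.0, 1.0], [0.5, 0.0], [0.0, 0.2]]
pair = {
    0: {1: [[2.0, 0.0], [0.0, 0.0]], 2: [[0.0, 0.0], [0.0, 0.0]]},
    1: {2: [[0.0, 1.0], [0.0, 0.0]]},
}
for e, conf in enumerate_gap_free(single, pair, limit=3):
    print(conf, round(e, 3))
# (0, 1, 0) 0.0
# (0, 1, 1) 0.2
# (1, 1, 0) 1.0
```

The heuristic charges each unassigned-unassigned pair only once, to the lower-indexed position. This keeps the bound from overestimating any completion, which is exactly what the gap-free guarantee relies on.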

2.2 Innovation

At the core of our methodology lie three fundamental principles that improve protein design: algorithms with mathematical accuracy (provability), modeling continuous flexibility, and thermodynamic ensembles. These principles have advanced protein design via innovations in negative design, drug resistance, and stabilizing distal mutations. Our algorithms were experimentally demonstrated to have remarkable accuracy and predictive power (see [1*, 2*, 3*, 54, 6*] and Figs. 3, 4, 7, 8, 9), and they outcompeted expensive experimental techniques in a series of prospective protein designs. Below we describe the innovation of our proposal.

A. Provable algorithms enable accurate improvements to the model based on experimental data. CSPD procedures rank-order the best solutions (the lowest-energy protein conformations) in sequence and conformational space. Inevitably, in all CSPD methods, inaccuracies in the input model (Fig. 1) lead to false positives and false negatives in the set of predictions. However, these unsuccessful designs are useful as feedback to the CSPD algorithm to improve the input model. Our algorithms are provable and complete, which means that they are guaranteed to find the globally optimal solution with respect to the input model (Fig. 1B). Therefore, there are no inaccuracies in the search, and any inaccuracies in the design can be directly attributed to the input model. In contrast, heuristic design algorithms [40, 71, 68, 78, 66, 73, 41, 121, 48, 91] use the same input model definition as provable algorithms. However, they employ a heuristic search of conformation space and hence cannot guarantee that they find the globally optimal solution. When a heuristic (as opposed to provable) design algorithm is used, there is no way to tell whether prediction inaccuracies should be attributed to inadequate optimization or to flaws in the input model. Suppose an energy function is overfit to handle spurious predictions (artifacts of the algorithm, not of the model). Then the process of improving the model will fail, and the additional experimental data will be misused to effectively “break” a good model to overfit the putative outliers. Hence, with heuristic design algorithms the operation of “tuning” the model to match the data is fundamentally unsound. On the other hand, with a provable algorithm, one variable is removed from the experiment (the algorithm). We then know that any discrepancies between the experimental results and the algorithm’s predictions must be blamed solely on inadequacies of the model (and not the algorithm). In this manner, provable algorithms make it substantially more straightforward to improve the model given new experimental data, and these improvements (“tuning”) are sound when made with our optimization protocol in [3*].
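The toy example below illustrates this argument. The energies are hypothetical, and a one-pass greedy choice stands in for heuristic search; it is not meant to model any particular published method. On the very same input model, the heuristic returns a non-optimal conformation, while the complete search returns the true GMEC. The heuristic's error therefore says nothing about the energy function.

```python
import itertools

# Hypothetical input model: two positions, two rotamers each.
SINGLE = [[0.0, 0.3], [0.0, 0.0]]    # self energies SINGLE[position][rotamer]
PAIR = [[1.0, 1.0], [1.0, -2.0]]      # pair energy PAIR[r0][r1]

def energy(conf):
    r0, r1 = conf
    return SINGLE[0][r0] + SINGLE[1][r1] + PAIR[r0][r1]

def complete_search():
    """Provable: scores every conformation, so it always returns the GMEC."""
    return min(itertools.product(range(2), repeat=2), key=energy)

def greedy_search():
    """Heuristic stand-in: fixes position 0 by its self energy alone,
    then picks the best rotamer at position 1 given that choice."""
    r0 = min(range(2), key=lambda r: SINGLE[0][r])
    r1 = min(range(2), key=lambda r: energy((r0, r)))
    return (r0, r1)

print(complete_search(), greedy_search())  # (1, 1) (0, 0)
```

Here the discrepancy comes entirely from the search. "Fixing" it by adjusting SINGLE or PAIR would distort a model that was never wrong, which is the failure mode described above.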

B. Continuous flexibility. In conventional protein design, the user must choose the conformational sampling (discrete rotamers and discrete backbones). This choice is made either implicitly (by selecting the fineness of the rotamer library) or explicitly (the number of Monte Carlo (MC) runs, numerous parameters for simulated annealing (SA), etc.). Even in systematic search, a (uniform) grid and its resolution must be chosen. The choice of sampling parameters is always tricky, since the program’s accuracy depends sensitively on these choices [4*]. Our algorithms, which use continuous rotamers and backbones, eliminate the need to choose a sampling, a grid, or their resolutions. Instead, the user specifies continuous bounds on the conformational degrees of freedom (e.g., as unions of disjoint intervals). The algorithm then computes the optimal continuous solution within these bounds, freeing the user (and the algorithm) from any dependence on a sampling or its resolution.
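The following is a minimal one-dimensional sketch of the difference. It rests on several assumptions. The torsional energy is made up. The continuous bounds are a ±9° window around each modal χ value, forming a union of disjoint intervals; the window width is an illustrative assumption. Golden-section search is a generic stand-in for whatever local minimizer a real implementation uses. The discrete library can only report its best grid point, while bounded continuous minimization over the intervals finds the lower-energy χ between grid points.

```python
import math

def golden_section_min(f, lo, hi, tol=1e-6):
    """Return an approximate minimizer of a unimodal f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d          # the minimum lies in [a, d]
        else:
            a = c          # the minimum lies in [c, b]
    return (a + b) / 2.0

def toy_energy(chi):
    """Hypothetical torsional energy (arbitrary units) with its minimum at
    68 degrees, between the grid points of the coarse library below."""
    return 1.0 - math.cos(math.radians(chi - 68.0))

MODAL_CHI = [60.0, 180.0, 300.0]   # hypothetical discrete rotamer library
HALF_WIDTH = 9.0                    # assumed continuous bound: modal value +/- 9 deg

discrete_best = min(MODAL_CHI, key=toy_energy)
# Continuous search: minimize within each interval of the union, keep the best.
continuous_best = min(
    (golden_section_min(toy_energy, m - HALF_WIDTH, m + HALF_WIDTH) for m in MODAL_CHI),
    key=toy_energy,
)
print(discrete_best, round(continuous_best, 2))  # 60.0 68.0
```

No grid resolution appears anywhere in the continuous search. The user-specified intervals are the only sampling decision, which is the point made above.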

We have shown the advantages of continuous flexibility over discrete flexibility [4*, 50, 10*, 11*, 1*]. In [4*] we used the iMinDEE algorithm to show that discrete rotamers do not accurately quantize conformation space and result in far-from-optimal design predictions. Importantly, continuous rotamers were able to find conformations that were both lower in energy and significantly more similar to native sequences (Fig. 5C). We also showed the benefits of continuous flexibility in our particularly innovative protein design algorithms for flexible backbones: BD [50]; brDEE [10*]; and the DEEPer algorithm [5*] (Figs. 1, 6). In all these studies, continuous flexibility resulted in different sequences from discrete flexibility, with a difference in sequence of over 60% in some cases (Fig. 5A-B) [4*].

C. Ensembles. In every area of structural biology and molecular biophysics, modeling thermodynamic ensembles of structures (instead of single, frozen structures) has greatly increased the fidelity of calculated predictions, including binding, stability, and activity [109, 51, 112]. Protein design, in contrast, has generally maintained that designing to a single structure should work. The underlying assumption was that enthalpy dominates binding and that entropy could therefore be ignored [116, 60]. Increasingly, however, a wealth of studies on binding dynamics are showing that conformational entropy plays a determining role in binding, and that binding cannot be calculated without accounting for conformational entropy [109, 51, 98, 11*, 7*]. In contrast to other methods in the field, OSPREY searches sequence space while modeling a thermodynamic ensemble of structures to predict affinity. OSPREY’s efficiency is enabled by the breakthrough algorithm iMinDEE/K*. It allows sequence selection to be aware of the structure-based partition function during design, and to prune sequences and structures when they are proven to have a poor K_D. We have shown that solutions found by OSPREY were missed by competing algorithms that eschewed continuous flexibility or ensembles [3*, 1*, 8*, 2*]. These solutions were experimentally verified and would not have been found without OSPREY, so such failures constitute a challenge to and a critique of the frozen-structure design approach.

Figure 1: Flexibility and ensembles in the OSPREY protein redesign suite. (A) The rigid DEE, minDEE, iMinDEE, BD, brDEE, and DEEPer algorithms model different types of protein flexibility. Blurring illustrates continuous flexibility in each algorithm. The corresponding graphs show the conceptual flexibility that each algorithm searches (represented by all backbone flexibility on the x-axis and all side-chain flexibility on the y-axis). (B) Each algorithm has been implemented in OSPREY. The input model includes a 3D structure of the protein to be redesigned, a definition of the sequence space, the allowed protein flexibility (including the rotamer library), and a pairwise energy function. Then, according to the type of flexibility allowed, OSPREY runs a specific pruning algorithm followed by the A* search algorithm. The A* output generates a ranking based on either the lowest-energy structure of each sequence or an ensemble of structures computed by the K* algorithm (Fig. 2A). To find sequences that have a high affinity for one ligand (positive design) while having a low affinity for another (negative design), a ranking is calculated by the ratio of K* scores. If desired, predicted mutants can be improved by finding stability-bolstering positions and redesigning those residues. Panels (C) and (D) illustrate, respectively, the concept of single-structure design and design based on ensembles of structures (PDB id: 3FQC).

D. Negative design. Most applications of CSPD focus on stabilizing a target protein fold or binding event (positive design). When designing specificity for a single target, it is also important to prevent unwanted folds or binding events from occurring (negative design) [55, 2*]. A successful positive design merely requires finding at least one protein sequence with the desired properties. However, in negative design the protein design method must be confident that no off-target binding occurs. Therefore, negative design is much more sensitive to false negatives and requires a more thorough search of the conformation space. Thus, accuracy through provability, continuous flexibility, and ensembles is essential in negative design to prevent off-target binding. We successfully used negative design to predict mutations in bacterial DHFR that prevent binding of a drug [2*] (Fig. 4). We also used it to prevent binding of weakly neutralizing antibodies to antibody probes [6*] (Fig. 8).
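The following is a minimal sketch of the ensemble idea behind a K* ranking, under stated simplifications. Each state's ensemble is a short list of made-up conformational energies in kcal/mol, and the function names are hypothetical. Exact energies stand in for the A*-ordered energy bounds that a real K* search would enumerate. The stopping rule is one way to realize an ε-approximation: enumeration stops once the unenumerated conformations could contribute at most a fraction ε of the partition function. The positive/negative ranking then divides one K* score by another, as in the ratio of K* scores described in Fig. 1B.

```python
import math

RT = 1.987e-3 * 298.0   # kcal/mol at ~298 K (gas constant times temperature)

def approx_partition_function(energies, eps):
    """Sum Boltzmann weights over energies in ascending order. Stop once the
    unseen conformations, each no lower in energy than the next one, could add
    at most a fraction eps of the total. Returns (q_star, conformations_used),
    with q_star >= (1 - eps) * q."""
    ordered = sorted(energies)
    q_star = 0.0
    for m, e in enumerate(ordered):
        unseen_bound = (len(ordered) - m) * math.exp(-e / RT)
        if q_star > 0.0 and unseen_bound / (q_star + unseen_bound) <= eps:
            return q_star, m
        q_star += math.exp(-e / RT)
    return q_star, len(ordered)

def kstar_score(complex_e, protein_e, ligand_e, eps=0.03):
    """K* score q_PL / (q_P * q_L), each partition function eps-approximated."""
    q_pl, _ = approx_partition_function(complex_e, eps)
    q_p, _ = approx_partition_function(protein_e, eps)
    q_l, _ = approx_partition_function(ligand_e, eps)
    return q_pl / (q_p * q_l)

# Hypothetical ensembles for one mutant that should bind the substrate
# (positive design) but not the inhibitor (negative design).
protein = [-5.0, -4.6, -4.0]
substrate, substrate_complex = [-1.0, -0.8], [-12.0, -11.5, -10.0]
inhibitor, inhibitor_complex = [-2.0, -1.5], [-9.0, -8.0, -7.5]

ratio = (kstar_score(substrate_complex, protein, substrate)
         / kstar_score(inhibitor_complex, protein, inhibitor))
print(ratio > 1.0)  # True: the mutant is predicted to favor the substrate
```

Because every partition function is only ε-approximated, the K* score inherits a bounded relative error rather than an exact value. This is the sense in which the ranking is provably good rather than exact.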

E. Resistance mutations. Negative design in combination with positive design enables the study of resistance mutations in drug targets. Other groups have retrospectively predicted resistance mutations (e.g., [97, 114, 22, 59, 62, 111, 64, 31]). To our knowledge, however, ours are the first prospective computational predictions of resistance mutations in a drug target. Most competing algorithms can only “look up” possible mutations from the library of what has been clinically observed previously. In contrast, OSPREY can predict the escape mutations in a protein target that will arise for a new inhibitor. This powerful technique provides a computational alternative to expensive wet-lab resistance selection experiments. OSPREY’s prospective predictions will make it possible to overcome drug resistance early in the iterative drug discovery process.

F. Stabilizing distal mutations. It has been claimed that computational enzyme design is so primitive that additional distal mutations outside the active site can only be selected post hoc by purely experimental methods, such as directed evolution [96]. Contrary to this pessimism, OSPREY has been shown to design not only active-site mutations but also distal mutations that improve the desired novel activity or switch in specificity (Fig. 3, [1*]). This provides a computational alternative to directed evolution and random mutagenesis screening.

G. OSPREY outcompetes expensive experimental techniques. Because of the high biomedical relevance of our designs, our experimental collaborators have used state-of-the-art experimental techniques, in parallel to our computational approach, to find mutants that perform the same biomedical role. These experimental techniques were more expensive and less effective than our computational approach:

(i) Our work on designing anti-HIV antibodies (Fig. 9) has proven more efficacious than phage display or random mutagenesis [6*]. The same holds for our work on designing molecular probes (“bait”) to pull down broadly neutralizing antibodies from donor sera of long-term non-progressors (Fig. 8). This is because of the remarkable accuracy of our algorithms and their ability to search a larger and less biased sequence space.

(ii) The best CAL:CFTR PPI inhibitor we designed in [3*] was significantly superior to the best peptide found in a peptide array (Fig. 7).

(iii) Our top computationally predicted mutations were better resistance mutations (e.g., higher K_i values while maintaining enzyme efficiency) than those that arose in resistance selection experiments (Fig. 4 and Sec. 3.1.5).

3 Approach

3.1 Aim 1: Predicting Future Resistance Mutations in Protein Targets of Novel Drugs

Drug resistance has been observed for even the most reserved antibiotics, sometimes after only brief clinical exposure, severely limiting the effective lifetime of these drugs. One of the most common resistance mechanisms is the accumulation of mutations in an enzyme target, creating an active site that can no longer bind the inhibitor yet maintains function. When these resistance mutations are discovered in the clinic, the mutants must be identified and studied, forcing the drug design process to start anew. Currently, to address this problem in preclinical drug discovery, resistance mutants are generated and studied in vitro with labor-intensive experiments. It would be far more effective to predict resistance mutations in silico during the initial stages of drug discovery, so that the design process can anticipate and overcome the resistance. In response to this need, we propose a protocol to computationally predict resistance mutations in protein targets, using algorithms for positive and negative structure-based protein design.

Figure 2: OSPREY uses advanced algorithms to model protein flexibility and ensembles. (A) The K* algorithm computes a provably-good ε-approximation [11*, 3*] to the binding constant using a small subset of the conformational space available to the bound and unbound complexes. (B) Under rigid DEE, if a rigid rotamer i_r always has a higher energy than another rotamer i_t, then i_r can be pruned, because it is not part of the lowest-energy conformation under a rigid model (rigid GMEC). (C) However, when small, continuous changes in χ-angle space are allowed, i_r could be part of a lower-energy conformation (the contGMEC). (D-E) Modeling the continuous region around the modal value has important effects on sequence selection. (D) Using rigid rotamers can result in steric clashes (shown in red). (E) However, as shown in this cartoon, small changes in χ-angle space allowed by continuous rotamers can have profound effects on the energies of interacting rotamers. (F-H) The iMinDEE and DEEPer algorithms efficiently find the minGMEC. (F) MinDEE enumerates protein conformations in order of their lower energy bound. (G) iMinDEE first prunes conformations with an energy greater than I0 and finds the local minimum energy conformation (LMEC) for those conformations. (H) iMinDEE iterates once more, pruning conformations with an energy greater than I1 (the energy of the LMEC). During the second iteration iMinDEE is guaranteed to find the GMEC. See [4*, 5*] for a detailed description.

3.1.1 DHFR and propargyl-linked antifolates. Dihydrofolate reductase (DHFR) has long been a target for antimicrobial therapy. We have chosen DHFR from several organisms as a model system for predicting future resistance mutations against a novel class of antifolates developed by our long-time collaborator Dr. Amy Anderson. Specifically, Dr. Anderson has designed and synthesized several nanomolar inhibitors with potent antimicrobial activity against C. glabrata, C. albicans, S. aureus, S. pyogenes, and B. anthracis [88, 82, 81, 27, 26, 80, 45].

3.1.2 Research Design. We will computationally predict the resistance mutations that pathogenic organisms may evolve to evade a set of novel inhibitors (using the protocol in Sec. 3.1.5B-C). Predictions will be made for MRSA, C. glabrata, and VRE, and tested in vitro using mutated enzymes (Ki, KD, kcat/KM; protocol in Fig. 4 and [2*]) and knock-in pathogen strains. Because we will make predictions for a set of novel inhibitors (e.g., Fig. 4A) for each pathogen, this will be a stringent test of the accuracy of our algorithms. Our results will therefore address a crucial need in preclinical drug discovery: for each novel drug proposed to treat MRSA, VRE, and C. glabrata, predict in silico, before clinical deployment, which new resistance mutations will evolve.

To confirm our predictions, we will perform extensive biochemical and structural studies, including in vitro experiments with mutated enzymes, genetically modified (knock-in) pathogen cells, and X-ray crystallography (Fig. 4). In addition, we will perform resistance selection experiments to determine which of the predicted mutants arise in pathogens when stressed with our inhibitors. Dr. Amy Anderson will perform these experiments as described in [46].

We will also validate against known resistance mutations in DHFR, which are important in MRSA, malaria (P. falciparum), E. coli, L. casei, and P. carinii drug resistance. Although DHFR enzymes have similar backbone folds, sequence variation across species causes drugs to have very different inhibition profiles for the different DHFR enzymes [80, 32, 58], and the DHFR resistance mutations that arise in each species are different [63]. Predicting resistance across distinct species is therefore challenging: the method must be sensitive enough to capture and predict different resistance mutations despite a similar fold. Correct prediction of the species-dependent mutations will validate OSPREY’s generality.

3.1.3 Predicting resistance mutations in other drug targets. While we will primarily focus on prospective predictions of DHFR resistance mutations, OSPREY is a general protein design method that can be applied to many drug targets. Therefore, we will also validate OSPREY on several enzymes with known resistance mutations [24], including enoyl acyl-carrier protein reductase and DNA gyrase, in which mutations confer drug resistance in M. tuberculosis. We will then validate against data on cancer resistance mutations, specifically resistance mutations in epidermal growth factor receptor and ABL kinase [122, 74]. Finally, we will validate our methodology against resistance arising in viral enzymes, whose genes often mutate much faster than those of bacteria and fungi, specifically mutations in influenza neuraminidase and HIV reverse transcriptase. In total, we will retrospectively study 8 enzymes and 12 drugs, ranking the mutations in the active site of each enzyme by resistance fitness. We will compare the top computational predictions with 32 resistance mutations reported in the literature [24].

3.1.4 Algorithmic improvements. Algorithmic and software improvements will be made to extend our prediction capabilities and improve accuracy. In general, the algorithmic improvements will benefit all of the aims, but I have grouped each improvement with the aim to which it is most directly related. Although we have achieved strikingly high accuracy in predicting resistance mutants (see Sec. 3.1.5), remarkably, the predictions used a fixed-backbone model of DHFR (albeit with provable algorithms, continuous protein rotamers, continuous ligand rotamers, continuous ligand backbone flexibility, and an ensemble-based prediction of binding affinity based on partition functions). This must be improved, because resistance mutations can change the backbone of the mutant [29, 44]. We will build on our provably accurate backbone-flexible protein design algorithms as embodied in BD, BRDEE, and DEEPer (Figs. 1, 6). These algorithms are guaranteed to compute the GMEC and have been shown to improve both the predicted energies and native-structure recovery in backbone-flexible design [4*, 12*, 1*, 5*].

OSPREY currently can model only static, explicit waters and does not account for the repositioning of waters when mutations are introduced. However, water-mediated interactions between a protein and an inhibitor can play an important role in drug resistance [47]. We will model explicit waters continuously, allowing continuous placement of introduced waters and freeing the designer from choosing a grid or sampling resolution that forces the waters onto specific, discrete positions with respect to a grid or a rotamer [65, 61]. Also, water molecules will be able to fill the “voids” created by rotamer movements or amino acid mutations.

OSPREY uses thermodynamic ensembles during the design search to predict affinity and activity, instead of designing to single structures. The successful, experimentally validated design predictions of OSPREY were missed by single-structure approaches (as we showed in [3*, 1*, 8*, 2*]). The ensemble-based approach thus captures some of the dynamics and conformational heterogeneity of DHFR and other enzymes, increasing the accuracy of our predictions. However, a wealth of data exists on the dynamics, flexibility, and catalytic cycle of DHFR [28, 57]. None of these data are currently input to OSPREY, which infers dynamics and ensembles de novo, from first principles. Since conformational rearrangements are important consequences of resistance mutations, incorporating this dynamic information into the design algorithm will improve the accuracy of predictions. In particular, we anticipate that predicted loop rearrangements will become more accurate when we model what is already known about the dynamics. This process can be viewed as incorporating priors about dynamics and structure when the OSPREY ensembles are constructed; with priors from experimental dynamics measurements, the posterior ensemble predictions will be more realistic.
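The ensemble-based affinity prediction described above can be made concrete with a toy calculation. In the K* approach, a binding score is the ratio of Boltzmann-weighted partition functions, q_PL / (q_P · q_L), and each partition function can be approximated to within a factor (1 − ε) by summing conformations in order of increasing energy until the unenumerated remainder is provably small. The minimal sketch below assumes already-computed conformational energies (kcal/mol) arriving in ascending order, as a gap-free enumeration would produce them, and a known size of the conformation space. The function names, the default ε, and the 298.15 K temperature are illustrative assumptions rather than OSPREY’s interface, and a production implementation would also need to guard against floating-point overflow for very low energies.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def approx_partition_function(sorted_energies, n_total, eps=0.03, temp=298.15):
    """Partial partition function with a provable (1 - eps) guarantee.

    sorted_energies: conformational energies (kcal/mol) in ascending order,
        as produced by a gap-free enumeration (e.g., A*); may be an iterator.
    n_total: number of conformations in the (pruned) conformation space.
    Returns (q_partial, n_used) with (1 - eps) * q_true <= q_partial <= q_true.
    """
    if not 0 < eps < 1:
        raise ValueError("eps must lie strictly between 0 and 1")
    rt = R_KCAL * temp
    q_partial = 0.0
    n_used = 0
    for energy in sorted_energies:
        # No unenumerated conformation has energy below `energy`, so the
        # remaining n_total - n_used conformations contribute at most this much.
        rest_bound = (n_total - n_used) * math.exp(-energy / rt)
        if q_partial > 0 and rest_bound <= eps / (1 - eps) * q_partial:
            return q_partial, n_used
        q_partial += math.exp(-energy / rt)
        n_used += 1
    if n_used < n_total:
        raise ValueError("enumeration ended before the error bound was met")
    return q_partial, n_used  # every conformation was summed: exact value


def kstar_score(q_complex, q_protein, q_ligand):
    """K* score: bound-state partition function over the unbound product."""
    return q_complex / (q_protein * q_ligand)
```

Stopping early is what makes the computation tractable: when a few low-energy conformations dominate the Boltzmann sum, the bound is met long before the full space is enumerated.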

3.1.5 Preliminary Results

A. Enzyme redesign. A key requirement for designing successful resistance mutations to enzyme inhibitors is redesigning the enzyme to maintain (or even improve) the native (WT) catalytic activity. Thus, the general capability to redesign enzymes is necessary for resistance prediction. In the past grant period, OSPREY successfully redesigned the phenylalanine adenylation domain (PheA) of gramicidin S synthetase A (GrsA) five times, to adenylate five amino acids other than Phe [1*] (Fig. 3). We redesigned PheA to obtain a 2168-fold switch in specificity from L-Phe to L-Leu (Fig. 3A) by combining a double mutation in the substrate-binding pocket (Fig. 3D,B,A) with several distal bolstering mutations (Fig. 3E,A) [1*]. We also redesigned PheA to adenylate a set of non-cognate, charged substrates (Arg, Asp, Glu, and Lys; Fig. 3C), and redesigned inositol phosphate multikinase specificity [19*].

B. Prediction of double-mutant resistance mutations against the D26M drug. In [2*] we described the successful in silico prediction of resistance mutations to a novel antifolate inhibitor of MRSA DHFR. OSPREY ranked 1173 double mutants of MRSA DHFR by the ratio of positive to negative design (Fig. 1B,D): positive design to maintain catalytic function, and negative design to abrogate binding of the lead inhibitor. Four of the top ten predicted sequences were generated through mutagenesis and tested in vitro for enzyme efficiency (kcat/KM) and inhibition (Ki). The enzyme efficiencies of the predicted resistance mutants were reduced, but all were in the same range as those of clinically observed DHFR resistance mutations in a variety of organisms [2*, 43, 100, 76]. Enzyme inhibition assays showed that three of the four highly ranked predicted mutants are active yet display lower affinity (18-, 9-, and 13-fold) for the inhibitor (Fig. 4B). A crystal structure of the top-ranked mutant enzyme validated the predicted conformations of the mutated residues and the structural basis of the loss of potency (Fig. 4C).
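The positive/negative ranking step in B can be sketched as follows. This minimal example assumes that a K* score has already been computed for each candidate mutant in two states, bound to the native substrate (positive design) and bound to the inhibitor (negative design), and ranks mutants by the logarithm of their ratio. The mutant labels and scores are hypothetical, and working in log10 units is a choice made here for numerical convenience, not a detail taken from [2*].

```python
import math


def resistance_score(kstar_substrate, kstar_inhibitor):
    """log10 of the positive/negative design ratio for one mutant.

    Positive design: the mutant should still bind its substrate (high K*).
    Negative design: the mutant should bind the inhibitor poorly (low K*).
    """
    return math.log10(kstar_substrate) - math.log10(kstar_inhibitor)


def rank_mutants(kstar_scores):
    """Sort mutant labels by decreasing positive/negative ratio.

    kstar_scores: dict mapping a mutant label to (K*_substrate, K*_inhibitor).
    """
    return sorted(kstar_scores,
                  key=lambda m: resistance_score(*kstar_scores[m]),
                  reverse=True)


# Hypothetical K* scores (substrate, inhibitor), for illustration only.
example = {"WT": (1e6, 1e8), "mutA": (1e6, 1e5),
           "mutB": (1e3, 1e3), "mutC": (5e5, 1e7)}
print(rank_mutants(example))  # ['mutA', 'mutB', 'mutC', 'WT']
```

Selecting the top ten candidates for mutagenesis is then a slice such as `rank_mutants(example)[:10]`.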

Figure 3: Computational design of the Phe adenylation domain (PheA) of gramicidin S synthetase A to adenylate Leu or Lys instead of Phe. (A) Relative substrate specificity for Phe (left) and Leu (right) of wild-type (WT) and redesigned PheA [1*]. The specificity of each enzyme for the two substrates is normalized relative to the enzyme with the highest specificity for that substrate. The designed enzyme with the highest activity toward the noncognate substrate L-Leu showed a 2168-fold switch in specificity and 1/6 of the WT:Phe protein/substrate activity. (B) Steady-state kinetics curve for the T278L/A301G design with Leu as substrate [1*]. (C) PheA was also redesigned to adenylate 4 charged substrates (Asp, Lys, Arg, Glu) for which the WT has no activity. Example: kinetics curve (black) for the T278D/A301G design with Lys as substrate [1*]. For comparison, the kinetics curve for the WT enzyme (with Lys as substrate) is shown in blue. (D) K*-predicted structure of the lowest-energy conformation of the T278L/A301G design (see A, B) with Leu as substrate (CPK, center). (E) OSPREY predicted mutations to residues distal to the active site (dark blue, labeled), and they successfully enhanced specificity for Leu (see A, right).

C. Prediction of single-nucleotide resistance mutations. Using feedback (see Sec. 2.2.A) from the kinetic data and crystal structure in [2*], OSPREY predicted single-nucleotide polymorphism (SNP) mutations in the active site of DHFR conferring resistance to three antibiotics (Fig. 4A). The top four mutants were tested in vitro and showed up to a 68-fold gain in drug resistance, with little to no loss (1.0- to 1.2-fold) in enzyme efficiency kcat/KM (Fig. 4B). Our top two resistance mutations toward the effective MRSA inhibitor U6 (Fig. 4A), V31L and V31I, showed a 68-fold and a 46-fold gain in resistance, respectively (an increase in Ki, denoted ↑Ki, which implies ↓inhibition), with less than 1.2-fold loss in enzymatic activity (kcat/KM). In contrast, the two mutants discovered in resistance selection experiments, F98I and F98Y, showed only a 20- and a 4.3-fold ↑Ki, respectively [46]. Thus, the computationally designed mutants showed much better resistance against U6 at the enzyme level than the experimentally selected mutants.
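The restriction to single-nucleotide mutations in C can be illustrated by enumerating which amino acids a codon can reach with one base change under the standard genetic code. The sketch also labels each change as a transition or a transversion, one of the properties listed in Sec. 3.1.6. The codon GTT is a hypothetical stand-in, since the text does not state which valine codon encodes V31 in MRSA DHFR; from any valine codon (GTN), Leu is one change away, Ile is one change away from GTT, GTC, and GTA, and Tyr always requires at least two changes.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
PURINES = set("AG")


def snp_substitutions(codon):
    """Map each amino acid reachable by one nucleotide change to its mutation types.

    Synonymous changes and stop codons are excluded. Each change is labelled
    'transition' (purine<->purine or pyrimidine<->pyrimidine) or 'transversion'.
    """
    codon = codon.upper()
    wild_type = CODON_TABLE[codon]
    reachable = {}
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            aa = CODON_TABLE[mutant]
            if aa in (wild_type, "*"):
                continue
            same_class = (base in PURINES) == (codon[pos] in PURINES)
            kind = "transition" if same_class else "transversion"
            reachable.setdefault(aa, set()).add(kind)
    return reachable


print(sorted(snp_substitutions("GTT")))  # ['A', 'D', 'F', 'G', 'I', 'L']
```

A candidate mutation list could be filtered to this set before scoring, so that only substitutions a pathogen can reach in one step are ranked.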

3.1.6 Key New Experiments. OSPREY has designed proteins that are better escape mutants at the enzyme level (as judged by Ki, kcat, KM, TM, crystal structures, codon propensity, and transition/transversion) than those reported clinically or arising previously in resistance selection experiments [46]. Therefore, to test our hypothesis linking protein design to antimicrobial resistance, we will (a) improve OSPREY’s modeling and expand our MRSA predictions (see Sec. 3.1.4); (b) perform resistance predictions in VRE and C. glabrata, to determine whether the computationally predicted mutants match the resistance selection mutants in these organisms; (c) create modified microbes (MRSA, VRE, and C. glabrata) possessing the mutant enzymes and assess their fitness; (d) perform resistance selection experiments on longer time scales in MRSA, VRE, and C. glabrata to increase the probability that our predicted mutants appear; and (e) perform deep sequencing of Dr. Vance Fowler’s worldwide repository of clinically well-characterized MRSA isolates, to see which strains contain our predicted mutations. Our results will elucidate which biophysical properties are the biological determinants of escape mutations in MRSA, VRE, and C. glabrata DHFR, and whether these properties can be reliably predicted by a combination of positive and negative protein design.

The ability of our novel OSPREY-based algorithms to predict resistance mutations will be incorporated into a lead design strategy applicable to any target that is susceptible to mutational resistance. Finally, the remarkable accuracy of our algorithms suggests the tantalizing possibility of analogously predicting future mutational resistance to designed antibodies (Aim 3), although by predicting affinities rather than activity!

3.2 Aim 2: Peptide-like Inhibitors of Protein:Protein Interactions (PPIs)

We propose to design peptide-like inhibitors of PPIs. The methodologies that have been developed to study PPIs can be divided into sequence-based [30, 106] and structure-based [102, 69, 23, 67, 93, 95, 55, 99] methods. Sequence-based methods require a large amount of sequence and binding information for the protein family and do not provide direct structural information on the modeled interaction. Most previous structure-based alternatives focus on finding the single GMEC conformation, although a few studies have started designing to a set of different backbone conformations [102, 25, 103]. Only the work of Tidor and co-workers [23] uses provable techniques. None combines provable techniques with protein ensembles, and all lack continuous rotamers (Sec. 2.2.B); the K* algorithm provides all three features. As a result, K* complements existing approaches while addressing some of their methodological limitations.

3.2.1 Cystic Fibrosis (CF). CF is caused by mutations in the cystic fibrosis transmembrane conductance regulator protein (CFTR), which result in a build-up of mucus in the lungs and digestive tract. Although the most prevalent genetic cause of CF (ΔF508, a deletion of the phenylalanine residue at position 508 of CFTR) was identified over 20 years ago, the first drugs targeting any CFTR defect have only recently been approved (2012) and generally target rare peripheral mutations rather than ΔF508. While some ΔF508-CFTR chloride (Cl⁻) channel activity can be recovered using these drugs [21, 92, 53, 34], none of them prevents the degradation of ΔF508-CFTR or stabilizes it in the apical membrane, which severely limits their effectiveness.

Figure 4: Computational drug-resistance prediction in methicillin-resistant Staphylococcus aureus (MRSA). OSPREY predicted mutations to MRSA dihydrofolate reductase (DHFR) that would confer resistance to a new family of antifolates (A) containing a propargyl link between the pyrimidine and biaryl moieties. Resistance mutations were predicted toward three drugs from this family: D26M, U5, and U6. (B) Experimentally measured loss of inhibitor affinity vs. loss in KM (log scale) for the top predicted resistance mutations [2*]. Fold Ki increase is denoted ↑Ki, and ↑Ki implies ↓inhibition. Two rounds of computational predictions coupled to experimental testing were conducted. In the first round, double mutations in DHFR’s active site were ranked according to predicted resistance to D26M, and 4 of the top ten mutants were tested experimentally. In the second round, single-nucleotide polymorphism (SNP) resistance mutations were predicted against the three drugs, using feedback from the first round. The crystal structures of the top mutants from each round were determined. kcat was also measured, and the median fold loss in enzyme efficiency (kcat/KM) for the mutants was only 1.19. Clinically observed DHFR resistance mutations have a larger efficiency loss, of up to 200-fold [80, 32, 58]. Several OSPREY-predicted resistance mutants confer a 2-fold greater ↑Ki, and therefore a greater drop in potency, than the notorious nosocomial F98Y mutation (which confers resistance against trimethoprim in MRSA DHFR [38]). (C) Superposition of our crystal structures of wild-type (green) and V31Y/F92I mutant (magenta) DHFR bound to D26M [2*]. As predicted by OSPREY/K*, the crystal structure of the double mutant showed decreased contacts with the drug, explaining the loss in affinity and validating OSPREY’s structural and binding predictions.

3.2.2 Research Design. We propose to design PPI inhibitors using the K*, BD, and DEEPer algorithms (Figs. 1, 2) to optimize affinity and specificity. We will develop new extensions to the design algorithms, enabling the software to robustly design novel PPIs. Key new experiments include explicit negative design for specificity and the design of peptidomimetic inhibitors. Designed inhibitors, both peptide and peptidomimetic, will be validated by binding and in-cell assays (Fig. 7C-D) by our collaborator Dean Madden, who developed and validated CAL as a therapeutic target [36]. We will concentrate on the network of protein interactions in the CFTR-associated ligand (CAL):cystic fibrosis transmembrane conductance regulator (CFTR) system (Fig. 7A), with the focused translational goal of designing excellent drug leads that can stabilize CFTR in the apical membrane and rescue mutant CFTR activity in homozygous ΔF508 human epithelial upper airway cells. First, we will design peptide inhibitors of PPIs, focusing on disrupting CAL:CFTR binding (see Section 3.2.4). Negative design will be used to create peptides that are specific for the PDZ domain of CAL but do not bind well to the NHERF1 PDZ domain and other “beneficial” PDZ domains that insert CFTR into the apical membrane (Fig. 7A). The negative design will use the tools we developed and demonstrated for modeling DHFR (Aim 1, [2*]), for antibody probes (Sec. 3.3.1), and for peptides [3*, on page 7 therein]. Next, OSPREY will use the successful peptide scaffolds to design peptidomimetic inhibitors. Peptidomimetics are less likely to be degraded quickly and can be more easily optimized for delivery. The new inhibitors will incorporate D-peptides, peptoids, cyclic peptides, and non-natural amino acids (similar to [94]). We have experience with non-natural amino acids, D-peptides, and cyclic peptides from our NRPS research (Sec. 3.2.4), where we have studied the kinetics and specificity of their activation and incorporation into biosynthetic pathways [104, 1*]. OSPREY can already outcompete experimental techniques (Secs. 3.1.5, 3.2.4), but the difficulty of screening non-natural amino acids in wet-lab experiments provides an additional, clear advantage for our CSPD method. Since our software will be general, we will also apply it to design PPI inhibitors for Dr. Zhou’s Rev1 complexes (see Letter).

3.2.3 Algorithm Improvements. All the algorithmic improvements in Aim 1 (Sec. 3.1.4) will improve the accuracy of our designs for Aim 2. Beyond the challenges of active-site design, a key difficulty of PPI inhibitor design is modeling the large protein surface area that the designed inhibitor must block. An active site is normally much smaller than a PPI interface, so a PPI design must model a much larger protein area. To address this, we will improve the speed and efficiency of OSPREY, specifically the DEE and A* modules (Fig. 1). These crucial algorithmic enhancements will enable more complex and realistic designs.

Figure 5: Continuous flexibility has a dramatic impact on sequence selection and cannot be well approximated by discrete flexibility. Results are shown for 25 of the 69 protein cores that we redesigned in [4*]. Each PDB structure (shown on the x-axis in (A), (B), and (D)) was designed for stability using rigid DEE and iMinDEE/MinDEE. (A) Fraction of the redesigned residues that had different amino acid (AA) types in the rigid GMEC and the minGMEC [4*]. Designs used the standard rotamer library, RL0 [83]. (B) Expanding the rotamer library still results in different sequences than using continuous rotamers. An expanded rotamer library, RL1, was used as input to rigid DEE, and the resulting sequences were compared with those found by MinDEE/iMinDEE using the standard rotamer library (RL0); the panel shows the fraction of amino acids that differ between the minGMEC of MinDEE with RL0 and the rigid GMEC with RL1 [4*]. (C) Native sequence recovery results show that continuous flexibility yields sequences that are more similar to biological proteins. The panel summarizes the amino acid side chains with more than one flexible dihedral angle that were not recovered by rigid DEE (upper pie chart) and iMinDEE (lower pie chart) [4*]. For comparison, the recovered amino acids with more than one flexible dihedral angle are shown in grey. This improvement in accuracy matches the gains seen when incorporating sophisticated energy terms into the design [4*]. (D) The iMinDEE algorithm exponentially reduces the size of the conformation search space. The total conformation space for each designed protein core is shown in blue+orange+yellow, the space remaining after applying MinDEE in blue+orange, and the space remaining after applying iMinDEE in blue. In all cases iMinDEE prunes much more than MinDEE, reducing the space that A* must search, while still guaranteeing the same optimal result as MinDEE [4*].

In [4*] we improved the efficiency of MinDEE (iMinDEE), significantly reducing the number of rotamers and conformations that must be considered by A* (in a typical case, the search space is 5.5 billion times smaller [4*]). In the same paper, we found that further speed improvements could be achieved by tightening the pairwise energy bounds used during DEE pruning. Tighter bounds will not only increase DEE pruning but also dramatically reduce the number of conformations enumerated by A*. Loose bounds arise because each bound is calculated by minimizing a pair of rotamers in the absence of other side chains, and only the lowest energy found during the minimization is kept. Hence, the pair of rotamers can minimize into regions that might be occupied by other side chains not present in the bound calculation. To remedy this, individual rotamers will be combined into single rotamers, similar to the idea of super-rotamers [52, 84], forcing the combined rotamers always to minimize in the presence of each other. Additionally, instead of storing only the lowest energy from each bound calculation, we will create a piecewise-linear function to bound the energy over each continuous rotamer voxel. These functions map the energy landscape more fully and can be summed to obtain a much tighter bound on the full conformation energy. Preliminary tests of these improvements confirm that the number of conformations A* must enumerate can be reduced by up to 3700-fold.
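Why tighter pairwise bounds increase pruning can be seen from a Goldstein-style dead-end elimination test with energy bounds, in the spirit of MinDEE. A rotamer i_r cannot be in the GMEC if, for some competitor i_t at the same residue, the lowest energy any conformation containing i_r could have still exceeds the highest energy of the same conformation with i_t swapped in. The sketch below assumes pairwise-additive energies with precomputed lower and upper bounds stored as nested lists, with single-rotamer terms assumed to include the interaction with the fixed template; the function and its data layout are hypothetical, not OSPREY’s code. Every bound gap (upper minus lower) subtracts from the pruning margin, so looser bounds can only prune less.

```python
import itertools


def goldstein_prune(n_rot, e1_lo, e1_hi, e2_lo, e2_hi):
    """Return the set of (residue, rotamer) pairs proven absent from the GMEC.

    n_rot[i]                  number of rotamers at residue i
    e1_lo[i][r], e1_hi[i][r]  bounds on the single-rotamer energy (incl. template)
    e2_lo[i][j][r][s], e2_hi[i][j][r][s]
                              bounds on pair energies (i != j, both orders stored)
    For rigid rotamers, pass the same tables as lower and upper bounds.
    """
    n = len(n_rot)
    pruned = set()
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Pairs not involving i cancel exactly for rigid rotamers; with
        # continuous flexibility their bound gap (always <= 0) must be paid.
        slack = sum(
            min(e2_lo[j][k][s][u] - e2_hi[j][k][s][u]
                for s in range(n_rot[j]) for u in range(n_rot[k]))
            for j, k in itertools.combinations(others, 2))
        for r in range(n_rot[i]):
            for t in range(n_rot[i]):
                if t == r:
                    continue
                # Lowest possible energy with i_r minus highest possible with i_t,
                # minimized over every rotamer each other residue could adopt.
                gap = e1_lo[i][r] - e1_hi[i][t] + slack + sum(
                    min(e2_lo[i][j][r][s] - e2_hi[i][j][t][s]
                        for s in range(n_rot[j]))
                    for j in others)
                if gap > 0:
                    pruned.add((i, r))
                    break
    return pruned
```

In the rigid case the bounds coincide and the test reduces to the classic Goldstein criterion; widening every bound by the same amount can only shrink the set of pruned rotamers.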

OSPREY’s A* module, like the standard A* algorithm [75], uses a search tree in which every level represents a different residue position of the protein. The ordering of the residues within the tree does not affect the final design prediction, but it does affect the A* runtime. Preliminary tests show that if A* uses an optimal uniform tree order (the same residue order along every branch), the runtime improves by up to 2518-fold. Why? Not all rotamers contribute equally to the looseness of the A* energy bounds, so it is advantageous to account early for the rotamer pairs with the most impact on the bounds. The A* runtime can be improved even further if each tree branch may have a different (i.e., non-uniform) residue ordering. Hence, we will create an A* algorithm that can dynamically reorder each branch of its search tree to quickly find the optimal conformation. Our preliminary implementation of this dynamic A* algorithm reduces the runtime by more than 12 orders of magnitude for a large design problem on E. coli thioredoxin (PDB id: 2TRX)!
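The A* enumeration discussed above can be sketched for rigid, pairwise-additive energies. Partial conformations are expanded one residue per tree level in a fixed order, and each node is scored by its exact energy so far plus an admissible lower bound on the rest; because the bound never overestimates, full conformations leave the priority queue in nondecreasing energy order, which yields a gap-free list. This is a minimal sketch under stated assumptions: the heuristic is a simple bound chosen here for illustration (not necessarily the one OSPREY uses), the residue `order` is static (the dynamic, per-branch reordering proposed above is not implemented), and all names are hypothetical.

```python
import heapq
import itertools


def astar_conformations(n_rot, e1, e2, order=None):
    """Yield (energy, conformation) pairs in nondecreasing energy order.

    Rigid pairwise energies: e1[i][r] and e2[i][j][r][s] (both orders stored).
    `order` fixes which residue each tree level assigns (a static ordering).
    """
    n = len(n_rot)
    order = list(range(n)) if order is None else list(order)

    def g_score(assigned):
        # Exact energy of the residues assigned so far.
        items = list(assigned.items())
        return (sum(e1[i][r] for i, r in items)
                + sum(e2[i][j][r][s]
                      for (i, r), (j, s) in itertools.combinations(items, 2)))

    def h_score(assigned):
        # Admissible bound: each unassigned residue takes its best rotamer given
        # the assigned ones; each unassigned pair is counted once, at its best.
        free = [j for j in order if j not in assigned]
        total = 0.0
        for idx, j in enumerate(free):
            total += min(
                e1[j][s]
                + sum(e2[i][j][r][s] for i, r in assigned.items())
                + sum(min(e2[j][k][s][u] for u in range(n_rot[k]))
                      for k in free[idx + 1:])
                for s in range(n_rot[j]))
        return total

    tie = itertools.count()  # unique counter: equal f-values pop first-in, first-out
    heap = [(h_score({}), next(tie), ())]
    while heap:
        f, _, partial = heapq.heappop(heap)
        if len(partial) == n:
            # For a full conformation h == 0, so f is its exact energy.
            assigned = dict(zip(order, partial))
            yield f, tuple(assigned[i] for i in range(n))
            continue
        residue = order[len(partial)]
        for rot in range(n_rot[residue]):
            child = partial + (rot,)
            assigned = dict(zip(order, child))
            heapq.heappush(heap, (g_score(assigned) + h_score(assigned),
                                  next(tie), child))
```

Passing a different `order` changes how much of the tree is expanded but not the enumerated energies, mirroring the observation that residue ordering affects runtime but not the final prediction.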

We will also develop an algorithm that can supplant A* by exploiting the locality of residue interactions in proteins. Since interaction energy decreases with the distance between two residues, the number of residue pairs with significant interaction energy can be much smaller than the number of all pairs of flexible residues. In such cases, the design system can be approximated using a sparse residue interaction graph. We will create a novel dynamic programming algorithm that uses a branch decomposition of the sparse residue interaction graph to find its GMEC (similar to the tree-decomposition algorithms in [77, 120]). Preliminary results show that, using sparse residue interaction graphs, we can find GMECs in cases where traditional A* runs out of memory. Furthermore, our new algorithm is faster than Sparse A* (a modified version of A* for sparse residue interaction graphs) for mid-sized design problems (on the order of 10¹³ unpruned conformations) and obtained a 778-fold speedup on a large design problem on the PDZ domain of our target protein CAL (PDB id: 2LOB). We will further extend our new algorithm to generate a gap-free, energy-ordered list of conformations and to model additional side-chain and backbone flexibility, à la [4*, 5*].
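The simplest instance of dynamic programming on a sparse residue interaction graph arises when the graph is a tree. Each residue’s best subtree energy is computed for every rotamer from the leaves up, and the GMEC is then read off from the root down, in time linear in the number of residues and quadratic in the rotamers per residue rather than exponential. This is a simplified sketch: the proposed algorithm uses a branch decomposition and therefore also handles graphs with cycles, which this example rejects. Choosing which weak pairs to drop is the approximation the text describes; here the sparse edge list is assumed to be given, and all names are hypothetical.

```python
def tree_gmec(n_rot, e1, e2, edges, root=0):
    """Exact GMEC when the sparse residue interaction graph is a tree.

    n_rot[i]: rotamer count at residue i; e1[i][r]: single-rotamer energy.
    edges: list of (i, j) residue pairs kept in the sparse graph.
    e2[(i, j)][r][s]: pair energy for each edge, stored once in the listed order.
    Returns (energy, conformation).
    """
    n = len(n_rot)
    adj = {i: [] for i in range(n)}
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)

    def pair(i, j, r, s):
        return e2[(i, j)][r][s] if (i, j) in e2 else e2[(j, i)][s][r]

    # Breadth-first order from the root lists every parent before its children.
    parent = {root: None}
    bfs = [root]
    head = 0
    while head < len(bfs):
        u = bfs[head]
        head += 1
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                bfs.append(v)
    if len(bfs) != n or len(edges) != n - 1:
        raise ValueError("interaction graph must be a connected tree")

    best = {}    # best[u][r]: lowest energy of u's subtree with u fixed to rotamer r
    choice = {}  # choice[(c, r)]: best rotamer of child c when its parent uses r
    for u in reversed(bfs):  # leaves first
        children = [v for v in adj[u] if parent[v] == u]
        best[u] = []
        for r in range(n_rot[u]):
            total = e1[u][r]
            for c in children:
                s_best = min(range(n_rot[c]),
                             key=lambda s: pair(u, c, r, s) + best[c][s])
                choice[(c, r)] = s_best
                total += pair(u, c, r, s_best) + best[c][s_best]
            best[u].append(total)

    conf = [0] * n
    conf[root] = min(range(n_rot[root]), key=lambda r: best[root][r])
    for u in bfs[1:]:  # parents are assigned before their children
        conf[u] = choice[(u, conf[parent[u]])]
    return best[root][conf[root]], tuple(conf)
```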

3.2.4 Preliminary results.

A. Design of small-molecule inhibitors of PPIs. We have designed PPI inhibitors to combat leukemia. We undertook this work to address MSFD suggestions on our 2007 1R01 application, but published the study before the “A2” funding began. In [54] we used OSPREY/K* to design small-molecule allosteric inhibitors of an oncogenic PPI (in the dimeric transcription factor Core Binding Factor) central in acute myelomonocytic leukemia. The inhibitors were synthesized and validated by FRET, ELISA, and NMR, and slowed proliferation in human cancer cell lines.

Figure 6: The DEEPer (Dead-End Elimination with Perturbations) algorithm. (A) DEEPer combines the continuous sidechain motion of iMinDEE with novel motions called perturbations, each of which can move the backbones of several residues [5*]. Perturbations can be continuous, allowing DEEPer to model continuous sidechain and backbone motion simultaneously. We implemented seven types of perturbations based on motions commonly observed in crystallographic alternates: (B) backrub motions; (C) shear motions; (D) loop closure adjustments; (E) secondary structure adjustments; (F) partial structure switches; (G) changes to the pucker of a proline ring; and (H) full structure switches. Each perturbation is shown going from red backbone and orange sidechains to blue backbone and purple sidechains. Black balls denote the boundaries of the backbone region affected by the perturbation. (I)-(J) DEEPer can recover alternate crystallographic states [5*]. Designs are shown for the proteins AmiD (Panel I; PDB id: 2BGX) and sphericase (Panel J; PDB id: 2IXT). The starting structure is shown in black/gray; the complete searched ensemble in purple; the low-energy ensemble in blue; the GMEC in pink; and green balls demarcate flexible-backbone regions.

B. Design of peptide-based inhibitors of PPIs. [3*] reports an end-to-end study of the CAL:CFTR system, going all the way from theory, to new algorithms, to computational predictions, to experimental testing and in-cell validation of novel, effective CAL inhibitors (Fig. 7C-D). Our results gave surprising insight into the specificity of PDZ domains. It is generally believed that the specificity of PDZ domains is mostly encoded by the sequence motif each PDZ domain recognizes. However, our binding experiments showed that the 32 peptides we synthesized and the 1734 sequences in our peptide array, all of which match the CAL motif, span a wide range of binding affinities. Importantly, our predictions agreed with the measured binding affinities. K* thus performed the difficult task of differentiating the affinities of peptides that share the CAL motif, rather than the much simpler task of merely separating motif from non-motif sequences. Rescue of mutant CFTR activity in homozygous ΔF508 human epithelial upper airway cells is shown in Fig. 7D. The results in [3*] have important implications for both structural design methodology and CF therapeutic treatment. The computational inhibitor design had remarkable accuracy (Fig. 7): all of our predicted peptide inhibitors bound well to the target. The designs focused on a specific PDZ domain interaction, which is an important problem in its own right since these domains are ubiquitous in humans. Thus, we showed that by using our ensemble-based design algorithm, PDZ domain interactions (in particular) and β-sheet interactions (in general) can be accurately modeled, predicted, and designed.

3.3 Aim 3: Redesign of Broadly Neutralizing Antibodies and Nanobodies against HIV Env Proteins and the Design of Antibody Probes

HIV-1 is an enveloped virus that uses a trimeric envelope (Env) spike (composed of the glycoproteins gp120 and gp41) to gain entry into host cells, a process initiated by gp120 binding to the cellular receptor CD4. Since 2010 [119], a large set of broadly neutralizing HIV antibodies (BNAbs), called VRC01-like antibodies (“VAbs”), have been isolated from donor sera using probes based on resurfaced and stabilized (“RSC”) gp120 monomers. Some of these antibodies have been defined structurally in complex with gp120 and exhibit precise targeting of the gp120 CD4 binding site (CD4bs) [119, 123]. The VRC01 antibody, for which the class was named, neutralizes 90% of circulating HIV-1 strains, 70% with high potency [119], and is thus an excellent candidate for passive immunization studies. While the VAbs are very promising, human dimeric antibodies have certain drawbacks that can be overcome by using monomeric antibody derivatives, called nanobodies. The advantages of nanobodies over conventional dimeric antibodies include: (i) they can be produced cheaply and recombinantly; (ii) they are very stable, requiring no refrigeration; (iii) they fold reversibly, making them easier to store; (iv) they can vitiate resistance conferred by the BNAb light chain; and (v) multiple nanobodies can be conjugated (linked) together to optimize avidity by matching the trimeric antigen.

Therefore, we propose to use OSPREY in a three-pronged approach to create broadly neutralizing antibodies and nanobodies for HIV. First, we will use OSPREY to design improved probes that can accurately distinguish between broadly neutralizing and weakly neutralizing antibodies. Second, we will redesign the interface of BNAbs for improved breadth and potency, toward the development of a pan-neutralizing antibody. Finally, we will design nanobodies based on existing VAbs and on VAbs we find or design. Our designed antibodies and nanobodies will evolve into powerful agents for passive immunization and can thus play a vital role in disease prevention and prophylaxis.

Figure 7: Computational Design of a PDZ Domain Peptide Inhibitor that Rescues CFTR Activity. (A) Model of the CFTR trafficking pathway. CFTR is released from the Golgi complex and trafficked either by NHERF1 to the membrane for insertion or by CAL to the lysosome for degradation. Red ‘X’s denote the CAL:CFTR interaction to be disrupted by the designed peptide inhibitors. (B) OSPREY model of the top-100-conformation ensemble for the designed peptide with the tightest binding to CAL. (C) Fluorescence polarization binding-affinity measurements for designed peptide inhibitors [3*]. Peptides shown in green had a higher binding affinity than the best previously known hexamer. The horizontal line represents the average ΔG for the predicted peptides. Also shown are the binding affinities of CAL for its highest-affinity natural ligand (blue diamond) and for the wild-type CFTR C-terminus (red square). (D) The ΔF508-CFTR-specific chloride (Cl⁻) flux is shown for a control peptide, a reference peptide (iCAL35), and the tightest-binding designed peptide (kCAL01). Rescue by the previously best hexamer iCAL35, discovered by high-throughput peptide arrays, was not significant (N.S.), but rescue by the designed peptide kCAL01, which binds 7-fold tighter to CAL, was highly significant [3*]. kCAL01 rescues Cl⁻ flux to an extent comparable to other CFTR drugs (12% rescue) and could be used in combination with those drugs for additive rescue.

3.3.1 Probe Design. We will design improved probes to isolate BNAbs, building on our success to date (Fig. 8 and [6*]). The designs in Fig. 8 were made using MinDEE, a GMEC-based method; we expect substantially better designs from upgrading to the ensemble-based K* algorithm and to DEEPer. Many CD4bs antibodies (Abs) neutralize poorly [119], likely because they are structurally incompatible with the native viral spike. To ensure that our probes discriminate between neutralizing and non-neutralizing antibodies, we will perform positive and negative design: positive design to maintain binding to a range of BNAbs (the entire VAb family), and negative design to abolish binding to weakly neutralizing Abs (b13, m18, F105, etc.).
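The positive/negative selection step can be illustrated as a filter over predicted binding scores. This is a minimal sketch, not OSPREY's implementation: it assumes each candidate gp120 mutant already has a predicted score (for example, a log10 K* value, higher meaning tighter) against every antibody in the positive and negative sets. The function name `select_probe_mutants`, the `margin` threshold, the mutant labels, and all scores are hypothetical.

```python
from typing import Dict, List

def select_probe_mutants(
    scores: Dict[str, Dict[str, float]],
    positive: List[str],
    negative: List[str],
    wild_type: str,
    margin: float = 1.0,
) -> List[str]:
    """Keep mutants that preserve binding to every positive-design antibody
    and lose binding to every negative-design antibody.

    scores[mutant][antibody] is a predicted binding score (higher = tighter).
    Relative to wild type, a mutant passes if each positive-design score drops
    by less than `margin` and each negative-design score drops by at least `margin`.
    """
    wt = scores[wild_type]
    selected = []
    for mutant, s in scores.items():
        if mutant == wild_type:
            continue
        keeps_positive = all(s[ab] > wt[ab] - margin for ab in positive)
        loses_negative = all(s[ab] <= wt[ab] - margin for ab in negative)
        if keeps_positive and loses_negative:
            selected.append(mutant)

    # Rank survivors by the gap between their weakest positive-design score
    # and their strongest negative-design score (larger gap = more specific probe).
    def specificity_gap(m: str) -> float:
        return (min(scores[m][ab] for ab in positive)
                - max(scores[m][ab] for ab in negative))

    return sorted(selected, key=specificity_gap, reverse=True)


# Hypothetical scores for illustration only (not OSPREY output).
scores = {
    "WT":    {"VRC01": 8.0, "VRC03": 7.5, "b13": 7.0},
    "MUT_A": {"VRC01": 7.8, "VRC03": 7.4, "b13": 5.0},
    "MUT_B": {"VRC01": 6.0, "VRC03": 7.2, "b13": 4.5},
    "MUT_C": {"VRC01": 8.1, "VRC03": 7.6, "b13": 6.8},
    "MUT_D": {"VRC01": 7.9, "VRC03": 7.3, "b13": 3.0},
}
probes = select_probe_mutants(scores, positive=["VRC01", "VRC03"], negative=["b13"], wild_type="WT")
print(probes)  # ['MUT_D', 'MUT_A']
```

MUT_B is rejected because it loses VRC01 binding, and MUT_C because it still binds b13; requiring every antibody in each set to pass is what lets one probe cover the whole VAb family.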

3.3.2 Antibody Interface Design for Potency and Breadth. Starting with the structures of the most potent available VAb:gp120 complexes, we will use OSPREY to redesign the interface between the antibody and gp120 (see Fig. 9). Improving the VAb:gp120 interface is expected to improve potency and has been suggested to improve breadth as well [42]. We will further target breadth by explicitly modeling known resistant gp120 sequences and designing VAb mutations that overcome resistance. Improved antibody designs will be made using OSPREY, and the top-ranked designs will be validated experimentally. Experimental results will be used to improve the input model as described in Sec. 2.2 and [4*]. We will continue this cycle of prediction, algorithm development, and experimental testing to repeatedly improve the algorithms, the models, and, most importantly, the antibody designs.
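Interface redesigns are ranked by K*, which approximates the binding constant as a ratio of Boltzmann-weighted partition functions over conformational ensembles, K* = q_PL / (q_P q_L), with q = Σ exp(−E/RT). The sketch below computes log10 K* from explicit lists of conformation energies (kcal/mol) using a numerically stable log-sum-exp. It is a simplified illustration under the assumption that every conformation is enumerated; OSPREY instead bounds these partition functions with provable pruning, and the energies below are hypothetical.

```python
import math
from typing import Sequence

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def log_partition_function(energies: Sequence[float], temperature: float = 298.15) -> float:
    """ln q, where q = sum_i exp(-E_i / RT), computed with log-sum-exp for stability."""
    rt = R_KCAL * temperature
    exponents = [-e / rt for e in energies]
    top = max(exponents)
    return top + math.log(sum(math.exp(x - top) for x in exponents))

def log10_kstar(complex_e: Sequence[float], protein_e: Sequence[float],
                ligand_e: Sequence[float], temperature: float = 298.15) -> float:
    """log10 K*, with K* = q_PL / (q_P * q_L) over the enumerated conformations."""
    ln_k = (log_partition_function(complex_e, temperature)
            - log_partition_function(protein_e, temperature)
            - log_partition_function(ligand_e, temperature))
    return ln_k / math.log(10)

# Hypothetical conformation energies (kcal/mol) for a wild-type and a mutant interface.
wt_score = log10_kstar(complex_e=[-40.2, -39.8], protein_e=[-21.0, -20.5], ligand_e=[-12.1])
mut_score = log10_kstar(complex_e=[-41.5, -40.9], protein_e=[-21.0, -20.5], ligand_e=[-12.3])
print(f"predicted change: {mut_score - wt_score:+.2f} log10 units")
```

Working in log space avoids overflow, since exp(−E/RT) for energies of tens of kcal/mol exceeds 10^29.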

3.3.3 Designing Nanobodies for Passive HIV Immunization. VAb heavy chains are structurally almost identical to the single-chain llama antibody scaffold. Moreover, structural analysis suggests that gp120 resistance mutations target the Ab light chain and would not be effective if the light chain were absent. This supports designing nanobodies based on the heavy chain of VAbs. We will delete the constant regions of the VAb heavy chain, and the light chain entirely, to obtain a human variable-domain (VH) construct structurally homologous to llama (VHH) domains. Because isolated VAb VH domains are insoluble or unstable when expressed without the rest of the human antibody, the VH domain must be resurfaced for solubility and stability. In addition, the VH:gp120 interface will be redesigned as described in Sec. 3.3.2. Human antibodies are bivalent, whereas the Env spike is a trimer presenting three copies of gp120; this 2:3 mismatch limits avidity. We will improve avidity by designing peptide linkers that connect three designed nanobodies. The three linked nanobodies will then be conjugated to a human antibody Fc region to generate a human immune response.
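One simple way to shortlist positions for resurfacing is to flag hydrophobic VH residues that become solvent-exposed once the light chain is removed. The sketch below is a heuristic, not the method used in this proposal: it assumes per-residue relative solvent accessibility (RSA, from 0 to 1) has already been computed for VH in the intact antibody and for VH alone, and the hydrophobic set, the 0.25 cutoff, and the positions and RSA values are all hypothetical. Flagged positions could then be treated as design positions in a subsequent OSPREY search.

```python
from typing import Dict, List, Tuple

# Illustrative choice of hydrophobic residue types (one-letter codes).
HYDROPHOBIC = set("AILMFVW")

def resurfacing_candidates(
    residues: Dict[int, str],
    rsa_with_vl: Dict[int, float],
    rsa_vh_alone: Dict[int, float],
    exposed_cutoff: float = 0.25,
) -> List[Tuple[int, str, float]]:
    """Flag hydrophobic VH residues that are buried (RSA < cutoff) when the
    light chain is present but exposed (RSA >= cutoff) in the isolated VH.
    Returns (position, residue, RSA gain), largest gain first."""
    hits = []
    for pos, aa in residues.items():
        before, after = rsa_with_vl[pos], rsa_vh_alone[pos]
        if aa in HYDROPHOBIC and before < exposed_cutoff <= after:
            hits.append((pos, aa, after - before))
    return sorted(hits, key=lambda h: h[2], reverse=True)

# Hypothetical positions and RSA values, for illustration only.
residues = {1: "V", 2: "G", 3: "L", 4: "W", 5: "A"}
rsa_with_vl = {1: 0.05, 2: 0.10, 3: 0.08, 4: 0.12, 5: 0.40}
rsa_vh_alone = {1: 0.30, 2: 0.55, 3: 0.60, 4: 0.70, 5: 0.42}
hits = resurfacing_candidates(residues, rsa_with_vl, rsa_vh_alone)
print([(pos, aa) for pos, aa, _ in hits])  # [(4, 'W'), (3, 'L'), (1, 'V')]
```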

3.3.4 Algorithmic Improvements. Robust structure-based antibody redesign requires accurate PPI design [79] (see Aim 2). In addition, the algorithm must address antibody-specific problems: modeling canonical CDR loops [101], accounting for glycosylation when predicting antibody binding, and ensuring that the redesigned antibodies remain human (i.e., are recognized by the immune system as self). Further challenges arise if the redesigns include regions near the antibody heavy/light chain interface.

Loops are the most flexible regions of an antibody, so modeling their flexibility is essential for predicting antibody recognition. We recently developed an accurate method for predicting loop structures [107]. We will combine this method with our CDR loop library to generate realistic loops for use during the design search. Adding loop flexibility greatly enlarges the conformational search space, but multistate design across loops from different backbones [50, 10*, 5*] will enable additional pruning: just as one rotamer can prune another, some loop backbones will be able to prune other loop backbones from the conformational search.
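Backbone pruning builds on the dead-end elimination (DEE) logic used for rotamers. As a reference point, the sketch below implements the classic Goldstein singles criterion [52] on a toy pairwise energy model: candidate r at position i is pruned if some alternative t at the same position satisfies E(i_r) − E(i_t) + Σ_{j≠i} min_s [E(i_r, j_s) − E(i_t, j_s)] > 0, because then t beats r in every context. A "candidate" here could stand for a rotamer or, by analogy, a loop backbone; the data layout and toy energies are assumptions, and the backbone-flexible DEE variants cited above use modified bounds not shown here. An exhaustive search is included only to check that DEE never prunes the GMEC.

```python
import itertools
from typing import Dict, List, Optional, Sequence, Tuple

# Pairwise energies keyed by (position_i, candidate_i, position_j, candidate_j).
PairTable = Dict[Tuple[int, int, int, int], float]

def pair_e(pairs: PairTable, i: int, r: int, j: int, s: int) -> float:
    """Symmetric lookup of a pairwise energy; absent pairs contribute 0."""
    return pairs.get((i, r, j, s), pairs.get((j, s, i, r), 0.0))

def goldstein_dee(single: List[List[float]], pairs: PairTable) -> List[List[int]]:
    """Iterated Goldstein singles DEE. single[i][r] is the self energy of
    candidate r at position i. Returns surviving candidate indices per position."""
    alive = [list(range(len(e))) for e in single]
    changed = True
    while changed:
        changed = False
        for i in range(len(single)):
            for r in list(alive[i]):
                for t in alive[i]:
                    if t == r:
                        continue
                    # Lower bound, over all choices at other positions, on
                    # E(conformation with r) - E(same conformation with t).
                    bound = single[i][r] - single[i][t]
                    for j in range(len(single)):
                        if j != i:
                            bound += min(pair_e(pairs, i, r, j, s) - pair_e(pairs, i, t, j, s)
                                         for s in alive[j])
                    if bound > 0:  # t is better than r in every context: prune r
                        alive[i].remove(r)
                        changed = True
                        break
    return alive

def total_energy(conf: Sequence[int], single: List[List[float]], pairs: PairTable) -> float:
    e = sum(single[i][r] for i, r in enumerate(conf))
    for i, j in itertools.combinations(range(len(conf)), 2):
        e += pair_e(pairs, i, conf[i], j, conf[j])
    return e

def brute_force_gmec(single, pairs, allowed: Optional[List[List[int]]] = None) -> Tuple[int, ...]:
    """Exhaustive GMEC search, for checking DEE on small examples."""
    choices = allowed if allowed is not None else [range(len(e)) for e in single]
    return min(itertools.product(*choices), key=lambda c: total_energy(c, single, pairs))

# Toy energies (arbitrary units): 3 positions with 3, 2 and 2 candidates.
single = [[0.0, 5.0, 1.0], [0.0, 0.5], [2.0, 0.0]]
pairs = {(0, 0, 1, 0): 1.0, (0, 2, 1, 1): -2.0, (1, 0, 2, 1): 0.5, (0, 1, 2, 0): -1.0}
alive = goldstein_dee(single, pairs)
print(alive)                            # [[2], [1], [1]]
print(brute_force_gmec(single, pairs))  # (2, 1, 1)
```

Because the criterion uses a strict inequality, a GMEC candidate can never be pruned, which is why searching only the survivors still recovers the global minimum.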

3.3.5 Experimental Validation. To design and evaluate our improved probes, antibodies, and nanobodies, we will work with Peter Kwong (SBS, VRC, NIAID), John Mascola (BSL-3, VRC, NIAID), and Ivelin Georgiev (SBIS, VRC, NIAID). Mutant antibody proteins and probes will be produced by transient transfection in mammalian cells for human antibodies (e.g., [123, 6*]) and in E. coli for nanobodies. Binding affinities of the selected antibodies for the set of VAb-resistant and VAb-sensitive gp120 sequences will be measured by surface plasmon resonance (SPR) and ELISA. Neutralization assays will be performed for the mutants that bind all or most of the VAb-resistant sequences. Binding and neutralization assays will be performed by John Mascola’s lab at the VRC. Crystal structures of pan-neutralizing designed VAb mutants in complex with gp120 will be obtained by Peter Kwong’s lab at the VRC. The experimental data will serve as feedback for subsequent computational design cycles. Probes will be tested by measuring their binding to broadly and weakly neutralizing antibodies, and successful probes will be used to isolate new antibodies from donor sera.

Figure 8: Design of Epitope-Specific Probes for Sera Analysis and Antibody Isolation. OSPREY predicted mutations to gp120 that would maintain binding to VRC01 while knocking out binding to the antibody b13. (A) Computational mutant predictions. The OSPREY-predicted binding of VRC01 (x-axis) and b13 (y-axis) to a set of RSC3 (resurfaced stabilized gp120 core) mutants is compared. Mutants ranked high for VRC01 and low for b13 (e.g., red rectangle) were predicted to bind preferentially to VRC01 but not to b13. (B) Structural models of P369H with VRC01 and b13. The P369H mutation (orange) is more peripheral to the VRC01 binding site but more central to the b13 binding site, where it creates steric clashes with b13. (C) ELISA binding for selected mutants. The top-scoring mutants P369H, D368W, and D368I exhibited the desired binding profiles, with binding to b13 substantially decreased or abolished [6*] (in contrast to wild-type RSC3, which binds both VRC01 and b13 well).

Figure 9: HIV Antibody Redesign, Nanobody Design, and Validation. Two CD4-binding-site-directed broadly neutralizing antibodies (BNAbs), VRC01 and VRC07 (with the G54W mutation), bind the HIV-1 envelope (Env) glycoprotein gp120. We used OSPREY to redesign the VRC07/G54W:gp120 protein:protein interface (PPI) to improve the antibody’s potency. (A) K* predicted that the mutation I30R would improve contact with the gp120 backbone (blue dots). (B) Pseudovirus neutralization assays were performed on the redesigned antibody containing I30R against a panel of 7 strains (plus an SIV control). Overall, a 1.5-fold improvement in potency (IC50) was observed. (C) Starting with the VH domain of BNAb VRC01, stable, soluble, neutralizing nanobodies (VHH) were computationally designed with OSPREY. The best nanobody, VHH01, required 12 mutations to VH (table, and magenta sticks). The nanobody structure (ribbon) is shown in complex with gp120 (surface). (D-E) Nanobodies were produced and tested for expression, gp120 binding, and neutralization. (D) Expression of the designed nanobody (VHH01) and the llama antibody J3 [85]. (E) SPR binding measurements for VHH01 binding to full-length HXB2 gp120, and neutralization of two Env-pseudotyped viruses in TZM-bl cells using luciferase reporter gene expression. Predictions were made by the PI’s lab using OSPREY; validation experiments were performed at the VRC.

3.3.6 Preliminary Results. We have carried out several designs on the VAb:gp120 system that demonstrate OSPREY’s ability to design the gp120 binding interface (probe design, Fig. 8), the VAb binding interface (improved potency, Fig. 9A-B), and the VAb VH surface (nanobody design, Fig. 9C-E). Our combined positive/negative design approach produced probes for isolating VAb-like BNAbs from human donor sera while avoiding weakly neutralizing antibodies such as b13 [6*]. OSPREY also improved the potency of an already highly optimized VAb, VRC07 carrying the G54W mutation discovered by Bjorkman and co-workers [42]. Specifically, the I30R mutation increased the antibody’s potency, as measured by pseudovirus neutralization, by 1.5-fold on a panel of 7 HIV strains. Probe, antibody, and nanobody preliminary results are shown in Figs. 8–9.
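A potency gain measured across a virus panel has to be collapsed into a single number. The text does not state how the 1.5-fold figure was aggregated; the sketch below assumes a geometric mean of per-strain IC50 ratios (wild type over mutant, so values above 1 mean the mutant neutralizes at lower concentration), a common convention for neutralization data. The function name, strain labels, and IC50 values are hypothetical and are not the Fig. 9B data.

```python
import math
from typing import Dict

def geometric_mean_fold_improvement(ic50_wt: Dict[str, float], ic50_mut: Dict[str, float]) -> float:
    """Geometric mean over shared strains of IC50_wt / IC50_mut.
    Values above 1 mean the redesigned antibody neutralizes at lower concentration."""
    strains = sorted(ic50_wt.keys() & ic50_mut.keys())
    if not strains:
        raise ValueError("no strains in common")
    log_ratios = [math.log(ic50_wt[s] / ic50_mut[s]) for s in strains]
    return math.exp(sum(log_ratios) / len(log_ratios))

# Hypothetical IC50 values in ug/mL (not the Fig. 9B data); controls such as SIV are left out.
ic50_wt = {"strain_1": 0.20, "strain_2": 0.05, "strain_3": 1.60}
ic50_mut = {"strain_1": 0.10, "strain_2": 0.05, "strain_3": 0.80}
fold = geometric_mean_fold_improvement(ic50_wt, ic50_mut)
print(f"{fold:.2f}-fold")
```

A geometric mean keeps a 2-fold gain and a 2-fold loss on different strains from averaging to a spurious net improvement, which an arithmetic mean of ratios would produce.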

3.3.7 Fighting Resistance. The computational designs for improved probes, antibody interfaces, and nanobodies build naturally on one another, enabling the design of antibodies, and potentially nanobodies, that allow low-cost passive immunization, optimize avidity, and circumvent HIV resistance. The key to avoiding HIV resistance lies in OSPREY’s ability to combine positive and negative design to optimize antibody binding and to anticipate new resistance mutations on gp120 that could block nanobody binding. We will apply computational negative design (Secs. 2.2, 3.1.5, 3.3.6) to anticipate such resistance mutations by predicting amino acid substitutions on gp120 that block nanobody binding (negative design) while maintaining CD4 binding (positive design). This positive/negative design technique has already succeeded in our probe design (Sec. 3.3.1), so we expect it to succeed when applied “inversely” to nanobody design. Our computational-experimental approach is unique in addressing cost, avidity, and mutational resistance together in a single design scheme. Whereas traditional approaches combat antigenic mutations after clinical exposure, we are, to our knowledge, the first to incorporate strategies early in the antibody design process to predict and overcome the effects of possible resistance mutations. The successfully designed nanobodies will be cheap to manufacture and are designed to subvert future generations of evolved resistance. These cheap, unrefrigerated, easy-to-store, broadly neutralizing nanobodies will be targeted specifically to the HIV Env proteins. This is likely to reduce the breakthrough infection rate [113] and the serious side effects of conventional antiretroviral therapy, while combating evolving HIV resistance mutations.
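The "inverse" positive/negative screen for escape mutations can be sketched as a second filter. This sketch assumes, hypothetically, that for each single gp120 substitution there are predicted changes in log10 K* relative to wild type, one for nanobody binding and one for CD4 binding. A substitution is flagged as a plausible escape route if it weakens nanobody binding by at least `escape_drop` (negative design) while weakening CD4 binding by no more than `cd4_tolerance` (positive design). The class, function, thresholds, mutation labels, and values are all illustrative.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class MutantPrediction:
    mutation: str      # hypothetical gp120 substitution label
    d_nanobody: float  # predicted change in log10 K* for nanobody binding vs. wild type
    d_cd4: float       # predicted change in log10 K* for CD4 binding vs. wild type

def flag_escape_mutations(preds: List[MutantPrediction],
                          escape_drop: float = 1.0,
                          cd4_tolerance: float = 0.5) -> List[str]:
    """Negative design: nanobody binding weakened by at least `escape_drop`.
    Positive design: CD4 binding weakened by no more than `cd4_tolerance`.
    Returns flagged mutations, strongest predicted escape first."""
    flagged = [p for p in preds
               if p.d_nanobody <= -escape_drop and p.d_cd4 >= -cd4_tolerance]
    flagged.sort(key=lambda p: p.d_nanobody)
    return [p.mutation for p in flagged]

# Hypothetical predictions, for illustration only.
preds = [
    MutantPrediction("MUT_1", d_nanobody=-2.1, d_cd4=-0.2),
    MutantPrediction("MUT_2", d_nanobody=-1.5, d_cd4=-1.4),  # also loses CD4 binding: unlikely to be viable
    MutantPrediction("MUT_3", d_nanobody=-0.3, d_cd4=0.1),   # nanobody still binds
    MutantPrediction("MUT_4", d_nanobody=-1.2, d_cd4=0.3),
]
escapes = flag_escape_mutations(preds)
print(escapes)  # ['MUT_1', 'MUT_4']
```

One way to close the loop would be to add the flagged substitutions as extra positive-design states, so that the next round of nanobody redesign also aims to bind them.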


Progress Report Publication List

Papers published in this grant period

References to papers by the PI and his students, supported by this R01 in this grant period, are marked with a ‘*’, e.g., [3*].

Note: In Computational Biology, certain conferences (RECOMB, ISMB, etc.) are highly selective and rigorously refereed, often by 3 reviewers plus the conference chairs. Conference papers are published not as one-page abstracts, but as 10-15 page full papers (in 10pt double-column format). Some of these conferences are Medline-indexed. For these reasons, conference papers are considered primary publications in the field. Acceptance rates for the conference papers listed below lie between 9% and 26%.

For RECOMB papers, I sometimes publish a more extensive version, later, in a journal. In this case I have combined the RECOMB and journal publication citations.

[1*] C.-Y. Chen, I. Georgiev, A. Anderson, and B. R. Donald. Computational structure-based redesign of enzyme activity. Proceedings of the National Academy of Sciences, U.S.A. (PNAS), 106(10):3764–3769, 2009. PMC id: PMC2645347.

[2*] Kathleen M. Frey, Ivelin Georgiev, Bruce R. Donald, and Amy C. Anderson. Predicting resistance mutations using protein design algorithms. Proceedings of the National Academy of Sciences, U.S.A. (PNAS), 107(31):13707–13712, 2010. PMC id: PMC2911145.

[3*] Kyle E. Roberts, Patrick R. Cushing, Prisca Boisguerin, Dean R. Madden, and Bruce R. Donald. Computational design of a PDZ domain peptide inhibitor that rescues CFTR activity. PLoS Computational Biology, 8(4):e1002477, 2012. PMC id: PMC3330111

[4*] P. Gainza, K. Roberts, and B. R. Donald. Protein design using continuous rotamers. PLoS Computational Biology, 8(1):e1002335 (15 pages), January 2012. PMC id: PMC3257257

[5*] M. Hallen, D. Keedy, and B. R. Donald. Dead-end elimination with perturbations (‘DEEPer’): A provable protein design algorithm with continuous sidechain and backbone flexibility. Proteins: Structure, Function and Bioinformatics 2013; 81(1):18–39. Epub before print: DOI: 10.1002/prot.24150. PMC id: PMC3491125

[6*] I. Georgiev, P. Acharya, S. Schmidt, Y. Li, D. Wycuff, G. Ofek, N. Doria-Rose, T. Luongo, Y. Yang, T. Zhou, B. R. Donald, J. Mascola, and P. Kwong. Design of epitope-specific probes for sera analysis and antibody isolation. Retrovirology 2012; 9(Suppl. 2):P50. PMC id: PMC3442034.

[7*] Bruce R. Donald. Algorithms in Structural Molecular Biology. MIT Press, Cambridge, MA, 2011. 464 pages.

[8*] P. Gainza, K. Roberts, I. Georgiev, R. Lilien, D. Keedy, C.-Y. Chen, F. Reza, A. Anderson, D. Richardson, J. Richardson, and B. R. Donald. OSPREY: Protein design with ensembles, flexibility, and provable algorithms. In Methods in Enzymology, Vol. 523, “Methods in Protein Design” (2013). In Press. PMC Journal in progress.

[9*] C.-Y. Chen, I. Georgiev, and B. R. Donald. Algorithms for protein design. In Proceedings of 3DSIG, Structural Bioinformatics in Computational Biophysics, Intelligent Systems for Molecular Biology (ISMB) Satellite Meeting, Toronto, CA, June 2008. Selected for oral presentation.

[10*] I. Georgiev, D. Keedy, J. Richardson, D. Richardson, and B. R. Donald. Algorithm for backrub motions in protein design. Bioinformatics, 24(13):i196–i204, 2008. Special issue on papers from Int’l Conf. on Intelligent Sys. for Mol. Biol. (ISMB 2008) Toronto, CA: July, 2008. PMC id: PMC2613371.

[11*] I. Georgiev, R. Lilien, and B. R. Donald. The minimized dead-end elimination criterion and its application to protein redesign in a hybrid scoring and search algorithm for computing partition functions over molecular ensembles. Journal of Computational Chemistry, 29:1527–1542, 2008. PMC id: PMC3263346.

[12*] D. Keedy, I. Georgiev, B. R. Donald, D. Richardson, and J. Richardson. The role of local backrub motions in evolved and designed mutations. PLoS Computational Biology, 8(8):e1002629, 2012. PMC id: PMC3410847.


[13*] Kyle E. Roberts, Patrick R. Cushing, Prisca Boisguerin, Dean R. Madden, and Bruce R. Donald. Design of protein-protein interactions with a novel ensemble-based scoring algorithm. In Proceedings of the Annual International Conference on Research in Computational Molecular Biology (RECOMB). In Lecture Notes in Computer Science, Springer-Verlag (Berlin), Volume 6577, pages 361–376, Vancouver, CA, 2011.

[14*] Kyle E. Roberts, Patrick R. Cushing, Dean R. Madden, and B. R. Donald. Design of peptide inhibitors of CFTR-associated protein CAL. In Proceedings of 3DSIG, Structural Bioinformatics in Computational Biophysics, Intelligent Systems for Molecular Biology (ISMB) Satellite Meeting, Boston, MA, July 2010. Selected for oral presentation.

[15*] J. Zeng, P. Zhou, and B. R. Donald. Protein side-chain resonance assignment and NOE assignment using RDC-defined backbones without TOCSY data. J. Biomol. NMR, 50:371–395, Aug 2011. PMC id: PMC3155202.

[16*] I. Borzenets, I. Yoon, M. Prior, B. R. Donald, R. Mooney, and G. Finkelstein. Ultra-sharp metal and nanotube-based probes for applications in scanning microscopy and neural recording. Journal of Applied Physics 2012; 111 (7):74703–747036. PMC id: PMC3338587.

[17*] Jianyang Zeng, Kyle Roberts, Pei Zhou, and Bruce R. Donald. A Bayesian approach for determining protein side-chain rotamer conformations using unassigned NOE data. In Proceedings of the Annual International Conference on Research in Computational Molecular Biology (RECOMB). In Lecture Notes in Computer Science, Springer-Verlag (Berlin), Volume 6577, pages 563–578, Vancouver, CA, 2011.

Journal version appears in Journal of Computational Biology, 2011. Nov;18(11):1661–79. PMC id: PMC3216104.

[18*] C. Tripathy, A. Yan, P. Zhou, and B. R. Donald. Extracting structural information from residual chemical shift anisotropy: Analytic solutions for peptide plane orientations and applications to determine protein structure. In Proceedings of the Annual International Conference on Research in Computational Molecular Biology (RECOMB). In Lecture Notes in Computer Science, Springer-Verlag (Berlin). Beijing, People’s Republic of China (April, 2013). In Press.

Papers [19*, 20*] were written by students and postdoctoral fellows (†) supported by this grant. I do not put my name on papers stemming from students’ and postdocs’ projects simply because they have worked in my lab, unless I have made a substantial contribution above and beyond the generation of ideas and the kind of advice and discussion expected of a conscientious advisor.

[19*] S. Endo-Streeter,† M. K. Tsui, A. R. Odom, J. Block, and J. D. York. Structural studies and protein engineering of inositol phosphate multikinase. Journal of Biological Chemistry, 287(42):35360–35369, Oct 2012. PMC id: PMC3471723.

[20*] A. Yershova,† S. Jain,† S. M. Lavalle, and J. C. Mitchell. Generating Uniform Incremental Grids on SO(3) Using the Hopf Fibration. International Journal of Robotics Research, 29:801–812, Jun 2010. PMC id: PMC2896220.

Protein Structures Determined in this Grant Period To Validate Our Designs

Staphylococcus aureus dihydrofolate reductase complexed with NADPH and 2,4-Diamino-5-[3-(3-methoxy-5-(2,6-dimethylphenyl)phenyl)but-1-ynyl]-6-methylpyrimidine. PDB ID: 3F0Q (2009). Reported in PNAS (2010), see Ref. [2*], above.

Staphylococcus aureus V31Y, F92I mutant dihydrofolate reductase complexed with NADPH and 5-[(3S)-3-(5-methoxy-2’,6’-dimethylbiphenyl-3-yl)but-1-yn-1-yl]-6-methylpyrimidine-2,4-diamine. PDB ID: 3LG4 (2010). Reported in PNAS (2010), see Ref. [2*], above.


Bibliography & References Cited

References to papers by the PI and his students, supported by this R01 in this grant period, are marked with a ‘*’, e.g., [3*].

[1*] C.-Y. Chen, I. Georgiev, A. Anderson, and B. R. Donald. Computational structure-based redesign of enzyme activity. Proceedings of the National Academy of Sciences, U.S.A. (PNAS), 106(10):3764–3769, 2009. PMC id: PMC2645347.

[3*] Kyle E. Roberts, Patrick R. Cushing, Prisca Boisguerin, Dean R. Madden, and Bruce R. Donald. Computational design of a PDZ domain peptide inhibitor that rescues CFTR activity. PLoS Computational Biology, 8(4):e1002477, 2012. PMC id: PMC3330111

[4*] P. Gainza, K. Roberts, and B. R. Donald. Protein design using continuous rotamers. PLoS Computational Biology, 8(1):e1002335 (15 pages), January 2012. PMC id: PMC3257257

[7*] Bruce R. Donald. Algorithms in Structural Molecular Biology. MIT Press, Cambridge, MA, 2011. 464 pages.

[9*] C.-Y. Chen, I. Georgiev, and B. R. Donald. Algorithms for protein design. In Proceedings of 3DSIG, Structural Bioinformatics in Computational Biophysics, Intelligent Systems for Molecular Biology (ISMB) Satellite Meeting, Toronto, CA, June 2008. Selected for oral presentation.

[12*] D. Keedy, I. Georgiev, B. R. Donald, D. Richardson, and J. Richardson. The role of local backrub motions in evolved and designed mutations. PLoS Computational Biology, 8(8):e1002629, 2012. PMC id: PMC3410847.


[17*] Jianyang Zeng, Kyle Roberts, Pei Zhou, and Bruce R. Donald. A Bayesian approach for determining protein side-chain rotamer conformations using unassigned NOE data. In Proceedings of the Annual International Conference on Research in Computational Molecular Biology (RECOMB). In Lecture Notes in Computer Science, Springer-Verlag (Berlin), Volume 6577, pages 563–578, Vancouver, CA, 2011.

Journal version appears in Journal of Computational Biology, 2011. Nov;18(11):1661–79. PMC id: PMC3216104.

[19*] S. Endo-Streeter, M. K. Tsui, A. R. Odom, J. Block, and J. D. York. Structural studies and protein engineering of inositol phosphate multikinase. Journal of Biological Chemistry, 287(42):35360–35369, Oct 2012. PMC id: PMC3471723.

[20*] A. Yershova, S. Jain, S. M. Lavalle, and J. C. Mitchell. Generating Uniform Incremental Grids on SO(3) Using the Hopf Fibration. International Journal of Robotics Research, 29:801–812, Jun 2010. PMC id: PMC2896220.

[21] Frank J Accurso, Steven M Rowe, J P Clancy, Michael P Boyle, Jordan M Dunitz, Peter R Durie, Scott D Sagel, Douglas B Hornick, Michael W Konstan, Scott H Donaldson, Richard B Moss, Joseph M Pilewski, Ronald C Rubenstein, Ahmet Z Uluer, Moira L Aitken, Steven D Freedman, Lynn M Rose, Nicole Mayer-Hamblett, Qunming Dong, Jiuhong Zha, Anne J Stone, Eric R Olson, Claudia L Ordoñez, Preston W Campbell, Melissa A Ashlock, and Bonnie W Ramsey. Effect of VX-770 in persons with cystic fibrosis and the G551D-CFTR mutation. The New England journal of medicine, 363(21):1991–2003, November 2010. PMID: 21083385.

[22] M.D. Altman, E.A. Nalivaika, M. Prabu-Jeyabalan, C.A. Schiffer, and B. Tidor. Computational design and experimental study of tighter binding peptides to an inactivated mutant of HIV-1 protease. Proteins: Structure, Function, and Bioinformatics, 70(3):678–694, 2008.

[23] Michael D Altman, Ellen A Nalivaika, Moses Prabu-Jeyabalan, Celia A Schiffer, and Bruce Tidor. Computational design and experimental study of tighter binding peptides to an inactivated mutant of HIV-1 protease. Proteins, 70(3):678–694, February 2008. PMID: 17729291.

[24] Amy C. Anderson. Winning the arms race by improving drug discovery against mutating targets. ACS Chemical Biology, 7(2):278–288, February 2012.

[25] Mariana Babor, Daniel J. Mandell, and Tanja Kortemme. Assessment of flexible backbone protein design methods for sequence library prediction in the therapeutic antibody Herceptin–HER2 interface. Protein Science, 20(6):1082–1089, 2011.

[26] Jennifer M Beierlein, Lalit Deshmukh, Kathleen M Frey, Olga Vinogradova, and Amy C Anderson. The solution structure of Bacillus anthracis dihydrofolate reductase yields insight into the analysis of structure-activity relationships for novel inhibitors. Biochemistry, 48(19):4100–4108, May 2009. PMID: 19323450.

[27] Jennifer M Beierlein, Kathleen M Frey, David B Bolstad, Phillip M Pelphrey, Tammy M Joska, Adrienne E Smith, Nigel D Priestley, Dennis L Wright, and Amy C Anderson. Synthetic and crystallographic studies of a new inhibitor series targeting Bacillus anthracis dihydrofolate reductase. Journal of medicinal chemistry, 51(23):7532–7540, December 2008. PMID: 19007108.


[28] David D. Boehr, Dan McElheny, H. Jane Dyson, and Peter E. Wright. The dynamic energy landscape of dihydrofolate reductase catalysis. Science, 313(5793):1638–1642, 2006.

[29] A. J. Bordner and R. A. Abagyan. Large-scale prediction of protein geometry and stability changes for arbitrary single point mutations. Proteins: Structure, Function, and Bioinformatics, 57(2):400–413, 2004.

[30] Barbara Brannetti and Manuela Helmer-Citterich. iSPOT: a web tool to infer the interaction specificity of families of protein modules. Nucleic Acids Research, 31(13):3709–3711, July 2003.

[31] Z.W. Cao, L.Y. Han, C.J. Zheng, Z.L. Ji, X. Chen, H.H. Lin, and Y.Z. Chen. Computer prediction of drug resistance mutations in proteins. Drug discovery today, 10(7):521–529, 2005.

[32] J. N. Champness, A. Achari, S. P. Ballantine, P. K. Bryant, C. J. Delves, and D. K. Stammers. The structure of Pneumocystis carinii dihydrofolate reductase to 1.9 Å resolution. Structure, 2(10):915–924, October 1994.

[33] David C. Chan, Christine T. Chutkowski, and Peter S. Kim. Evidence that a prominent cavity in the coiled coil of HIV type 1 gp41 is an attractive drug target. Proceedings of the National Academy of Sciences, 95(26):15613–15617, December 1998.

[34] J P Clancy, Steven M Rowe, Frank J Accurso, Moira L Aitken, Raouf S Amin, Melissa A Ashlock, Manfred Ballmann, Michael P Boyle, Inez Bronsveld, Preston W Campbell, Kris De Boeck, Scott H Donaldson, Henry L Dorkin, Jordan M Dunitz, Peter R Durie, Manu Jain, Anissa Leonard, Karen S McCoy, Richard B Moss, Joseph M Pilewski, Daniel B Rosenbluth, Ronald C Rubenstein, Michael S Schechter, Martyn Botfield, Claudia L Ordoñez, George T Spencer-Green, Laurent Vernillet, Steve Wisseh, Karl Yen, and Michael W Konstan. Results of a phase IIa study of VX-809, an investigational CFTR corrector compound, in subjects with cystic fibrosis homozygous for the F508del-CFTR mutation. Thorax, 67(1):12–18, January 2012. PMID: 21825083.

[35] Bruno E Correia, Yih-En Andrew Ban, Margaret A Holmes, Hengyu Xu, Katharine Ellingson, Zane Kraft, Chris Carrico, Erica Boni, D Noah Sather, Camille Zenobia, Katherine Y Burke, Tyler Bradley-Hewitt, Jessica F Bruhn-Johannsen, Oleksandr Kalyuzhniy, David Baker, Roland K Strong, Leonidas Stamatatos, and William R Schief. Computational design of epitope-scaffolds allows induction of antibodies specific for a poorly immunogenic HIV vaccine epitope. Structure (London, England: 1993), 18(9):1116–1126, September 2010. PMID: 20826338.

[36] Patrick R Cushing, Abigail Fellows, Daniel Villone, Prisca Boisguérin, and Dean R Madden. The relative binding affinities of PDZ partners for CFTR: a biochemical basis for efficient endocytic recycling. Biochemistry, 47(38):10084–10098, September 2008. PMID: 18754678.

[37] Patrick R Cushing, Lars Vouilleme, Maria Pellegrini, Prisca Boisguerin, and Dean R Madden. A stabilizing influence: CAL PDZ inhibition extends the half-life of ΔF508-CFTR. Angewandte Chemie (International Ed. in English), 49(51):9907–9911, December 2010. PMID: 21105033.

[38] G.E. Dale, C. Broger, A. D'Arcy, P.G. Hartman, R. DeHoogt, S. Jolidon, I. Kompis, A.M. Labhardt, H. Langen, H. Locher, et al. A single amino acid substitution in Staphylococcus aureus dihydrofolate reductase determines trimethoprim resistance. Journal of molecular biology, 266(1):23–30, 1997.

[39] Nika N. Danial. BCL-2 family proteins: Critical checkpoints of apoptotic cell death. Clinical Cancer Research, 13(24):7254–7263, December 2007.

[40] J. R. Desjarlais and T. M. Handel. De novo design of the hydrophobic cores of proteins. Protein Science : A Publication of the Protein Society, 4(10):2006–2018, October 1995. PMCID: PMC2142989.

[41] Johan Desmet, Jan Spriet, and Ignace Lasters. Fast and accurate sidechain topology and energy refinement (FASTER) as a new method for protein structure optimization. Proteins: Structure, Function, and Bioinformatics, 48(1):31–43, July 2002.

[42] Ron Diskin, Johannes F. Scheid, Paola M. Marcovecchio, Anthony P. West, Florian Klein, Han Gao, Priyanthi N. P. Gnanapragasam, Alexander Abadir, Michael S. Seaman, Michel C. Nussenzweig, and Pamela J. Bjorkman. Increasing the potency and breadth of an HIV antibody by using structure-based rational design. Science, October 2011.


[43]Emine A. Ercikan-Abali, Shin Mineishi, Youzhi Tong, Saori Nakahara, Mark C. Waltham, Debabrata Baner- jee, Wen Chen, Michel Sadelain, and Joseph R. Bertino. Active site-directed double mutants of dihydrofo- late reductase. Cancer Research, 56(18):4142–4145, September 1996.

[44]J W Erickson and S K Burt. Structural mechanisms of HIV drug resistance. Annual review of pharmacology and toxicology, 36:545–571, 1996. PMID: 8725401.

[45]Kathleen M Frey, Michael N Lombardo, Dennis L Wright, and Amy C Anderson. Towards the understanding of resistance mechanisms in clinically isolated trimethoprim-resistant, methicillin-resistant staphylococcus aureus dihydrofolate reductase. Journal of structural biology, 170(1):93–97, April 2010. PMID: 20026215.

[46]Kathleen M. Frey, Kishore Viswanathan, Dennis L. Wright, and Amy C. Anderson. Prospective screening of novel antibacterial inhibitors of dihydrofolate reductase for mutational resistance. Antimicrobial Agents and Chemotherapy, 56(7):3556–3562, July 2012.

[47]K.M. Frey, M.N. Lombardo, D.L. Wright, and A.C. Anderson. Towards the understanding of resistance mechanisms in clinically isolated trimethoprim-resistant, methicillin-resistant staphylococcus aureus dihy- drofolate reductase. Journal of structural biology, 170(1):93–97, 2010.

[48]Menachem Fromer and Chen Yanover. A computational framework to empower probabilistic protein design. Bioinformatics, 24(13):i214–222, July 2008.

[49]Sheri M. Fujihara, Jeffrey S. Cleaveland, Laura S. Grosmaire, Karen K. Berry, Karen A. Kennedy, James J. Blake, James Loy, Bruce M. Rankin, Jeffrey A. Ledbetter, and Steven G. Nadler. A d-amino acid peptide inhibitor of NF-B nuclear localization is efﬁcacious in models of inﬂammatory disease. The Journal of Immunology, 165(2):1004–1012, July 2000.

[50]I. Georgiev and B. R. Donald. Dead-end elimination with backbone ﬂexibility. Bioinformatics, 23(13), 2007. Special issue on papers from the Int’l Conf. on Intelligent Sys. for Mol. Biol. (ISMB 2007), Vienna, Austria: July 21-25, 2007. PMID: 17646295.

[51]M.K. Gilson, J.A. Given, B.L. Bush, and J.A. McCammon. The statistical-thermodynamic basis for compu- tation of binding afﬁnities: a critical review. Biophys J, 7:1047–1069, 1997.

[52]R Goldstein. Efﬁcient rotamer elimination applied to protein side-chains and related spin glasses. Biophys- ical Journal, 66(5):1335–1340, May 1994.

[53]Fredrick Van Goor, Sabine Hadida, Peter D. J. Grootenhuis, Bill Burton, Jeffrey H. Stack, Kimberly S. Straley, Caroline J. Decker, Mark Miller, Jason McCartney, Eric R. Olson, Jeffrey J. Wine, Ray A. Frizzell, Melissa Ashlock, and Paul A. Negulescu. Correction of the f508del-CFTR protein processing defect in vitro by the investigational drug VX-809. Proceedings of the National Academy of Sciences, 108(46):18843– 18848, November 2011.

[54]M. J. Gorczynski, J. Grembecka, Y. Zhou, Y. Kong, L. Roudaiya, M. G. Douvas, M. Newman, I. Bielnicka, G. Baber, T. Corpora, J. Shi, M. Sridharan, R. Lilien, B. R. Donald, N. A. Speck, M. L. Brown, and J. H. Bushweller. Allosteric inhibition of the protein-protein interaction between the leukemia-associated proteins RUNX1 and CBF. Chemistry & Biology, 14(10), 2007. PMID: 17961830.

[55] Gevorg Grigoryan, Aaron W. Reinke, and Amy E. Keating. Design of protein-interaction specificity gives selective bZIP-binding peptides. Nature, 458(7240):859–864, April 2009.

[56] William B Guggino and Bruce A Stanton. New insights into cystic fibrosis: molecular switches that regulate CFTR. Nature Reviews. Molecular Cell Biology, 7(6):426–436, June 2006. PMID: 16723978.

[57] Gordon G. Hammes, Yu-Chu Chang, and Terrence G. Oas. Conformational selection or induced fit: A flux description of reaction mechanism. Proceedings of the National Academy of Sciences, 106(33):13737–13741, August 2009.

[58] Corwin Hansch, Renli Li, Jeffrey M. Blaney, and Robert Langridge. Comparison of the inhibition of Escherichia coli and Lactobacillus casei dihydrofolate reductase by 2,4-diamino-5-(substituted-benzyl)pyrimidines: quantitative structure-activity relationships, x-ray crystallography, and computer graphics in structure-activity analysis. Journal of Medicinal Chemistry, 25(7):777–784, July 1982.

[59] O. Haq, M. Andrec, A.V. Morozov, and R.M. Levy. Correlated electrostatic mutations provide a reservoir of stability in HIV protease. PLoS Computational Biology, 8(9):e1002675, 2012.

[60] Xiaozhen Hu and Brian Kuhlman. Protein design simulations suggest that side-chain conformational entropy is not a strong determinant of amino acid environmental preferences. Proteins: Structure, Function, and Bioinformatics, 62(3):739–748, 2006.

[61] D.J. Huggins and B. Tidor. Systematic placement of structural water molecules for improved scoring of protein–ligand interactions. Protein Engineering Design and Selection, 24(10):777–789, 2011.

[62] E. Humphris-Narayanan, E. Akiva, R. Varela, S.O. Conchúir, and T. Kortemme. Prediction of mutational tolerance in HIV-1 protease and reverse transcriptase using flexible backbone protein design. PLoS Computational Biology, 8(8):e1002639, 2012.

[63] P Huovinen, L Sundström, G Swedberg, and O Sköld. Trimethoprim and sulfonamide resistance. Antimicrobial Agents and Chemotherapy, 39(2):279–289, February 1995. PMID: 7726483 PMCID: PMC162528.

[64] H. Ishikita and A. Warshel. Predicting drug-resistant mutations of HIV protease. Angewandte Chemie, 120(4):709–712, 2007.

[65] L. Jiang, B. Kuhlman, T. Kortemme, and D. Baker. A solvated rotamer approach to modeling water-mediated hydrogen bonds at protein–protein interfaces. Proteins: Structure, Function, and Bioinformatics, 58(4):893–904, 2005.

[66] Xin Jiang, Ernie Pistor, Ramy S. Farid, and Hany Farid. A new approach to the design of uniquely folded thermally stable proteins. Protein Science, 9(2):403–416, 2000.

[67] Lukasz A. Joachimiak, Tanja Kortemme, Barry L. Stoddard, and David Baker. Computational design of a new hydrogen bond network and at least a 300-fold specificity switch at a protein-protein interface. Journal of Molecular Biology, 361(1):195–208, August 2006.

[68] David T. Jones. De novo protein design using pairwise potentials and a genetic algorithm. Protein Science, 3(4):567–574, 1994.

[69] Hetunandan Kamisetty, Arvind Ramanathan, Chris Bailey-Kellogg, and Christopher James Langmead. Accounting for conformational entropy in predicting binding free energies of protein-protein interactions. Proteins: Structure, Function, and Bioinformatics, 79(2):444–462, February 2011.

[70] Eili Klein, David L. Smith, and Ramanan Laxminarayan. Hospitalizations and deaths caused by methicillin-resistant Staphylococcus aureus, United States, 1999–2005. Emerging Infectious Diseases, 13(12):1840–1846, December 2007.

[71] Patrice Koehl and Marc Delarue. Application of a self-consistent mean field theory to predict protein side-chains conformation and estimate their conformational entropy. Journal of Molecular Biology, 239(2):249–275, June 1994.

[72] Nobuyasu Koga, Rie Tatsumi-Koga, Gaohua Liu, Rong Xiao, Thomas B. Acton, Gaetano T. Montelione, and David Baker. Principles for designing ideal protein structures. Nature, 491(7423):222–227, November 2012.

[73] B Kuhlman and D Baker. Native protein sequences are close to optimal for their structures. Proceedings of the National Academy of Sciences of the United States of America, 97(19):10383–10388, September 2000. PMID: 10984534.

[74] E.L. Kwak, R. Sordella, D.W. Bell, N. Godin-Heymann, R.A. Okimoto, B.W. Brannigan, P.L. Harris, D.R. Driscoll, P. Fidias, T.J. Lynch, et al. Irreversible inhibitors of the EGF receptor may circumvent acquired resistance to gefitinib. Proceedings of the National Academy of Sciences of the United States of America, 102(21):7665–7670, 2005.

[75] Andrew R. Leach and Andrew P. Lemon. Exploring the conformational space of protein side chains using dead-end elimination and the A* algorithm. Proteins: Structure, Function, and Genetics, 33(2):227–239, 1998.

[76] Ubolsree Leartsakulpanich, Mallika Imwong, Sasithon Pukrittayakamee, Nicholas J White, Georges Snounou, Worachart Sirawaraporn, and Yongyuth Yuthavong. Molecular characterization of dihydrofolate reductase in relation to antifolate resistance in Plasmodium vivax. Molecular and Biochemical Parasitology, 119(1):63–73, January 2002.

[77] A. Leaver-Fay, B. Kuhlman, and J. Snoeyink. An adaptive dynamic programming algorithm for the side-chain placement problem. Pac Symp Biocomput, pages 16–27, 2005.

[78] Christopher Lee and S. Subbiah. Prediction of protein side-chain conformation by packing optimization. Journal of Molecular Biology, 217(2):373–388, 1991.

[79] Shaun M Lippow, K Dane Wittrup, and Bruce Tidor. Computational design of antibody-affinity improvement beyond in vivo maturation. Nat Biotech, 25(10):1171–1176, October 2007.

[80] Jieying Liu, David B Bolstad, Erin S D Bolstad, Dennis L Wright, and Amy C Anderson. Towards new antifolates targeting eukaryotic opportunistic infections. Eukaryotic Cell, 8(4):483–486, April 2009. PMID: 19168759.

[81] Jieying Liu, David B Bolstad, Adrienne E Smith, Nigel D Priestley, Dennis L Wright, and Amy C Anderson. Structure-guided development of efficacious antifungal agents targeting Candida glabrata dihydrofolate reductase. Chemistry & Biology, 15(9):990–996, September 2008. PMID: 18804036.

[82] Jieying Liu, David B Bolstad, Adrienne E Smith, Nigel D Priestley, Dennis L Wright, and Amy C Anderson. Probing the active site of Candida glabrata dihydrofolate reductase with high resolution crystal structures and the synthesis of new inhibitors. Chemical Biology & Drug Design, 73(1):62–74, January 2009. PMID: 19152636.

[83] Simon C. Lovell, J. Michael Word, Jane S. Richardson, and David C. Richardson. The penultimate rotamer library. Proteins: Structure, Function, and Bioinformatics, 40(3):389–408, 2000.

[84] Marc De Maeyer, Johan Desmet, and Ignace Lasters. The dead-end elimination theorem:. In David M. Webster, editor, Protein Structure Prediction, volume 143 of Methods in Molecular Biology, pages 265–304. Humana Press, 2000.

[85] Laura E. McCoy, Anna Forsman Quigley, Nika M. Strokappe, Bianca Bulmer-Thomas, Michael S. Seaman, Daniella Mortier, Lucy Rutten, Nikita Chander, Carolyn J. Edwards, Robin Ketteler, David Davis, Theo Verrips, and Robin A. Weiss. Potent and broad neutralization of HIV-1 by a llama antibody elicited by immunization. The Journal of Experimental Medicine, 209(6):1091–1103, June 2012.

[86] Susan M. Miller, Reyna J. Simon, Simon Ng, Ronald N. Zuckermann, Janice M. Kerr, and Walter H. Moos. Comparison of the proteolytic susceptibilities of homologous L-amino acid, D-amino acid, and N-substituted glycine peptide and peptoid oligomers. Drug Development Research, 35(1):20–32, 1995.

[87] Grant S. Murphy, Jeffrey L. Mills, Michael J. Miley, Mischa Machius, Thomas Szyperski, and Brian Kuhlman. Increasing sequence diversity with flexible backbone protein design: The complete redesign of a protein hydrophobic core. Structure, 20(6):1086–1096, June 2012.

[88] Janet L Paulsen, Jieying Liu, David B Bolstad, Adrienne E Smith, Nigel D Priestley, Dennis L Wright, and Amy C Anderson. In vitro biological activity and structural analysis of 2,4-diamino-5-(2’-arylpropargyl)pyrimidine inhibitors of Candida albicans. Bioorganic & Medicinal Chemistry, 17(14):4866–4872, July 2009. PMID: 19560363.

[89] MA Pfaller and DJ Diekema. Epidemiology of invasive candidiasis: a persistent public health problem. Clinical Microbiology Reviews, 20(1):133–163, 2007.

[90] MA Pfaller, SA Messer, L. Boyken, S. Tendolkar, RJ Hollis, and DJ Diekema. Geographic variation in the susceptibilities of invasive isolates of Candida glabrata to seven systemically active antifungal agents: a global assessment from the ARTEMIS Antifungal Surveillance Program conducted in 2001 and 2002. Journal of Clinical Microbiology, 42(7):3142–3146, 2004.

[91] Heidi K. Privett, Gert Kiss, Toni M. Lee, Rebecca Blomberg, Roberto A. Chica, Leonard M. Thomas, Donald Hilvert, Kendall N. Houk, and Stephen L. Mayo. Iterative approach to computational enzyme design. Proceedings of the National Academy of Sciences, 109(10):3790–3795, March 2012.

[92] Bonnie W Ramsey, Jane Davies, N Gerard McElvaney, Elizabeth Tullis, Scott C Bell, Pavel Dřevínek, Matthias Griese, Edward F McKone, Claire E Wainwright, Michael W Konstan, Richard Moss, Felix Ratjen, Isabelle Sermet-Gaudelus, Steven M Rowe, Qunming Dong, Sally Rodriguez, Karl Yen, Claudia Ordoñez, and J Stuart Elborn. A CFTR potentiator in patients with cystic fibrosis and the G551D mutation. The New England Journal of Medicine, 365(18):1663–1672, November 2011. PMID: 22047557.

[93] Jose Reina, Emmanuel Lacroix, Scott D. Hobson, Gregorio Fernández-Ballester, Vladimir Rybin, Markus S. Schwab, Luis Serrano, and Cayetano Gonzalez. Computer-aided design of a PDZ domain to recognize new target sequences. Nat Struct Mol Biol, 9(8):621–627, 2002.

[94] P. Douglas Renfrew, Eun Jung Choi, Richard Bonneau, and Brian Kuhlman. Incorporation of noncanonical amino acids into Rosetta and use in computational protein-peptide interface design. PLoS ONE, 7(3):e32637, March 2012.

[95] Kimberly A Reynolds, Melinda S Hanes, Jodi M Thomson, Andrew J Antczak, James M Berger, Robert A Bonomo, Jack F Kirsch, and Tracy M Handel. Computational redesign of the SHV-1 beta-lactamase/beta-lactamase inhibitor protein interface. Journal of Molecular Biology, 382(5):1265–1275, October 2008. PMID: 18775544.

[96] Daniela Röthlisberger, Olga Khersonsky, Andrew M. Wollacott, Lin Jiang, Jason DeChancie, Jamie Betker, Jasmine L. Gallaher, Eric A. Althoff, Alexandre Zanghellini, Orly Dym, Shira Albeck, Kendall N. Houk, Dan S. Tawfik, and David Baker. Kemp elimination catalysts by computational enzyme design. Nature, 453(7192):190–195, March 2008.

[97] Maria Safi and Ryan H. Lilien. Efficient a priori identification of drug resistant mutations using dead-end elimination and MM-PBSA. Journal of Chemical Information and Modeling, 52(6):1529–1541, June 2012.

[98] Daniele Sciretti, Pierpaolo Bruscolini, Alessandro Pelizzola, Marco Pretti, and Alfonso Jaramillo. Computational protein design with side-chain conformational entropy. Proteins: Structure, Function, and Bioinformatics, 74(1):176–191, 2009.

[99] Scott J Shandler, Ivan V Korendovych, David T Moore, Kathryn B Smith-Dupont, Craig N Streu, Rustem I Litvinov, Paul C Billings, Feng Gai, Joel S Bennett, and William F DeGrado. Computational design of a β-peptide that targets transmembrane helices. Journal of the American Chemical Society, 133(32):12378–12381, August 2011. PMID: 21780757.

[100] Worachart Sirawaraporn, Tanajit Sathitkul, Rachada Sirawaraporn, Yongyuth Yuthavong, and Daniel V. Santi. Antifolate-resistant mutants of Plasmodium falciparum dihydrofolate reductase. Proceedings of the National Academy of Sciences, 94(4):1124–1129, February 1997.

[101] A. Sircar, E. T. Kim, and J. J. Gray. RosettaAntibody: antibody variable region homology modeling server. Nucleic Acids Research, 37(Web Server):W474–W479, May 2009.

[102] Colin A. Smith and Tanja Kortemme. Structure-based prediction of the peptide sequence space recognized by natural and synthetic PDZ domains. Journal of Molecular Biology, 402(2):460–474, September 2010.

[103] Colin A Smith, Catherine A Shi, Matthew K Chroust, Thomas E Bliska, Mark J S Kelly, Matthew P Jacobson, and Tanja Kortemme. Design of a phosphorylatable PDZ domain with peptide-specific affinity changes. Structure (London, England: 1993), November 2012. PMID: 23159126.

[104] Brian W Stevens, Ryan H Lilien, Ivelin Georgiev, Bruce R Donald, and Amy C Anderson. Redesigning the PheA domain of gramicidin synthetase leads to a new understanding of the enzyme’s mechanism and selectivity. Biochemistry, 45(51):15495–15504, December 2006. PMID: 17176071.

[105] P. Benjamin Stranges and Brian Kuhlman. A comparison of successful and failed protein interface designs highlights the challenges of designing buried hydrogen bonds. Protein Science, 2012.

[106] John Thomas, Naren Ramakrishnan, and Chris Bailey-Kellogg. Graphical models of protein-protein interaction specificity from correlated mutations and interaction data. Proteins: Structure, Function, and Bioinformatics, 76(4):911–929, September 2009.

[107] Chittaranjan Tripathy, Jianyang Zeng, Pei Zhou, and Bruce Randall Donald. Protein loop closure using orientational restraints from NMR data. Proteins, February 2012. PMID: 22161780. Cover Article.

[108] Igor F Tsigelny, Leslie Crews, Paula Desplats, Gideon M Shaked, Yuriy Sharikov, Hideya Mizuno, Brian Spencer, Edward Rockenstein, Margarita Trejo, Oleksandr Platoshyn, Jason X-J Yuan, and Eliezer Masliah. Mechanisms of hybrid oligomer formation in the pathogenesis of combined Alzheimer’s and Parkinson’s diseases. PLoS ONE, 3(9):e3135, 2008. PMID: 18769546.

[109] Shiou-Ru Tzeng and Charalampos G Kalodimos. Protein activity regulation by conformational entropy. Nature, 488(7410):236–240, August 2012. PMID: 22801505.

[110] Gregory L. Verdine and Loren D. Walensky. The challenge of drugging undruggable targets in cancer: Lessons learned from targeting BCL-2 family members. Clinical Cancer Research, 13(24):7264–7270, December 2007.

[111] G.M. Verkhivker. Computational proteomics of biomolecular interactions in the sequence and structure space of the tyrosine kinome: deciphering the molecular basis of the kinase inhibitors selectivity. Proteins: Structure, Function, and Bioinformatics, 66(4):912–929, 2007.

[112] Janice Villali and Dorothee Kern. Choreographing an enzyme’s dance. Current Opinion in Chemical Biology, 14(5):636–643, October 2010. PMID: 20822946.

[113] M. Wadman. HIV trial under scrutiny. Nature News, 493:279–280, 2013.

[114] Habibah A. Wahab, Yee-Siew Choong, Pazilah Ibrahim, Amirin Sadikun, and Thomas Scior. Elucidating isoniazid resistance using molecular modeling. Journal of Chemical Information and Modeling, 49(1):97–107, January 2009.

[115] Loren D. Walensky, Andrew L. Kung, Iris Escher, Thomas J. Malia, Scott Barbuto, Renee D. Wright, Gerhard Wagner, Gregory L. Verdine, and Stanley J. Korsmeyer. Activation of apoptosis in vivo by a hydrocarbon-stapled BH3 helix. Science, 305(5689):1466–1470, September 2004.

[116] Ray Yu-Ruei Wang, Yan Han, Kristina Krassovsky, William Sheffler, Michael Tyka, and David Baker. Modeling disordered regions in proteins using Rosetta. PLoS ONE, 6(7):e22060, July 2011.

[117] Paul A. Wender, Dennis J. Mitchell, Kanaka Pattabiraman, Erin T. Pelkey, Lawrence Steinman, and Jonathan B. Rothbard. The design, synthesis, and evaluation of molecules that enable or enhance cellular uptake: Peptoid molecular transporters. Proceedings of the National Academy of Sciences, 97(24):13003–13008, November 2000.

[118] Michael Wolde, Abigail Fellows, Jie Cheng, Aleksandr Kivenson, Bonita Coutermarsh, Laleh Talebian, Katherine Karlson, Andrea Piserchio, Dale F. Mierke, Bruce A. Stanton, William B. Guggino, and Dean R. Madden. Targeting CAL as a negative regulator of ΔF508-CFTR cell-surface expression. Journal of Biological Chemistry, 282(11):8099–8109, March 2007.

[119] Xueling Wu, Zhi-Yong Yang, Yuxing Li, Carl-Magnus Hogerkorp, William R. Schief, Michael S. Seaman, Tongqing Zhou, Stephen D. Schmidt, Lan Wu, Ling Xu, Nancy S. Longo, Krisha McKee, Sijy O’Dell, Mark K. Louder, Diane L. Wycuff, Yu Feng, Martha Nason, Nicole Doria-Rose, Mark Connors, Peter D. Kwong, Mario Roederer, Richard T. Wyatt, Gary J. Nabel, and John R. Mascola. Rational design of envelope identifies broadly neutralizing human monoclonal antibodies to HIV-1. Science, 329(5993):856–861, August 2010.

[120] J. Xu and B. Berger. Fast and accurate algorithms for protein side-chain packing. J. ACM, 53(4):533–557, 2006.

[121] Chen Yanover and Yair Weiss. Approximate inference and Protein-Folding. Advances in Neural Information Processing Systems, pages 84–86, 2002.
