BBK* (Branch and Bound over K*): A Provable and Efficient Ensemble-Based Protein Design Algorithm to Optimize Stability and Binding Affinity over Large Sequence Spaces

by A. A. Ojewole, J. D. Jou and B. R. Donald.

RECOMB 2017 (Hong Kong), and
Journal of Computational Biology March 2018.
DOI: https://doi.org/10.1089/cmb.2017.0267

at Journal: [Abstract, HTML, PDF]
Here: [PDF, SI]

Abstract:

Computational protein design (CPD) algorithms that compute binding affinity, Ka, search for sequences with an energetically favorable free energy of binding. Recent work shows that three principles improve the biological accuracy of CPD: ensemble-based design, continuous flexibility of backbone and side-chain conformations, and provable guarantees of accuracy with respect to the input. However, previous methods that use all three design principles are single-sequence (SS) algorithms, which are very costly: linear in the number of sequences and thus exponential in the number of simultaneously mutable residues. To address this computational challenge, we introduce BBK*, a new CPD algorithm whose key innovation is the multisequence (MS) bound: BBK* efficiently computes a single provable upper bound to approximate Ka for a combinatorial number of sequences, and avoids SS computation for all provably suboptimal sequences. Thus, to our knowledge, BBK* is the first provable, ensemble-based CPD algorithm to run in time sublinear in the number of sequences. Computational experiments on 204 protein design problems show that BBK* finds the tightest binding sequences while approximating Ka for up to 105-fold fewer sequences than the previous state-of-the-art algorithms, which require exhaustive enumeration of sequences. Furthermore, for 51 protein-ligand design problems, BBK* provably approximates Ka up to 1982-fold faster than the previous state-of-the-art iMinDEE/A*/K* algorithm. Therefore, BBK* not only accelerates protein designs that are possible with previous provable algorithms, but also efficiently performs designs that are too large for previous methods.

Supplementary information:Supplementary data are available at here, on our lab website.