This section describes how OSPREY itself is organized internally. It might serve as a nice introduction to the OSPREY code for a developer who is new to the project.
OSPREY’s purpose in life is to find protein sequences that have properties we’re interested in, like stability or affinity with other molecules. OSPREY does this by doing two basic tasks over and over:
Task 1: Find a conformation using graph theory
Task 2: Evaluate a conformation using physics
Here, a conformation is a molecule in which some of the atoms have been replaced with a different set of atoms. This allows us to describe mutations to a molecule and discrete flexibility in the atom coordinates in an efficient, compact way. OSPREY finds these conformations using various graph algorithms operating on a conformation space. A conformation space is the set of all mutations we can make to a molecule, and all the ways we allow the atom coordinates to change within a mutation.
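To make the idea of a conformation space concrete, here is a minimal Python sketch. It is not OSPREY's API: the position names, the `Option` class, and the helper functions are all hypothetical, chosen only to show that a conformation is one discrete choice per design position and that many conformations can share the same sequence.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Option:
    residue_type: str  # amino acid at this position; differing from wild type means a mutation
    rotamer: int       # index of one discrete set of atom coordinates for that residue type

# Hypothetical conformation space: each design position lists the options we allow.
conf_space = {
    "A23": [Option("VAL", 0), Option("VAL", 1), Option("LEU", 0)],
    "A45": [Option("SER", 0), Option("THR", 0)],
}

def conformations(space):
    """Yield every conformation: exactly one option chosen at each position."""
    positions = list(space)
    for choice in product(*(space[p] for p in positions)):
        yield dict(zip(positions, choice))

def sequence(conf):
    """The sequence of a conformation ignores rotamers and keeps residue types."""
    return tuple(opt.residue_type for opt in conf.values())

all_confs = list(conformations(conf_space))
sequences = {sequence(c) for c in all_confs}
print(len(all_confs), "conformations across", len(sequences), "sequences")
```

Real conformation spaces are far too large to enumerate like this, which is why the search is done with graph algorithms instead.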
Once OSPREY has a conformation, it evaluates that conformation according to a physical forcefield to update its scoring model about that conformation, and the sequence of which the conformation is a part. To model the possibility that the conformation we actually want isn’t perfectly represented by the atom coordinates we started with, OSPREY allows the atoms to move slightly from their original positions. OSPREY uses a continuous motion to move the atoms, for example, by a dihedral angle rotation around a rotatable bond. Using the forcefield and the continuous motions as an objective function, OSPREY minimizes the conformation to find the energy of the best nearby atom coordinates.
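The minimization step can be pictured with a toy example. The sketch below is an illustration under stated assumptions, not OSPREY's minimizer: the one-term torsion energy, the 9 degree bound, and the golden-section search are all stand-ins chosen to keep the example short. It shows a single dihedral angle being allowed to move within a small interval around its starting value, with the forcefield energy as the objective function.

```python
import math

def toy_energy(angle_deg):
    """Hypothetical one-term torsion energy with minima at 60, 180, and 300 degrees."""
    return 1.0 + math.cos(math.radians(3.0 * angle_deg))

def minimize_1d(f, lo, hi, tol=1e-4):
    """Golden-section search for the minimum of a unimodal function f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            # The minimum lies in [a, d]; shrink from the right.
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:
            # The minimum lies in [c, b]; shrink from the left.
            a, c = c, d
            d = a + inv_phi * (b - a)
    return (a + b) / 2.0

# Assumed starting coordinates: a dihedral at 55 degrees, free to move by +/- 9 degrees.
start_angle = 55.0
half_width = 9.0

best_angle = minimize_1d(toy_energy, start_angle - half_width, start_angle + half_width)
minimized_energy = toy_energy(best_angle)
print(f"start: {toy_energy(start_angle):.4f}  minimized: {minimized_energy:.4f}")
```

A real conformation has many such degrees of freedom moving at once, so the actual problem is a multi-dimensional bounded minimization rather than a one-dimensional search.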
Once OSPREY has enough information about different conformations and their minimized energies, it can claim that a certain sequence has or doesn’t have the properties that the design project seeks.
OSPREY is almost entirely written in Java (see /src/main/java). Newer parts of OSPREY, like the Desktop software, are written in Kotlin though, which in many ways is like a successor to the Java language.
There’s a thin layer of Python on top (see /src/main/python) acting as a scripting API for designers and data scientists.
A small part of performance-critical code has been copied into C++ (see /src/main/cc and /src/main/cu) and optimized for various hardware platforms, including CPUs and GPUs.
Since OSPREY is largely developed by academic researchers working on projects in biology and chemistry, OSPREY has naturally adopted a copy-on-write approach to improving systems. Rather than upgrade an existing system in-place with a painful refactoring that may disrupt a researcher’s progress on a project (like a paper, or a PhD), it’s generally far easier to implement new features in independent code, and then convince other researchers to upgrade to the new version between research projects, rather than during them.
This copy-on-write philosophy has led to the proliferation of many different implementations of similar features in OSPREY’s code over time. And due to the needs of academic reproducibility, it’s essentially impossible to delete older implementations that are no longer actively in use. So OSPREY’s code is littered with a complete history of various implementations for concepts like conformation spaces, energy functions, and minimizers. This can all be extremely confusing to new OSPREY developers, so one of the purposes of this document is to point you quickly in the direction of the modern and actively in-use implementations of OSPREY’s basic features.
Here’s a quick roadmap of the source code that implements the most modern versions of
OSPREY’s basic features. The sources will be given by the source root folder and the
verbose Fully-Qualified name of the relevant class, but the common package edu.duke.cs.osprey
has been omitted.
Conformation Space: /src/main/java / .confspace.compiled.ConfSpace
Conformation: /src/main/java / .confspace.compiled.AssignedCoords
Energy Matrix: /src/main/java / .ematrix.EnergyMatrix
Energy matrix calculator: /src/main/java / .ematrix.compiled.EmatCalculator
A* search: /src/main/java / .astar.conf.ConfAStarTree
Conformation energy calculator: /src/main/java / .energy.compiled.CPUConfEnergyCalculator
“native” energy calculator: /src/main/java / .energy.compiled.NativeConfEnergyCalculator
  C++ code: /src/main/cc/ConfEcalc
CUDA energy calculator: /src/main/java / .energy.compiled.CudaConfEnergyCalculator
  CUDA code: /src/main/cu/CudaConfEcalc
Thread Pool: /src/main/java / .parallelism.ThreadPoolTaskExecutor
Parallelism: /src/main/java / .parallelism.Parallelism
  The streamsPerGpu option isn’t used anymore in the latest CUDA code. Stream management is now handled internally by the CUDA code, based on hardware specs queried at runtime.
Cluster Communication: /src/main/java / .parallelism.Cluster
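To show how the energy matrix and A* search entries in this roadmap relate, here is a small self-contained Python sketch. It does not use or mirror the OSPREY classes listed above: the energy values, the `pair` function, and the heuristic are all hypothetical. It assumes only the general pattern in which precomputed one-body and pairwise energies let an A* search score partial conformations and find the lowest-scoring full conformation without enumerating every one.

```python
import heapq
from itertools import product

# Hypothetical energy matrix: number of options per position and one-body energies.
num_options = [2, 3, 2]
single = [[0.0, 1.0], [0.5, 0.2, 0.9], [0.3, 0.0]]

def pair(p1, o1, p2, o2):
    """Toy pairwise energy (always called with p1 < p2); stands in for a matrix lookup."""
    return (((p1 + 1) * (o1 + 1) * (p2 + 2) * (o2 + 1)) % 5) * 0.1

def conf_energy(conf):
    """Sum of one-body and pairwise energies over the assigned positions in conf."""
    e = sum(single[p][o] for p, o in enumerate(conf))
    e += sum(pair(p1, conf[p1], p2, conf[p2])
             for p2 in range(len(conf)) for p1 in range(p2))
    return e

def h_score(partial):
    """Lower bound on the energy the still-unassigned positions can add."""
    k, n = len(partial), len(num_options)
    total = 0.0
    for p in range(k, n):
        best = float("inf")
        for o in range(num_options[p]):
            e = single[p][o]
            # Exact pair energies with positions that are already assigned.
            e += sum(pair(q, partial[q], p, o) for q in range(k))
            # Optimistic pair energies with earlier unassigned positions.
            e += sum(min(pair(q, oq, p, o) for oq in range(num_options[q]))
                     for q in range(k, p))
            best = min(best, e)
        total += best
    return total

def astar_lowest_energy():
    """Expand partial conformations in order of g + h until a full one is popped."""
    n = len(num_options)
    heap = [(h_score(()), ())]
    while heap:
        f, partial = heapq.heappop(heap)
        if len(partial) == n:
            return partial, f
        p = len(partial)
        for o in range(num_options[p]):
            child = partial + (o,)
            heapq.heappush(heap, (conf_energy(child) + h_score(child), child))

best_conf, best_energy = astar_lowest_energy()
brute_force = min(conf_energy(c) for c in product(*(range(k) for k in num_options)))
print(best_conf, best_energy)
```

Because the heuristic never overestimates the remaining energy, the first full conformation popped from the queue has the lowest matrix energy, which the brute-force check confirms on this tiny example.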