"Finding backbone substructures that match an arbitrary query structural motif, composed of multiple disjoint segments, is a problem of growing relevance in structure prediction and protein design" -- Gevorg Grigoryan.

Gevorg's lab built a tool called Method of Accelerated Search for Tertiary Ensemble Representatives (MASTER), which is essentially a partial distance search engine for the Protein Data Bank.

It is a command-line tool that requires the PDB database to be downloaded locally. Gevorg approached me about helping to develop a client-server visualization and querying tool for use with PyMol. The number of structures in the PDB is growing rapidly, so a tool that lets a scientist quickly search and visualize matches would be invaluable.

I took on building a PyMol module and a server API component as a weekend project, so that first his lab and then others could use his MASTER search database for building proteins.

I chose Flask as the lightweight framework for the server-side component. Because the actual search can take anywhere from a few seconds to a couple of minutes, I used Redis with pyrq to run searches in background workers, along with gunicorn, to support large numbers of concurrent requests.
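To make the submit-then-check pattern concrete, here is a minimal sketch using only the Python standard library. It is not the project's actual code: a thread pool stands in for the Redis-backed workers, a dict stands in for the job store, and the names run_search, submit_search and job_status are hypothetical. The point is that a request handler only enqueues work and returns a job id, so it never blocks on a slow search.

```python
import threading
import time
import uuid
from concurrent.futures import ThreadPoolExecutor

# In-process stand-ins (assumptions for illustration): the executor plays
# the role of the background workers, the dict plays the role of the
# Redis-backed job store.
_executor = ThreadPoolExecutor(max_workers=4)
_jobs = {}
_jobs_lock = threading.Lock()


def run_search(query):
    """Hypothetical placeholder for invoking the MASTER search."""
    time.sleep(0.05)  # pretend the search takes a while
    return {"query": query, "matches": []}


def submit_search(query):
    """Enqueue a search and return a job id immediately."""
    job_id = uuid.uuid4().hex
    future = _executor.submit(run_search, query)
    with _jobs_lock:
        _jobs[job_id] = future
    return job_id


def job_status(job_id):
    """Report the state of a job without blocking on it."""
    with _jobs_lock:
        future = _jobs.get(job_id)
    if future is None:
        return {"state": "unknown"}
    if not future.done():
        return {"state": "pending"}
    if future.exception() is not None:
        return {"state": "failed", "error": str(future.exception())}
    return {"state": "finished", "result": future.result()}
```

In a real deployment the two functions would sit behind HTTP endpoints, and the job store would live outside the web process so that any server worker can answer a status request.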

The PyMol plugin was trickier, largely because PyMol is a giant application with many examples but little documentation. However, all the plugin needed to do was present a simple UI, submit a query to the API, and load the resulting protein structures as a group into the view. The module ended up doing threaded long polls against the API, which returned partial results and progress along the way. Because the MASTER search is linear, I could send exact progress to the client.
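The progress reporting can be sketched as follows. This is an illustration under stated assumptions, not the plugin's code: SearchJob is a hypothetical stand-in for the server-side job, the structure names are made up, and the client uses plain interval polling rather than true long polling to keep the example short. Because the scan visits each structure exactly once, the fraction processed is an exact progress value.

```python
import threading
import time


class SearchJob:
    """Hypothetical stand-in for a job that scans a database linearly."""

    def __init__(self, structures):
        self.structures = list(structures)
        self.processed = 0
        self.matches = []
        self._lock = threading.Lock()

    def run(self, is_match):
        # One pass over the database: progress is processed / total.
        for s in self.structures:
            hit = is_match(s)
            with self._lock:
                if hit:
                    self.matches.append(s)
                self.processed += 1

    def status(self):
        # Snapshot of progress plus the partial results found so far.
        with self._lock:
            total = len(self.structures)
            return {
                "progress": self.processed / total if total else 1.0,
                "done": self.processed == total,
                "matches": list(self.matches),
            }


def poll_until_done(get_status, on_progress, interval=0.01):
    """Poll get_status() until the job is done, reporting each snapshot."""
    while True:
        snapshot = get_status()
        on_progress(snapshot)
        if snapshot["done"]:
            return snapshot
        time.sleep(interval)


# Usage: run the search and the poller on separate threads so that a UI
# thread would stay responsive while results trickle in.
job = SearchJob(["1abc", "2xyz", "3abc", "4def"])
seen = []
worker = threading.Thread(target=job.run, args=(lambda s: s.endswith("abc"),))
poller = threading.Thread(target=poll_until_done, args=(job.status, seen.append))
worker.start()
poller.start()
worker.join()
poller.join()
```

In the plugin, the on_progress callback is where a progress bar would be updated and newly found structures loaded into the viewer.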

Screenshot of selecting residues to search by in PyMol

The lab is extending the functionality of my base code framework and will be releasing it officially to the public, but for now the prerelease code is here: https://github.com/timofei7/master_protein