
https://onlinelibrary.wiley.com/doi/10.1002/pro.2610
Rapid search for tertiary fragments reveals protein sequence–structure relationships
Key figures
- Figure 1: Explains the core MASTER algorithm, showing how partial alignments constrain the allowed locations of remaining segments through exact RMSD and distance bounds.
- Figure 4: Demonstrates a practically important use case for design, namely mapping a native designability landscape and sequence preferences for a small tertiary motif.
- Figure 7: Shows that MASTER can be used not just for analysis but for automated topology remodeling by identifying native connector lengths and geometries.
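The RMSD and distance bounds that Figure 1 refers to can be motivated with a short derivation. The notation below ($n_i$, $N$, $c$, $E_i$, $\delta$) is introduced here for illustration only, and the bounds shown are simple sufficient ones that follow from elementary inequalities; the paper's own formulation may be tighter.

```latex
% Query with segments i = 1..k, n_i atoms each, N = \sum_i n_i, RMSD cutoff c.
% E_i(T): sum of squared deviations of segment i under a rigid transform T.
% E_i^* = \min_T E_i(T): the segment's own optimal residual.
\begin{align}
  N \cdot \mathrm{RMSD}^2
    &= \min_T \sum_{i=1}^{k} E_i(T)
     \;\ge\; \sum_{i=1}^{k} \min_T E_i(T)
     \;=\; \sum_{i=1}^{k} E_i^* , \\
  \text{so a match requires}\quad
  \sum_{i \in S} E_i^* &\le N c^2
     \quad \text{for every subset } S \text{ of placed segments.}
\end{align}
% Distance bound for two atoms a, b (e.g. central residues of two segments)
% with deviations \delta_a, \delta_b under the optimal joint superposition:
\begin{align}
  \delta_a^2 + \delta_b^2 &\le N c^2 , \\
  \bigl| d_{\text{target}}(a,b) - d_{\text{query}}(a,b) \bigr|
    &\le \delta_a + \delta_b
     \le \sqrt{2\,(\delta_a^2 + \delta_b^2)}
     \le c\,\sqrt{2N} .
\end{align}
```

Read this way, once one segment is placed, any other segment whose central residue falls outside the allowed distance window, or whose own residual already exhausts the budget $N c^2$, can be discarded without computing a joint superposition.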
1) Thesis (one sentence)
To close the gap in rapidly and exhaustively finding matches to arbitrary disjoint backbone motifs in Protein Data Bank structures, the MASTER algorithm performs fast, exhaustive retrieval of tertiary fragment matches by combining RMSD-bounded branch-and-bound search with exact intersegment distance constraints, a claim supported by performance benchmarking, designability mapping, functional motif mining, and topology-remodeling examples.
2) Evidence card (three bullets only)
- Strongest result: (Fig. 3, Table II, Table III) MASTER searched a nonredundant 12,661-structure database with 50 diverse motifs and usually finished in seconds; many multisegment queries remained practical, and limiting output to the top 1000 matches gave up to order-of-magnitude additional speedups in difficult cases.
- Method enabler: (Fig. 1, Table I; computational structural biology + exact search + RMSD geometry) The key enabling idea is a provably correct atomistic tertiary fragment search that sorts query segments by size, brute-forces the first segment, then recursively prunes remaining possibilities using exact per-segment RMSD bounds, cumulative RMSD bounds, and exact intersegment distance bounds around central residues.
- Critical limitation: (Fig. 3, Table II, Table IV) The method remains fundamentally exponential for loose RMSD cutoffs that produce very many solutions, especially for four- or five-segment queries, and greedy heuristics only modestly improve speed while missing real matches, indicating the problem still lacks a strongly exploitable optimal substructure.
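The search strategy in the "Method enabler" bullet can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the MASTER implementation: the target is a single chain given as an array of Cα coordinates, segments may land in any order as long as they do not overlap, the intersegment distance prefilter is omitted, and the names `kabsch_ssd` and `find_matches` are hypothetical.

```python
import numpy as np

def kabsch_ssd(P, Q):
    """Minimum sum of squared deviations between two (n, 3) point sets
    over all rigid-body superpositions (Kabsch algorithm)."""
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    H = Pc.T @ Qc
    S = np.linalg.svd(H, compute_uv=False)
    if np.linalg.det(H) < 0:      # best orthogonal map would be a reflection
        S[-1] = -S[-1]
    ssd = (Pc ** 2).sum() + (Qc ** 2).sum() - 2.0 * S.sum()
    return max(float(ssd), 0.0)   # clamp tiny negative round-off

def find_matches(query_segments, target, cutoff):
    """Return [(starts, rmsd), ...] for every non-overlapping placement of the
    query segments in `target` whose joint RMSD is <= cutoff.
    `starts` is given in the caller's original segment order."""
    target = np.asarray(target, dtype=float)
    # Sort segments by size, largest first (assumption: ties keep input order).
    order = sorted(range(len(query_segments)),
                   key=lambda i: -len(query_segments[i]))
    segs = [np.asarray(query_segments[i], dtype=float) for i in order]
    # A full match needs total SSD <= N * cutoff^2.
    budget = sum(len(s) for s in segs) * cutoff ** 2

    # Per-segment bound: a window whose own optimal SSD exceeds the budget
    # can never be part of a match, so it is dropped up front.
    cands = []
    for s in segs:
        n = len(s)
        ok = {}
        for a in range(len(target) - n + 1):
            ssd = kabsch_ssd(s, target[a:a + n])
            if ssd <= budget:
                ok[a] = ssd
        cands.append(ok)

    matches = []

    def recurse(k, starts, own_sum):
        n = len(segs[k])
        for a, own in cands[k].items():
            # Segments must occupy disjoint windows of the target.
            if any(a < b + len(segs[j]) and b < a + n
                   for j, b in enumerate(starts)):
                continue
            # Cheap cumulative bound: joint SSD >= sum of per-segment optima.
            if own_sum + own > budget:
                continue
            new = starts + [a]
            Q = np.vstack(segs[:k + 1])
            T = np.vstack([target[b:b + len(s)] for s, b in zip(segs, new)])
            # Joint SSD of a partial alignment never exceeds that of any
            # completed alignment containing it, so this pruning is exact.
            joint = kabsch_ssd(Q, T)
            if joint > budget:
                continue
            if k + 1 == len(segs):
                placed = dict(zip(order, new))
                matches.append(([placed[i] for i in range(len(segs))],
                                float(np.sqrt(joint / len(Q)))))
            else:
                recurse(k + 1, new, own_sum + own)

    recurse(0, [], 0.0)
    return sorted(matches, key=lambda m: m[1])
```

Because both pruning tests are lower bounds on the final squared deviation, the sketch never discards a true match; it also makes the "Critical limitation" concrete, since a loose cutoff leaves most windows in `cands` and the recursion then approaches the full product of candidate counts.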
Optional
Quote bank (2–4 short excerpts)
- Quote 1: “MASTER is both rapid” (Abstract, page 1)
- Quote 2: “finding all matches below a user-specified root-mean-square deviation cutoff” (Abstract, page 1)
- Quote 3: “the search slows down substantially only when the specified RMSD cutoff corresponds to a very large number of matches” (Results, page 5)
- Quote 4: “it is still difficult and does not have a particularly optimal substructure” (Discussion, page 13)
Key comparisons (1–3 lines)
- Compared to: The authors’ earlier MaDCaT distance-map search and greedy heuristic ATFS approaches.
- Win: MASTER uses backbone RMSD, provably finds all matches below the cutoff, and was on average 34-fold faster than MaDCaT when retrieving the top 1000 matches.
- Tradeoff: Performance is excellent in realistic regimes but can still degrade when loose cutoffs create huge solution spaces, and heuristic pruning sacrifices coverage.
Methods I might copy (protocol hooks)
- Construct design / Models: Queries were atomistic tertiary fragments with 1 to 5 disjoint segments; the benchmark used 50 motifs spanning 6 to 50 residues and varied helix, strand, and noncanonical combinations; designability analysis parameterized an α/β motif by ΔR, Δφ, and ΔZ.
- Conditions / Instruments: Searches used a nonredundant nrPDB30 database of 12,661 protein structures built from BLASTClust at 30% sequence identity; RMSD cutoffs tested were 0.4, 0.6, 0.8, 1.0, 1.5, and 2.0 Å; runs were done on a single 2.7 GHz Intel Xeon processor; the implementation searched by Cα RMSD with disk-based PDS files.
- Readout / Analysis: Output metrics included total matches, wall time, speedup versus MaDCaT, recovery ratio under heuristics, AUROC for PDZ and Bcl-2 motif classification, and sequence logos or loop-length histograms extracted from close matches.
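Two of the readouts above reduce to short formulas, sketched here for reference. The definitions are assumptions made for illustration: AUROC is computed as the probability that a positive example outscores a negative one (ties count one half, higher score meaning "more likely positive"), and recovery ratio is taken to be the fraction of exhaustive-search matches that a heuristic run also returns. The function names are hypothetical, and the notes do not record how the paper scored the PDZ or Bcl-2 examples.

```python
import numpy as np

def auroc(scores_pos, scores_neg):
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation."""
    pos = np.asarray(scores_pos, dtype=float)
    neg = np.asarray(scores_neg, dtype=float)
    diff = pos[:, None] - neg[None, :]          # every positive vs every negative
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / diff.size)

def recovery_ratio(heuristic_matches, exhaustive_matches):
    """Fraction of exhaustive matches also found by the heuristic search.
    Matches are assumed to be hashable identifiers (e.g. tuples of start indices)."""
    exact = set(exhaustive_matches)
    if not exact:
        return 1.0                               # nothing to recover
    return len(exact & set(heuristic_matches)) / len(exact)
```

If a lower score indicates a better match (for example, RMSD to a motif), the scores would need to be negated before calling `auroc`.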
Open questions / Theoretical implications (2–5 bullets)
- Can ATFS be pushed from motif search into routine motif-conditioned design loops where candidate geometries are generated and re-ranked on the fly?
- How well do native tertiary-fragment statistics transfer from generic PDB structure space to ligand-engaged or induced-proximity-specific interfaces?
- Could exact tertiary fragment mining around cryptic-pocket-adjacent surfaces reveal sequence-structure rules for naturally designable neomorphic interfaces?
- When does geometric plausibility from native matches correlate strongly enough with energetic realizability to guide interface remodeling without an explicit physics-based refinement step?