
https://academic.oup.com/bib/article/26/5/bbaf488/8259888
Accurate structure prediction of cyclic peptides containing unnatural amino acids using HighFold3
Key figures
- Figure 1: Explains the Cyclization Switch that routes linear versus cyclic peptide inputs into AlphaFold 3 using distinct positional matrices.
- Figure 2: Shows how HighFold3 handles disulfide-bond pairing and multi-cyclic-peptide ligand complexes with a shared protein receptor.
- Figure 3: Quantifies cyclic peptide monomer accuracy across HighFold, HighFold3, CABS-flex, ESMFold, and HelixFold, including disulfide-bridge cases and pLDDT correlations.
- Figure 4: Shows cyclic peptide-protein complex prediction accuracy, external benchmarking against CyclicBoltz1, and ZDOCK redocking validation.
- Figure 5: Quantifies improved modeling of unAA-containing cyclic peptides using RMSDCα, RMSDall-atom, RMSDunAA, and pLDDT correlations.
- Figure 6: Demonstrates that removing CycPOEM degrades cyclic peptide complex prediction and can make a cyclic peptide appear linear.
- Figure 7 and Table 1: Show that AlphaFold 3 failed to enforce head-to-tail closure for an unseen cyclic peptide, whereas HighFold3-Cyclic achieved 100% cyclization success versus AlphaFold 3’s 21%.
1) Thesis (one sentence)
HighFold3 addresses the inaccurate modeling of cyclic peptides containing unnatural amino acids (unAAs) in AlphaFold 3-based peptide monomer and peptide-protein complex structure prediction: it improves topology control and coordinate accuracy by selecting linear or cyclic positional encodings through a Cyclization Switch, encoding cyclic closure with CycPOEM, handling disulfide-bond topologies, and defining unAAs through CCD, as supported by RMSD benchmarking, pLDDT correlation, ablation, docking, and cyclization-success analyses.
2) Evidence card (three bullets only)
- Strongest result: (Fig. 3A–F; Fig. 4A–B; Fig. 5A–C; Table 1) HighFold3 achieved cyclic monomer median/average RMSDCα of 0.918/1.338 Å, improved internal cyclic peptide complex median/average RMSDCα to 0.266/0.305 Å, outperformed CyclicBoltz1 in 14/15 external complexes, improved unAA cyclic peptide RMSDCα, RMSDall-atom, and RMSDunAA versus HighFold2, and reached 100% head-to-tail cyclization success on 100 external cyclic peptide sequences.
- Method enabler: (Fig. 1; Fig. 2; computational structure-prediction model engineering + AlphaFold 3 + CycPOEM/FCP + disulfide-bond combination matrix + CCD + ZDOCK) HighFold3 keeps AlphaFold 3 pretrained parameters frozen, uses a Boolean “head_to_tail” switch to select HighFold3-Linear or HighFold3-Cyclic, computes cyclic shortest-path residue distances with an improved Floyd-Warshall algorithm, supports disulfide-bond topology enumeration, and validates representative complexes by ZDOCK 3.0.2 rigid-body FFT docking.
- Critical limitation: (Fig. 7; Table 1; Conclusion) The topology constraint can force a closed macrocycle and correct AlphaFold 3’s >20 Å N-to-C failure, but the output remains a static Top-1 structure: it does not model physiological conformational dynamics or binding energetics, and the predicted peptide-target contacts are not experimentally validated.
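The linear and cyclic positional matrices named in the method enabler can be sketched as below. This is a minimal illustration and not the HighFold3 implementation: it uses the textbook Floyd-Warshall algorithm rather than the paper's improved variant, and the function name `positional_distance_matrix` and the `extra_bonds` parameter (for example, a disulfide pair) are hypothetical names introduced here. How the resulting distances are fed into AlphaFold 3 is not covered.

```python
def positional_distance_matrix(length, head_to_tail, extra_bonds=()):
    """Return an unsigned shortest-path residue-distance matrix.

    length       -- number of residues in the peptide
    head_to_tail -- True adds an N-terminus to C-terminus edge (cyclic mode)
    extra_bonds  -- hypothetical: extra (i, j) residue index pairs to connect
    """
    inf = float("inf")
    dist = [[0 if i == j else inf for j in range(length)] for i in range(length)]

    # Backbone edges: adjacent residues are one step apart.
    for i in range(length - 1):
        dist[i][i + 1] = dist[i + 1][i] = 1

    # Cyclic mode: directly connect the first and last residue.
    if head_to_tail and length > 2:
        dist[0][length - 1] = dist[length - 1][0] = 1

    # Assumption: additional covalent links are treated as unit-length edges.
    for a, b in extra_bonds:
        if a != b:
            dist[a][b] = dist[b][a] = 1

    # Textbook Floyd-Warshall: the intermediate node k is the outer loop.
    for k in range(length):
        for i in range(length):
            for j in range(length):
                via_k = dist[i][k] + dist[k][j]
                if via_k < dist[i][j]:
                    dist[i][j] = via_k
    return dist


linear = positional_distance_matrix(6, head_to_tail=False)
cyclic = positional_distance_matrix(6, head_to_tail=True)
print(linear[0][5], cyclic[0][5])
```

In linear mode the N-to-C distance equals the peptide length minus 1, and in cyclic mode it collapses to 1, matching the two modes described in the notes. For a cyclic peptide-protein complex, the notes describe pairing a linear matrix for the protein with a cyclic matrix for the peptide ligand.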
Optional
Quote bank (2–4 short excerpts)
- Quote 1: “For cyclic peptides, CycPOEM directly connects the N-terminus and C-terminus of the peptide chain, constructing a closed-loop distance matrix.” (HighFold3 framework / Fig. 1C, p. 3)
- Quote 2: “static structure modeling” (Conclusion, p. 10)
Key comparisons (1–3 lines)
- Compared to: HighFold, HighFold2, CyclicBoltz1, NCPepFold, CABS-flex, ESMFold, HelixFold, and baseline AlphaFold 3.
- Win: Lower RMSD across cyclic monomers, cyclic peptide-protein complexes, unAA-containing cyclic peptides, and linear unAA peptides; 100% external head-to-tail cyclization success versus AlphaFold 3’s 21%.
- Tradeoff: Strong topology control improves closure and benchmarking metrics, but does not prove dynamic binding mechanisms, physiological-state ensembles, or experimental correctness of all side-chain/interface details.
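The cyclization-success comparison can be reproduced in spirit with a simple distance check. The sketch below is an assumption-laden illustration: these notes do not record the paper's exact closure criterion, so the `cutoff` of 2.0 Å and the function name `cyclization_success_rate` are hypothetical. The only chemistry relied on is that a peptide C-N bond is roughly 1.33 Å long, so a closed ring should place the terminal C and N atoms well under the cutoff, whereas the AlphaFold 3 failure noted above leaves them more than 20 Å apart.

```python
import math


def cyclization_success_rate(terminal_pairs, cutoff=2.0):
    """Fraction of predictions whose termini are close enough to be bonded.

    terminal_pairs -- list of (n_xyz, c_xyz) tuples: coordinates of the
                      N-terminal N atom and the C-terminal C atom, in Å
    cutoff         -- hypothetical closure threshold in Å (an assumption,
                      not the paper's stated criterion)
    """
    if not terminal_pairs:
        raise ValueError("need at least one prediction")
    closed = sum(
        1 for n_xyz, c_xyz in terminal_pairs if math.dist(n_xyz, c_xyz) <= cutoff
    )
    return closed / len(terminal_pairs)


# Toy coordinates: one closed ring and one open chain.
pairs = [
    ((0.0, 0.0, 0.0), (1.33, 0.0, 0.0)),
    ((0.0, 0.0, 0.0), (21.0, 0.0, 0.0)),
]
print(cyclization_success_rate(pairs))
```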
Methods I might copy (protocol hooks)
- Construct design / Models: Use two explicit modes, HighFold3-Linear and HighFold3-Cyclic, toggled by the Boolean “head_to_tail”; for linear peptides, set adjacent residue interval to 1 and N-to-C distance to peptide length minus 1; for cyclic peptides, directly connect N- and C-termini in CycPOEM; for cyclic peptide-protein complexes, combine a linear positional matrix for the protein with CycPOEM for the cyclic peptide ligand; encode unAAs through CCD.
- Conditions / Instruments: Run on Ubuntu 20.04.2 with Intel Xeon Gold 6430 CPU, 64 cores, four NVIDIA A800 80 GB GPUs, 500 GB RAM, and 40 TB storage; evaluate Top-1/highest-confidence predictions; reported random seed values were 1, 4, 16, and 64 for cyclic peptide monomer stability testing.
- Readout / Analysis: Calculate RMSDCα, RMSDall-atom, and RMSDunAA using Kabsch rigid superposition on Cα coordinates; treat chains of fewer than 40 residues in a complex as peptide chains; extract unAA atomic coordinates within 10 Å for local superposition; report pLDDT-RMSD Pearson correlations, Wilcoxon signed-rank tests, cyclization success rate, and ZDOCK redocking concordance.
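The Kabsch-based readout above can be sketched with NumPy as follows. This is a generic implementation of Kabsch superposition followed by RMSD, not the paper's evaluation script; the function name `kabsch_rmsd` is hypothetical, and the inputs are assumed to be matched, equally ordered Cα coordinates in Å.

```python
import numpy as np


def kabsch_rmsd(pred, ref):
    """RMSD after optimal rigid superposition of pred onto ref.

    pred, ref -- array-likes of shape (n_atoms, 3) with matched atom order
    """
    P = np.asarray(pred, dtype=float)
    Q = np.asarray(ref, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("pred and ref must both have shape (n_atoms, 3)")

    # Remove translation by centering both coordinate sets.
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)

    # Covariance matrix and its singular value decomposition.
    H = P.T @ Q
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so the result is a proper rotation
    # (determinant +1) and never a reflection.
    d = -1.0 if np.linalg.det(Vt.T @ U.T) < 0 else 1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    # Rotate pred onto ref and measure the residual deviation.
    P_rot = P @ R.T
    return float(np.sqrt(((P_rot - Q) ** 2).sum() / len(P)))


ref = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
rot_z = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
pred = ref @ rot_z.T + np.array([5.0, -2.0, 3.0])  # rotated and shifted copy
print(kabsch_rmsd(pred, ref))
```

A rotated and translated copy of the reference should give an RMSD near zero, while a mirror image should not, because the determinant check rejects reflections. The same function could be applied to all-atom or unAA-only coordinate subsets, provided the atoms are matched one to one.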
Open questions / Theoretical implications (2–5 bullets)
- Can CycPOEM-style topology encoding generalize from head-to-tail and disulfide-cyclized peptides to stapled peptides, side-chain-to-tail cyclization, mixed macrocycles, or covalent warhead-bearing peptide ligands?
- Does improved static RMSD predict usable binding free energy or target engagement when unAA side chains dominate the peptide-protein interface?
- Should confidence thresholds be separately calibrated for backbone topology, unAA side-chain placement, and protein-peptide interface geometry?
- Can MD refinement identify cases where HighFold3 closes the macrocycle correctly but misrepresents conformational heterogeneity or ligand-receptor residence states?
- How much of the improvement comes from explicit topology constraints versus AlphaFold 3’s learned structural priors and CCD-based chemical definitions?