MAHOMES II (Metal Activity Heuristic Of Metal and Enzyme Sites 2) is a structure-based machine learning tool that predicts whether protein-bound metal ions are enzymatic or non-enzymatic. The MAHOMES II website performs two tasks: (1) it runs an automated feature pipeline to calculate the necessary structure-based features, and (2) it makes enzyme or non-enzyme predictions using MAHOMES II.
This page covers the details relevant to using the MAHOMES II website. Details about MAHOMES II’s methods can be found in publications (Feehan, Franklin, & Slusky, 2021). MAHOMES II uses a gradient boosting classifier from scikit-learn (Pedregosa et al., 2011), which is documented at https://scikit-learn.org/stable/modules/ensemble.html#gradient-boosting.
Users will receive an email when a submitted job has finished running; the email includes a link to that job's results. The outputs page contains one row for each metal site, defined as four or fewer metal ions within 5 Å of each other, and one row for each input structure that failed the automated feature calculation process.
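The site definition above can be illustrated with a short sketch. This is not the MAHOMES II pipeline code; it is a minimal example that assumes an ion joins a site when it lies within 5 Å of at least one ion already in that site, and that "within" includes exactly 5 Å. The function name group_metal_sites and the example coordinates are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def group_metal_sites(coords: Sequence[Coord], cutoff: float = 5.0) -> List[List[int]]:
    """Group metal ions into sites by linking ions that are <= cutoff apart.

    Returns one sorted list of indices into `coords` per site.
    Assumption: the cutoff is inclusive (distance <= 5 Angstrom).
    """
    unvisited = set(range(len(coords)))
    sites: List[List[int]] = []
    while unvisited:
        seed = min(unvisited)
        unvisited.remove(seed)
        site, frontier = [seed], [seed]
        # Grow the site outward from every ion that has already joined it.
        while frontier:
            current = frontier.pop()
            near = [j for j in unvisited
                    if math.dist(coords[current], coords[j]) <= cutoff]
            for j in near:
                unvisited.remove(j)
            site.extend(near)
            frontier.extend(near)
        sites.append(sorted(site))
    return sites


# Hypothetical coordinates: three ions chained 3 and 4 Angstrom apart, one far away.
ions = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (7.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
sites = group_metal_sites(ions)
```

Here the first three ions form one site even though the first and third are 7 Å apart, because each is within 5 Å of the second; the fourth ion forms its own site.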
Site results include (1) the user’s file name for the site’s structure, followed by the residue code and residue number of the site’s (2, 3) first metal ion, (4, 5) second metal ion, (6, 7) third metal ion, and (8, 9) fourth metal ion, when relevant.
MAHOMES II uses ten gradient boosting ML models, each trained with a different random seed. Results for each site display (10) the percent of these models that predicted the site to be enzymatic and (11) the final enzyme or non-enzyme prediction. For structures that failed the feature calculation process, an explanation is given in place of a prediction.
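As an illustration of how columns (10) and (11) relate, the sketch below turns ten per-model votes into a percentage and a final label. This page does not state how the final prediction is derived from the percentage, so the majority rule used here (more than 50% of models) is an assumption, and summarize_votes is a hypothetical name rather than part of MAHOMES II.

```python
from typing import Sequence, Tuple


def summarize_votes(votes: Sequence[bool], threshold: float = 50.0) -> Tuple[float, str]:
    """Return (percent of models voting enzyme, final label).

    Assumption: the final label is "enzyme" when more than `threshold`
    percent of the models predict enzyme. The real cutoff may differ.
    """
    if not votes:
        raise ValueError("at least one model vote is required")
    percent = 100.0 * sum(votes) / len(votes)
    label = "enzyme" if percent > threshold else "non-enzyme"
    return percent, label


# Hypothetical example: seven of ten models predict the site to be enzymatic.
votes = [True] * 7 + [False] * 3
percent, label = summarize_votes(votes)
```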
The calculated physicochemical features cover five categories: Rosetta energy terms, pocket void, pocket lining, electrostatics, and coordination geometry. These calculations use Rosetta (Alford et al., 2017), BLUUES (Fogolari et al., 2012), pdb2pqr (Jurrus et al., 2018), FindGeo (Andreini, Cavallaro, & Lorenzini, 2012), and GHECOM (Kawabata, 2019). Because these are third-party tools, only the final prediction made by MAHOMES II is made accessible to users. However, if the automated feature process fails, the results include feedback that tells users what went wrong so that they can attempt the necessary adjustments. Feedback messages are detailed below.
Invalid input file: Metalloprotein structure files are first input into Rosetta to create a uniform .pdb output for further processing. “Invalid input file” means that Rosetta was unable to use this file. The main reason for this issue is improper PDB format. For other potential causes, scoring the file with Rosetta 3.13 should create a Rosetta crash report that details the issue.
No metal ions in structure: None of the relevant metal ion codes were found in the structure file. Note that MAHOMES II only works for metal ions listed in the submission requirements.
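Before submitting, users may want to check locally whether a structure file contains any accepted metal ion. The sketch below scans PDB-format HETATM records using the standard fixed-column layout (residue name in columns 18-20, residue number in columns 23-26). The set of metal codes passed in is a placeholder: the authoritative list is the one in the submission requirements, which is not reproduced here. The names find_metal_ions and hetatm are hypothetical, and the actual check performed by the website may differ.

```python
from typing import Iterable, List, Set, Tuple


def find_metal_ions(pdb_lines: Iterable[str], metal_codes: Set[str]) -> List[Tuple[str, int]]:
    """Return (residue code, residue number) for each HETATM record whose
    residue name is in `metal_codes`."""
    found = []
    for line in pdb_lines:
        if not line.startswith("HETATM"):
            continue
        res_name = line[17:20].strip().upper()  # PDB columns 18-20
        if res_name in metal_codes:
            res_num = int(line[22:26])           # PDB columns 23-26
            found.append((res_name, res_num))
    return found


def hetatm(serial: int, atom: str, res: str, chain: str, num: int) -> str:
    """Build the leading columns of a HETATM record (illustration only)."""
    return f"HETATM{serial:5d} {atom:<4} {res:>3} {chain}{num:4d}"


# Hypothetical records: one zinc ion and one water molecule.
example_lines = [
    hetatm(1001, "ZN", "ZN", "A", 301),
    hetatm(1002, "O", "HOH", "A", 401),
]
# Placeholder codes; consult the submission requirements for the real list.
metals = find_metal_ions(example_lines, {"ZN", "FE"})
```

An empty result from a check like this would correspond to the "No metal ions in structure" message.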
Too many metals in the site: Feature calculations did not proceed because this site had more than four metals. A site includes any metal ion within 5 Å of at least one other metal ion in the site. We use this limit because larger sites cause problems for some feature calculations, such as those that calculate distances from the site's geometric center.
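The limit and the geometric center mentioned above can be sketched as follows. This is an illustration only, not the pipeline's implementation: it assumes the geometric center is the unweighted mean of the metal ion coordinates, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]
MAX_METALS_PER_SITE = 4  # limit stated in the text above


def geometric_center(coords: Sequence[Coord]) -> Coord:
    """Unweighted mean position of the ions (assumed definition)."""
    if not coords:
        raise ValueError("a site needs at least one metal ion")
    n = len(coords)
    return (
        sum(c[0] for c in coords) / n,
        sum(c[1] for c in coords) / n,
        sum(c[2] for c in coords) / n,
    )


def site_is_within_limit(coords: Sequence[Coord]) -> bool:
    """True when the site has four or fewer metal ions."""
    return len(coords) <= MAX_METALS_PER_SITE


# Hypothetical chain of five ions, each 4.5 Angstrom from the next.
chain = [(4.5 * i, 0.0, 0.0) for i in range(5)]
center = geometric_center(chain)
farthest = max(math.dist(center, ion) for ion in chain)
allowed = site_is_within_limit(chain)
```

In this made-up example the site exceeds the limit, and its outermost ions sit 9 Å from the center even though every ion is within 5 Å of a neighbor.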
Unknown feature calculation failure: Congratulations on finding a new bug. Please email mahomes@ku.edu so that we can see whether this is an issue we can resolve. Alternatively, the code for the automated feature pipeline and MAHOMES II can be downloaded.
Our performance evaluation of MAHOMES II included a test of metal sites on computationally generated structures. The original computationally generated structures included only the twenty canonical residues. Using coordinating residues identified on UniProt, we placed metals near the relevant metal-binding sidechain atoms. The script we used for metal binding site placement, addMetalIon.py, is available for download with the rest of the MAHOMES II code on GitHub. Additionally, the metalloprotein structures and enzyme/non-enzyme labels are available on Zenodo.