About - SURFACE-Bind Database
Overview of the Database
The SURFACE-Bind Database is a resource offering researchers detailed information on binding sites on human surface proteins. It includes high-quality seeds (small protein fragments that can bind to these sites) designed to facilitate de novo protein binder design studies. The database is intended to support the design and exploration of novel mini-protein-based therapeutics targeting human surface proteins.
To support protein design work, the database provides the seeds themselves, detailed score files, interactive 3D representations of protein-ligand complexes, and information on the chemical properties of the binding interfaces.
Data Description
The database comprises static HTML pages, with each entry corresponding to a specific UniProt ID. Each page contains detailed information on the protein structure, predicted binding interfaces, and detected seeds, along with several plots showing the distribution of chemical properties for both the binding interfaces and the corresponding seeds. The database currently holds over 2,500 entries, drawn from 2,886 evaluated surface proteins.
The current size of the database is approximately 50 GB, and it continues to grow over time. Each entry ranges in size from 100 to 300 MB, and all files associated with an entry are available for download in zip format at the bottom of its entry page.
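Because each entry is distributed as a zip archive, it can be useful to inspect an archive before unpacking it. The following is a minimal sketch using only the Python standard library. The archive name and its internal layout are assumptions made for illustration; the actual file names inside an entry archive are not specified here.

```python
import zipfile
from pathlib import Path


def summarize_entry_archive(zip_path):
    """Return (file_count, total_uncompressed_bytes) for an entry archive."""
    with zipfile.ZipFile(zip_path) as archive:
        # Skip directory records so only real files are counted.
        infos = [info for info in archive.infolist() if not info.is_dir()]
        return len(infos), sum(info.file_size for info in infos)


def extract_entry_archive(zip_path, dest_dir):
    """Extract the archive into dest_dir and return the extracted file paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    return sorted(p for p in dest.rglob("*") if p.is_file())
```

For example, calling summarize_entry_archive("entry.zip") on a downloaded archive (a hypothetical file name) reports how many files it holds and how much disk space they need once unpacked.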
Methodology
The binding interfaces are predicted using the MaSIF (Molecular Surface Interaction Fingerprinting) method, which analyzes protein molecular surfaces to detect interaction sites [Ref1].
To identify high-quality seeds for each target interface, we applied the MaSIF-seed-Search algorithm, which enables efficient de novo design of binding fragments [Ref2].
The SURFACE-Bind computational workflow clusters the predicted binding interfaces, calculates chemical properties for each detected site, and selects high-quality seeds for further protein binder design.
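The final selection step can be pictured as grouping seeds by binding site and keeping the best-scoring ones in each group. The sketch below is a toy illustration only and is not the SURFACE-Bind implementation: the record keys ("site", "seed_id", "score"), the per_site cutoff, and the convention that a higher score is better are all assumptions.

```python
from collections import defaultdict


def select_top_seeds(seeds, per_site=2):
    """Group seed records by site and keep the best-scoring ones per site.

    Assumes each record is a dict with the hypothetical keys "site",
    "seed_id", and "score", and that a HIGHER score means a better seed.
    """
    by_site = defaultdict(list)
    for seed in seeds:
        by_site[seed["site"]].append(seed)
    # Sort each group from best to worst and truncate to the cutoff.
    return {
        site: sorted(group, key=lambda s: s["score"], reverse=True)[:per_site]
        for site, group in by_site.items()
    }
```

If the score in a given score file is one where lower is better, the sort direction would need to be reversed.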
[Ref1] Gainza, P., Sverrisson, F., Monti, F. et al. Deciphering interaction fingerprints from protein molecular surfaces using geometric deep learning. Nat Methods 17, 184–192 (2020).
[Ref2] Gainza, P., Wehrle, S., Van Hall-Beauvais, A. et al. De novo design of protein interactions with learned surface fingerprints. Nature 617, 176–184 (2023).
How to Use the Database
Users can browse or search the database through the provided interface. All entries are accessible by their UniProt ID. Entries are also classified by main protein class or family and by sub-family, making it easier to find a specific entry.
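Since entries are keyed by UniProt ID, a script that looks up entries may want to check identifiers first. The sketch below validates a string against the accession number pattern published in the UniProt documentation. The function name is an assumption chosen for illustration, and a well-formed accession is not guaranteed to have an entry in this database.

```python
import re

# Accession number pattern as documented by UniProt.
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def normalize_accession(text):
    """Return the upper-cased accession, or raise ValueError if malformed."""
    candidate = text.strip().upper()
    if UNIPROT_ACCESSION.fullmatch(candidate) is None:
        raise ValueError(f"not a valid UniProt accession: {text!r}")
    return candidate
```

Normalizing case and whitespace before the check means identifiers copied from spreadsheets or papers are handled consistently.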
If you use this database in your research, please cite it as follows: To be added.
Funding and Contributors
This project was a collaboration between EPFL, Novo Nordisk Pharmaceutical Company, and Inria Centre at Université de Lorraine.
Ethics and Data Sharing
We take data integrity seriously, and every effort has been made to ensure that the data presented is accurate and complete. Users are encouraged to contact us with any concerns or questions regarding data accuracy.
The database is open-access, with all data shared under a Creative Commons license to encourage broad usage and collaboration.
Contact Information
If you have any questions, comments, or requests for further information, please contact us at hamed.khakzad@inria.fr and bruno.correia@epfl.ch.
Future Directions
We are committed to continuously expanding and improving the database. Future updates will include adding data from other species, implementing a search interface to streamline entry discovery, and enhancing data visualizations for a more interactive user experience.
Reference
To be added.