CrownBind-IA: A machine learning model predicting binding constants between crown ethers and alkali metal ions

Han-Bin Liu, Xiaoyu Cheng, Zhou Guo, Juan Yang, Fuwen Tan, Donghui Lan, Jian-Ping Tan, Bing Yi, Weixin Zhai, Qing-Hui Guo

Citation: Han-Bin Liu, Xiaoyu Cheng, Zhou Guo, Juan Yang, Fuwen Tan, Donghui Lan, Jian-Ping Tan, Bing Yi, Weixin Zhai, Qing-Hui Guo. CrownBind-IA: A machine learning model predicting binding constants between crown ethers and alkali metal ions[J]. Chinese Chemical Letters, 2025, 36(12): 111149. doi: 10.1016/j.cclet.2025.111149

    In recent years, the application of data-driven machine learning methods in various domains of chemistry has grown steadily. These models can capture intricate nonlinear relationships and unveil patterns within high-dimensional data, and they have demonstrated high accuracy in predicting compound properties [1–3] as well as reactivity and selectivity in chemical synthesis [4–8]. The successful development of numerous machine learning models showcases the vast potential of artificial intelligence (AI), particularly machine learning, in chemistry. In supramolecular chemistry [9–11], which is based on noncovalent interactions [12–15], a pivotal scientific issue is the efficient and quantitative determination of the binding constants between host and guest molecules. Traditionally, this challenge has been tackled through experimental measurements, which are constrained by efficiency and cost. The characteristics and advantages of AI make it a promising alternative. However, applying AI to this problem is hampered by the lack of structured datasets and effective descriptors. Notably, the Cronin group presented an AI model based on electron density [16]. This model successfully identified previously unreported guests for two well-known host molecules, the cucurbituril CB[6] and the metal–organic cage [Pd2L4]4+, highlighting the promise of AI in supramolecular chemistry research. Nevertheless, reports on the application of AI in supramolecular chemistry remain relatively rare.

    Crown ethers, a classical type of macrocyclic host molecule, laid the foundation of supramolecular chemistry [17,18] and have so far been applied in a wide range of fields, such as recognition [17–19], ion transport [20], the construction of interlocked architectures [21,22] and molecular machines [23–26]. Crown ethers perform exceptionally well in host-guest recognition, particularly in the selective recognition of alkali metal ions [19]. The size match between the crown ether cavity and the radius of the alkali metal ion is conventionally regarded as the primary determinant of the strength of the host-guest interaction and of recognition selectivity [27,28]. The factors affecting the binding constants (logK) between crown ethers and alkali metal ions are, however, multifaceted. Apart from the relatively subtle differences among alkali metal ions, the flexible structure of crown ethers allows for preorganization during guest recognition, while the functional groups on the macrocyclic framework modulate the mode and strength of host-guest interactions. Experimental conditions such as temperature and solvent further affect the binding constants of specific host-guest pairs. To achieve optimal recognition effectiveness and selectivity, supramolecular chemists draw on their expertise and experience to design macrocyclic host molecules tailored to their objectives, and then characterize the interactions with guests experimentally. Current experimental methods for determining logK are time-consuming and resource-intensive: they can require extensive laboratory work, specialized equipment, and significant expenditure. The ability to accurately predict logK would circumvent the need for such laborious and costly experiments, enabling researchers to navigate the vast chemical search space more efficiently and accelerating the discovery and development of novel supramolecular systems with targeted properties. In this study, we focused on the binding constants between crown ethers and alkali metal ions and developed CrownBind-IA, a highly accurate machine learning model for predicting them. By organizing and cleansing the primary literature data, we compiled a structured dataset on the interactions between crown ethers and alkali metal ions. Based on this dataset, suitable descriptors were chosen to represent the host structures, the guest structures and the experimental conditions. With suitable algorithms, CrownBind-IA was established to accurately predict the binding constants between hosts and guests (Fig. 1). This approach makes it possible to estimate the strength of host-guest interactions efficiently without conducting every measurement.

    Figure 1. Machine learning prediction of binding constants between crown ethers and alkali metal ions. Through the organization and cleansing of data from the original literature, a structured dataset detailing the interactions between crown ethers and alkali metal ions was constructed. By selecting suitable descriptors and appropriate algorithms, CrownBind-IA, a machine learning model that can accurately predict the magnitudes of host-guest binding constants, was developed.

    Applying machine learning to predict molecular interactions accurately requires a comprehensive dataset of experimental measurements. Izatt and co-workers have meticulously compiled and summarized thermodynamic and kinetic data on the interactions between various types of host molecules and guest molecules [27,28]. These data serve as a valuable foundation for investigating host-guest interactions, for the rational design of novel host molecules, and particularly for exploring the application of machine learning methods in supramolecular chemistry. From this historical literature we selected and reorganized a more comprehensive dataset, focusing on the binding constants between monocyclic crown ethers and alkali metal ions. Each entry in the dataset contains the structure of the host molecule, the radius of the alkali metal ion, the temperature and solvent system used during the measurement, and the corresponding binding constant. Based on this dataset, we trained the model and obtained CrownBind-IA.
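
    The structure of a dataset entry described above can be sketched as a small record type. This is only an illustration: the class name `BindingRecord`, its field names, the use of six-coordinate Shannon ionic radii, and the example values are assumptions, not details taken from the authors' dataset.

```python
from dataclasses import dataclass
from typing import Tuple

# Six-coordinate Shannon ionic radii in angstroms. Assumption: the source
# does not state which radius scale was used for the guest descriptor.
IONIC_RADIUS_ANGSTROM = {
    "Li+": 0.76,
    "Na+": 1.02,
    "K+": 1.38,
    "Rb+": 1.52,
    "Cs+": 1.67,
}


@dataclass(frozen=True)
class BindingRecord:
    """One hypothetical dataset entry: host, guest, conditions and logK."""

    host_smiles: str                 # canonical SMILES of the crown ether
    guest_ion: str                   # one of the keys of IONIC_RADIUS_ANGSTROM
    temperature_k: float             # measurement temperature in kelvin
    solvent_smiles: Tuple[str, ...]  # one to three solvents
    log_k: float                     # measured binding constant (logK)

    @property
    def ionic_radius(self) -> float:
        """Guest descriptor: ionic radius looked up from the ion label."""
        return IONIC_RADIUS_ANGSTROM[self.guest_ion]


# Illustrative entry; log_k is a placeholder, not a value from the dataset.
example = BindingRecord(
    host_smiles="C1COCCOCCOCCOCCOCCO1",  # 18-crown-6
    guest_ion="K+",
    temperature_k=298.15,
    solvent_smiles=("CO",),              # methanol
    log_k=6.1,
)
```

    Keeping the guest as an ion label and deriving the radius on demand keeps the record readable while still exposing the numeric descriptor the text mentions.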

    We performed data curation and standardization to ensure that the dataset is suitable and reliable for machine learning applications (Fig. 2). All chemical structures of the host molecules and solvents are represented as SMILES [29–31] strings for subsequent digital processing. To ensure consistency and compatibility with Python libraries such as RDKit [32], all SMILES were converted to canonical SMILES. The guests in the dataset are alkali metal ions (Li+, Na+, K+, Rb+ and Cs+), which are described primarily by their ionic radius. Temperature and solvent system were chosen as the experimental variables relevant to the binding constants of crown ether-alkali metal ion complexes. After thorough data curation, the final dataset comprised 4178 data entries covering 837 different crown ether molecules. The macrocyclic frameworks range in size from 9 to 60 atoms and may contain heteroatoms or aromatic units, as in aza-crown ethers, thio-crown ethers, and benzo-fused and heterocycle-fused crown ethers. The temperatures at which the binding constants were determined range from 203 K to 373 K. The measurements were performed in 59 types of solvents, including both single-solvent systems and mixed-solvent systems composed of 2–3 solvents. Statistical analysis by guest ion showed that the dataset includes 323 entries for Li+ (7.7%), 1392 entries for Na+ (33.3%), 1326 entries for K+ (31.7%), 399 entries for Rb+ (9.6%), and 741 entries for Cs+ (17.7%). Of the logK values, 95% lie in the range 0–8, 64% are above 3, and 19% lie between 2 and 3. The median logK is 3.5, the first quartile is 2.5, and the third quartile is 4.5. The dataset thus covers all alkali metal ions and a diverse range of crown ether structures, solvent systems, and commonly used experimental conditions for measuring binding constants. Based on this dataset, we evaluated and selected descriptors and machine learning algorithms, and then proceeded with model training.

    Figure 2. Workflow of dataset construction, model training and logK prediction by CrownBind-IA. (a) Data curation from the literature. (b) Construction and statistics of the dataset. (c) Screening and optimization of descriptors and algorithms for the model. (d) Establishment of CrownBind-IA. (e) Application of CrownBind-IA to predicting the logK of out-of-sample data.

    The dataset was initially divided into a training set (75%) and a test set (25%). To describe the various molecular structures and features appropriately, we compared the performance of multiple descriptors using different algorithms (Fig. 3a). We evaluated molecular descriptors based on (ⅰ) molecular physical properties (Phy), namely the combination of molecular weight, logP and topological polar surface area (TPSA); (ⅱ) 2D topological structure, i.e., molecular fingerprints (MF) [33], which encode structural information such as atom types, bond orders, hybridization states, aromaticity, and spatial arrangements, enabling machine learning models to correlate structural patterns with target properties or activities; and (ⅲ) 3D coordinates [34], i.e., atom-centered symmetry functions (ACSF), smooth overlap of atomic positions (SOAP), local many-body tensor representation (LMBTR) and the Coulomb matrix (CM). We also examined the performance of common machine learning algorithms, including XGBoost, Bagging, Support Vector Regression (SVR), Decision Tree, Extra Trees and Random Forest. The test-set performance of representative descriptor-algorithm combinations is illustrated in Fig. 3a, and detailed results are provided in Tables S2 and S3 (Supporting information). More than one combination of descriptors and algorithms gave a model RMSE below 1 logK unit (Phy and MF descriptors combined with the XGBoost algorithm), meaning that logK is predicted to within an order of magnitude in the binding constant. This highlights the effectiveness of machine learning as a tool for predicting the host-guest interactions between crown ether macrocycles and alkali metal ions.
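
    The 75%/25% split and the RMSE criterion discussed above can be illustrated with a minimal, library-free sketch. The function names, the fixed random seed and the toy numbers are assumptions made for illustration; the source does not describe how the split was drawn.

```python
import math
import random


def train_test_split(records, test_fraction=0.25, seed=0):
    """Shuffle the records and return (train, test) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def rmse(y_true, y_pred):
    """Root-mean-square error between measured and predicted logK values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def fold_error(rmse_logk):
    """Convert an error in logK units into a multiplicative factor in K."""
    return 10 ** rmse_logk


# Toy usage with 100 placeholder record indices.
train, test = train_test_split(list(range(100)))
toy_rmse = rmse([3.0, 4.0], [3.5, 3.5])
```

    Because logK is a base-10 logarithm, an error of 1 logK unit corresponds to a factor of 10 in the binding constant, which is why an RMSE below 1 is read as agreement within an order of magnitude.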

    Figure 3. The performance and interpretation of the model. (a) The screening of the combination of descriptors and algorithms. (b) The scatter plot comparing predicted versus experimental binding constants. (c) The contribution of the ten most critical variables in descriptors to the prediction of logK.

    These preliminary findings show that models using the Phy and MF descriptors outperform models using 3D descriptors. To improve accuracy, we explored combinations of different descriptor types. Notably, when Phy was combined with MF-1 (radius = 1, nBit = 2048), the model achieved an RMSE of 0.78 logK units, lower than that obtained with the Phy or MF descriptor alone. However, adding 3D descriptors (CM, ACSF, SOAP, or LMBTR), individually or collectively, did not improve accuracy. This indicates that indiscriminately adding input information does not necessarily enhance performance. Inspired by these findings, we sought to reduce the input dimensionality of the descriptors. We reduced the MF vector length from nBit = 2048 to nBit = 64 and combined the resulting MF-2 (radius = 1, nBit = 64) with the Phy descriptor, which improved the RMSE to 0.73 logK units. Subsequently, we abandoned the use of the global information in the CM. After screening and optimization, we found that combining the two largest values from the CM (CM-2) with the Phy and MF descriptors gave the best model performance. Regarding algorithm selection, XGBoost achieved the highest performance in most scenarios. Ultimately, we identified the optimal setup as the combination of the Phy and MF-2 descriptors together with CM-2, using the XGBoost algorithm. Under these conditions, CrownBind-IA achieved an RMSE of 0.68 logK units and R² = 0.82 for predicting the binding constants between crown ethers and alkali metal ions. The scatter plot comparing the predicted and experimental values is presented in Fig. 3b, demonstrating the model's highly accurate predictive performance for the majority of the dataset. During descriptor selection, we observed that reducing the dimensionality of the information and eliminating redundant details in the descriptors could enhance the model's performance. In practice, it is not the entire molecular structure that governs the host-guest interaction, but rather its crucial parts. The patterns observed during descriptor optimization are consistent with this picture.
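
    To make the CM-2 idea concrete, the sketch below builds a Coulomb matrix from nuclear charges and 3D coordinates using the commonly cited definition (diagonal 0.5·Z^2.4, off-diagonal Z_i·Z_j/|R_i − R_j|) and keeps its two largest values. Several points are assumptions: the source does not say whether "values" means matrix entries or eigenvalues (entries of the upper triangle are used here), the distance unit is left as given, and the helper names, the water geometry and the placeholder Phy and MF inputs are purely illustrative.

```python
import math


def coulomb_matrix(charges, coords):
    """Return the Coulomb matrix as a list of rows.

    Diagonal: 0.5 * Z_i ** 2.4; off-diagonal: Z_i * Z_j / |R_i - R_j|.
    """
    n = len(charges)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                matrix[i][j] = 0.5 * charges[i] ** 2.4
            else:
                distance = math.dist(coords[i], coords[j])
                matrix[i][j] = charges[i] * charges[j] / distance
    return matrix


def two_largest_values(matrix):
    """Two largest entries of the upper triangle (assumed meaning of CM-2)."""
    n = len(matrix)
    upper = [matrix[i][j] for i in range(n) for j in range(i, n)]
    return sorted(upper, reverse=True)[:2]


def build_host_features(phy, mf_bits, cm2):
    """Concatenate Phy (3 values), MF-2 (64 bits) and CM-2 (2 values)."""
    return list(phy) + list(mf_bits) + list(cm2)


# Toy geometry for water (O, H, H); coordinates are rough illustrative values.
water_charges = [8, 1, 1]
water_coords = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
cm2 = two_largest_values(coulomb_matrix(water_charges, water_coords))

# Placeholder Phy and fingerprint inputs, only to show the vector length.
features = build_host_features((0.0, 0.0, 0.0), [0] * 64, cm2)
```

    Taking only the upper triangle avoids counting each symmetric off-diagonal entry twice; under these assumptions the host part of the input shrinks to 3 + 64 + 2 = 69 values.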

    To determine the contribution of each input variable to the prediction of binding constants, we conducted an F-score analysis (Fig. 3c). The results indicate that the most critical input variables are the ionic radius and the host molecular weight. Other significant variables include the host molecule's logP and TPSA and the measurement temperature. This result has a chemically meaningful interpretation: the intrinsic physicochemical properties of crown ethers and alkali metal ions are the key factors determining their binding constants. For the host molecules, the three most critical variables are molecular weight, logP, and TPSA. This implies that molecular size, the extent of solvation in the solvent system, and the presence of heteroatom-containing functional groups are critical determinants of crown ether-alkali metal ion interactions. The importance of CM and MF is notably lower than that of physical properties such as logP, TPSA, and molecular weight, which is consistent with our findings during descriptor screening. For variables with lower contributions, such as CM and MF, reducing their dimensionality and eliminating redundant details can enhance model performance.

    To assess CrownBind-IA's ability to make out-of-sample predictions and to evaluate its practical value, we selected two representative scenarios. The first involves predicting the binding constants between crown ethers and alkali metal ions in a solvent system not included in the dataset (Fig. 4). In 2016, Chiu's group determined the binding constants of 18-crown-6 (18C6), the most classic crown ether, with Na+ and K+ ions in a solvent system composed of acetonitrile and dichloromethane [35]. This mixed solvent system is not part of the dataset. The values predicted by CrownBind-IA closely match the experimental measurements, with a logK error of less than 1. The second scenario involves predicting the binding constants between alkali metal ions and crown ethers that are not included in the dataset. The crown ethers in this series, CE1–CE6, differ in the macrocycle cavity or in the structures of the substituents [36,37]. Without experimental measurements, accurately estimating the binding constants of these compounds with alkali metal ions from domain knowledge alone is almost impossible. CrownBind-IA, however, predicts the binding constants of nearly all of these compounds with alkali metal ions accurately. For 15 of the 16 sets (93.7%) of external data reported in the literature cited above, the logK values predicted by CrownBind-IA deviate from the experimental values by less than 1. Detailed prediction results are provided in Table S8 (Supporting information). The model thus achieves accurate predictions of the binding constants between crown ethers and alkali metal ions in both scenarios, showcasing its extrapolative ability.
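
    The out-of-sample criterion used above, the share of predictions whose logK error is below 1, is simple to compute. The sketch below is a hypothetical helper with made-up numbers chosen so that 15 of 16 predictions fall within tolerance; the values are not the data from Table S8.

```python
def fraction_within(y_true, y_pred, tolerance=1.0):
    """Fraction of predictions with |predicted - measured| < tolerance."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(abs(t - p) < tolerance for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Synthetic illustration only: 15 errors of 0.5 and one error of 1.2.
measured = [4.0] * 16
predicted = [4.5] * 15 + [5.2]
hit_rate = fraction_within(measured, predicted)
```

    With these placeholder values the hit rate is 15/16 = 0.9375, the same ratio as the 15 of 16 external data sets reported in the text.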

    Figure 4. The prediction of out-of-sample data. (a) The structures of the out-of-sample crown ethers. (b) Prediction performance for the target crown ethers with Na+. The blue dots are the experimentally measured logK values, and the orange triangles are the logK values predicted by CrownBind-IA. The dashed lines mark an error range of less than 1 logK unit. (c) Prediction performance for the target crown ethers with K+. The blue blocks are the experimentally measured logK values, and the orange stars are the logK values predicted by CrownBind-IA. The dashed lines mark an error range of less than 1 logK unit.

    In conclusion, we have developed CrownBind-IA, a machine learning model that accurately predicts the binding constants between crown ethers and alkali metal ions. The model combines multiple types of descriptors with appropriate information dimensions and achieves an RMSE of only 0.68 logK units. Its ability to predict out-of-sample data demonstrates excellent extrapolative capability and potential for practical application. Such an accurate model can help supramolecular chemists design functional host molecules more rationally, while improving experimental efficiency and reducing experimental costs. The establishment of this model highlights the broad prospects of AI, particularly machine learning, in addressing key challenges in supramolecular chemistry. Ongoing efforts are dedicated to developing more comprehensive and versatile AI models that incorporate the principles of supramolecular chemistry.

    The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

    Han-Bin Liu: Writing – review & editing, Writing – original draft, Methodology, Funding acquisition, Data curation, Conceptualization. Xiaoyu Cheng: Software, Methodology, Data curation. Zhou Guo: Software, Methodology, Data curation. Juan Yang: Data curation. Fuwen Tan: Data curation. Donghui Lan: Data curation. Jian-Ping Tan: Data curation. Bing Yi: Writing – review & editing, Supervision, Conceptualization. Weixin Zhai: Writing – review & editing, Writing – original draft, Methodology, Funding acquisition, Conceptualization. Qing-Hui Guo: Writing – review & editing, Project administration, Methodology, Funding acquisition, Conceptualization.

    Qing-Hui Guo acknowledges the financial support of the National Natural Science Foundation of China (Nos. 22193020 and 22193022) and the Tsinghua University Initiative Scientific Research Program. Weixin Zhai acknowledges the financial support of the National Natural Science Foundation of China (No. 32301691). Donghui Lan acknowledges the financial support of the Science and Technology Innovation Program of Hunan Province (No. 2023RC3188). Jian-Ping Tan acknowledges the financial support of the Science and Technology Innovation Program of Hunan Province (No. 2022RC1112) and the Elite Youth Program of the Department of Education of Hunan Province (No. 21B0666). Han-Bin Liu acknowledges the financial support of the Scientific Research Foundation of Hunan Provincial Education Department (No. 24C0380).

    Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.cclet.2025.111149.


    [1] Z. Wu, B. Ramsundar, E.N. Feinberg, et al., Chem. Sci. 9 (2018) 513–530.
    [2] Q. Yang, Y. Li, J.D. Yang, et al., Angew. Chem. Int. Ed. 59 (2020) 19282–19291. doi: 10.1002/anie.202008528
    [3] L.M. Sigmund, S.S. Shree, A. Albers, et al., Angew. Chem. Int. Ed. 63 (2024) e202401084.
    [4] A.F. Zahrt, J.J. Henle, B.T. Rose, et al., Science 363 (2019) eaau5631.
    [5] L.C. Xu, S.Q. Zhang, X. Li, et al., Angew. Chem. Int. Ed. 60 (2021) 22804–22811. doi: 10.1002/anie.202106880
    [6] S.W. Li, L.C. Xu, C. Zhang, et al., Nat. Commun. 14 (2023) 3569–3580.
    [7] Z.J. Zhang, S.W. Li, J.C.A. Oliveira, et al., Nat. Commun. 14 (2023) 3149–4157.
    [8] X. Hong, Q. Yang, K. Liao, et al., Sci. China Chem. 67 (2024) 2461–2496. doi: 10.1007/s11426-024-2072-4
    [9] D.J. Cram, Angew. Chem. Int. Ed. 27 (1988) 1009–1020.
    [10] C.J. Pedersen, Angew. Chem. Int. Ed. 27 (1988) 1021–1027.
    [11] J.M. Lehn, Angew. Chem. Int. Ed. 27 (1988) 89–112.
    [12] D.X. Wang, Q.Y. Zheng, Q.Q. Wang, et al., Angew. Chem. Int. Ed. 47 (2008) 7485–7488. doi: 10.1002/anie.200801705
    [13] D.E. Koshland, Angew. Chem. Int. Ed. 33 (2003) 2375–2378.
    [14] I. Goldberg, J. Am. Chem. Soc. 102 (1980) 4106–4113. doi: 10.1021/ja00532a021
    [15] G.W. Gokel, L.J. Barbour, R. Ferdani, J. Hu, Acc. Chem. Res. 35 (2002) 878–886.
    [16] J.M. Parrilla-Gutiérrez, J.M. Granda, J.F. Ayme, et al., Nat. Comput. Sci. 4 (2024) 200–209. doi: 10.1038/s43588-024-00602-x
    [17] C.J. Pedersen, J. Am. Chem. Soc. 89 (1967) 7017–7036. doi: 10.1021/ja01002a035
    [18] C.J. Pedersen, J. Am. Chem. Soc. 89 (1967) 2495–2496. doi: 10.1021/ja00986a052
    [19] J.S. Bradshaw, R.M. Izatt, Acc. Chem. Res. 30 (1997) 338–345.
    [20] S. Matile, A.V. Jentzsch, J. Montenegro, A. Fin, Chem. Soc. Rev. 40 (2011) 2453–2474. doi: 10.1039/c0cs00209g
    [21] C.A. Schalley, K. Biezai, F. Vögtle, Acc. Chem. Res. 34 (2001) 465–476.
    [22] F.M. Raymo, J.F. Stoddart, Chem. Rev. 99 (1999) 1643–1663.
    [23] J.P. Sauvage, Acc. Chem. Res. 31 (1998) 611–619.
    [24] D. Philp, J.F. Stoddart, Angew. Chem. Int. Ed. 35 (2003) 1154–1196.
    [25] J.D. Badjić, V. Balzani, A. Credi, S. Silvi, J.F. Stoddart, Science 303 (2004) 1845–1849.
    [26] S. Erbas-Cakmak, D.A. Leigh, C.T. McTernan, A.L. Nussbaumer, Chem. Rev. 115 (2015) 10081–10206. doi: 10.1021/acs.chemrev.5b00146
    [27] R.M. Izatt, J.S. Bradshaw, S.A. Nielsen, et al., Chem. Rev. 85 (1985) 271–339. doi: 10.1021/cr00068a003
    [28] R.M. Izatt, K. Pawlak, J.S. Bradshaw, R.L. Bruening, Chem. Rev. 91 (1991) 1721–2085. doi: 10.1021/cr00008a003
    [29] D. Weininger, J. Chem. Inf. Comput. Sci. 28 (1988) 31–36. doi: 10.1021/ci00057a005
    [30] D. Weininger, A. Weininger, J.L. Weininger, J. Chem. Inf. Comput. Sci. 29 (1989) 97–101. doi: 10.1021/ci00062a008
    [31] D. Weininger, J. Chem. Inf. Comput. Sci. 30 (1990) 237–243. doi: 10.1021/ci00067a005
    [32] RDKit, http://www.rdkit.org.
    [33] D. Rogers, M. Hahn, J. Chem. Inf. Model. 50 (2010) 742–754. doi: 10.1021/ci100050t
    [34] L. Himanen, M.O.J. Jäger, E.V. Morooka, et al., Comput. Phys. Commun. 247 (2020) 106949.
    [35] Y.J. Lee, T.H. Ho, C.C. Lai, S.H. Chiu, Org. Biomol. Chem. 14 (2016) 1153–1160.
    [36] X.X. Zhang, A.V. Bordunov, J.S. Bradshaw, et al., J. Am. Chem. Soc. 117 (1995) 11507–11511. doi: 10.1021/ja00151a014
    [37] A.V. Bordunov, J.S. Bradshaw, X.X. Zhang, et al., Inorg. Chem. 35 (1996) 7229–7240.

  • Published: 2025-12-15
  • Received: 2025-01-08
  • Accepted: 2025-03-26
  • Revised: 2025-03-12
  • Available online: 2025-03-26