Citation: Xinghai Li, Zhisen Wu, Lijing Zhang, Shengyang Tao. Machine Learning Enables the Prediction of Amide Bond Synthesis Based on Small Datasets[J]. Acta Physico-Chimica Sinica, 2025, 41(2): 100010. doi: 10.3866/PKU.WHXB202309041

Machine Learning Enables the Prediction of Amide Bond Synthesis Based on Small Datasets

  • Corresponding authors: Lijing Zhang, Shengyang Tao
  • Received Date: 27 September 2023
    Revised Date: 29 November 2023
    Accepted Date: 30 November 2023

    Fund Project: The project was supported by the National Natural Science Foundation of China (22072011, 22372025, 22211530456), the Fundamental Research Funds for the Central Universities (DUT22LAB607, DUT22QN226), and Project 1912 Funds provided by the Chinese Aeronautical Establishment.

  • Machine learning (ML) is increasingly demonstrating notable advantages in chemical synthesis. However, traditional experimental methods produce limited amounts of data, a bottleneck that impedes the widespread adoption of ML. Data drawn from the literature often lead to overly optimistic predictions, while obtaining thousands of data points through experiments remains a substantial challenge. Using a small experimental dataset, we demonstrated that ML algorithms can reliably predict the conversion rate of amide bond synthesis. We collected hundreds of experimental data points for 9 aromatic amines and 12 organic acids with various coupling reagents and solvents, using a 96-well plate high-throughput experimental setup. We then derived 76 molecular descriptors from quantum chemical calculations and used them as inputs for training the ML model. Despite the inherently low data volume, the random forest algorithm showed outstanding predictive performance (R² > 0.95). Through a comprehensive analysis of the reaction process using feature importance analysis, Shapley additive explanations (SHAP), and accumulated local effects (ALE), we identified the important factors influencing the reaction conversion rate. When predicting the conversion rates of unseen aromatic amines, we found that adding a small amount of reaction data involving the unseen molecule to the training set effectively improves the model's predictive performance, even with a small dataset. By comparing models trained on different molecular representations, such as density functional theory (DFT) descriptors and one-hot encoding, we validated the efficacy of adjusting the training set to improve prediction results.
This study used many chemically meaningful descriptors and achieved more effective predictions through multidimensional data analysis, offering valuable insights for machine-learning-assisted chemical synthesis research with small datasets. In the near future, machine learning is poised to drive the intelligent development of organic chemistry.
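The key evaluation idea in the abstract can be sketched in plain Python: predict conversion for an aromatic amine that is absent from training, then move a few of its reactions into the training set. This is a minimal illustration under stated assumptions, not the authors' code. The random forest is a small from-scratch stand-in for a library implementation (the abstract does not name one). The three descriptors and the conversion formula in `toy_records` are invented rather than the study's 76 DFT descriptors or its measurements. All function names (`held_out_split`, `fit_forest`, `r2_score`, and so on) and the parameter `k` are hypothetical.

```python
import random
from statistics import fmean


def _sse(v):
    m = fmean(v)  # sum of squared deviations = regression node impurity
    return sum((t - m) ** 2 for t in v)


def build_tree(X, y, rng, n_feats, max_depth=6, min_leaf=2, depth=0):
    """CART-style regression tree; a leaf stores the mean of its targets."""
    if depth >= max_depth or len(y) < 2 * min_leaf or len(set(y)) == 1:
        return fmean(y)
    best = None  # (impurity, feature index, threshold)
    # Random feature order; look past n_feats only if no valid split exists yet.
    for tried, f in enumerate(rng.sample(range(len(X[0])), len(X[0]))):
        if tried >= n_feats and best is not None:
            break
        cuts = sorted({row[f] for row in X})
        for lo, hi in zip(cuts, cuts[1:]):
            thr = (lo + hi) / 2
            left = [t for row, t in zip(X, y) if row[f] <= thr]
            right = [t for row, t in zip(X, y) if row[f] > thr]
            if min(len(left), len(right)) >= min_leaf:
                score = _sse(left) + _sse(right)
                if best is None or score < best[0]:
                    best = (score, f, thr)
    if best is None:
        return fmean(y)
    _, f, thr = best

    def grow(keep):
        idx = [i for i, row in enumerate(X) if keep(row[f])]
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          rng, n_feats, max_depth, min_leaf, depth + 1)

    return (f, thr, grow(lambda v: v <= thr), grow(lambda v: v > thr))


def predict_tree(node, row):
    while isinstance(node, tuple):  # internal node: (feature, threshold, left, right)
        f, thr, left, right = node
        node = left if row[f] <= thr else right
    return node


def fit_forest(X, y, n_trees=30, seed=0, **tree_kw):
    """Random forest: bootstrap samples plus about p/3 candidate features per split."""
    rng = random.Random(seed)
    n_feats = max(1, len(X[0]) // 3)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 rng, n_feats, **tree_kw))
    return forest


def predict_forest(forest, row):
    return fmean(predict_tree(tree, row) for tree in forest)


def r2_score(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return 1.0 - ss_res / _sse(y_true)


def held_out_split(records, amine, k=0, seed=0):
    """Hold out one amine; move k of its reactions (random pick) into training."""
    target = [r for r in records if r["amine"] == amine]
    others = [r for r in records if r["amine"] != amine]
    random.Random(seed).shuffle(target)
    return others + target[:k], target[k:]


def toy_records(n_amines=6, n_acids=8, seed=1):
    """HYPOTHETICAL data: 3 invented descriptors, not the study's measurements."""
    rng = random.Random(seed)
    acids = [rng.random() for _ in range(n_acids)]
    records = []
    for i in range(n_amines):
        d1, d2 = rng.random(), rng.random()
        for d3 in acids:
            conv = 100 * (0.5 * d1 + 0.3 * d2 * d3 + 0.2 * d3) + rng.gauss(0, 3)
            records.append({"amine": f"amine{i}", "x": (d1, d2, d3),
                            "y": min(100.0, max(0.0, conv))})
    return records


if __name__ == "__main__":
    data = toy_records()
    for k in (0, 2):  # k = reactions of the unseen amine added to training
        train, test = held_out_split(data, "amine0", k)
        forest = fit_forest([r["x"] for r in train], [r["y"] for r in train])
        preds = [predict_forest(forest, r["x"]) for r in test]
        print(f"k={k}: held-out R^2 = {r2_score([r['y'] for r in test], preds):.3f}")
```

Running the script prints the held-out R² for `k = 0` and `k = 2` on the toy data. These numbers say nothing about the real dataset. One design point is worth noting: a tree leaf can only average training targets, so a forest cannot extrapolate beyond the conversion range it has seen. This is one plausible reason a handful of reactions from the unseen amine can help.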