Citation: Xinghai Li, Zhisen Wu, Lijing Zhang, Shengyang Tao. Machine Learning Enables the Prediction of Amide Bond Synthesis Based on Small Datasets[J]. Acta Physico-Chimica Sinica, 2025, 41(2): 230904. doi: 10.3866/PKU.WHXB202309041

Machine Learning Enables the Prediction of Amide Bond Synthesis Based on Small Datasets

  • Corresponding authors: Lijing Zhang, zhanglj@dlut.edu.cn; Shengyang Tao, taosy@dlut.edu.cn
  • These authors contributed equally to this paper.
  • Received Date: 27 September 2023
    Revised Date: 29 November 2023
    Accepted Date: 30 November 2023

    Fund Project: the National Natural Science Foundation of China (22072011, 22372025, 22211530456); the Fundamental Research Funds for the Central Universities (DUT22LAB607, DUT22QN226)

  • Machine learning (ML) is showing clear advantages in chemical synthesis. However, traditional experimental methods generate data slowly, and this bottleneck impedes the wider adoption of ML. Literature data often lead to overly optimistic predictions, and collecting thousands of data points experimentally remains a substantial challenge. Using a small experimental dataset, we show that ML algorithms can reliably predict the conversion rate of amide bond synthesis. We collected hundreds of experimental data points for 9 aromatic amines and 12 organic acids with various coupling reagents and solvents in a 96-well-plate high-throughput setup. We then derived 76 molecular descriptors from quantum chemical calculations and used them as input features for training the ML models. Despite the low data volume, the random forest algorithm showed outstanding predictive performance (R² > 0.95). Using feature importance analysis, Shapley additive explanations (SHAP), and accumulated local effects (ALE), we examined the factors that most influence the reaction conversion rate. When predicting the conversion rate for unknown aromatic amines, we found that adding a small amount of reaction data for the unknown molecule to the training set effectively improves the model's predictive performance, even with a small dataset. By comparing models trained on different molecular representations, such as density functional theory (DFT) descriptors and one-hot encoding, we validated the efficacy of adjusting the training set to improve prediction results. This study used a multitude of chemically meaningful descriptors and achieved more effective predictions through multidimensional data analysis, offering valuable insights for ML-assisted chemical synthesis research on small datasets. In the near future, machine learning is poised to drive the intelligent development of organic chemistry.
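
The training-set adjustment described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the `n_seed` parameter, and the labels `amine_1`, `acid_1`, and `solvent_A` are hypothetical, and the conversion values are placeholders rather than measured data. The sketch holds out every reaction of one "unknown" amine, moves a few of those reactions back into the training set, and defines the R² metric and a one-hot encoding of a categorical reaction component. Model fitting (for example, a random forest) is deliberately left out.

```python
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def one_hot(value, vocabulary):
    """Encode a categorical label as a 0/1 vector over a fixed vocabulary."""
    return [1 if value == v else 0 for v in vocabulary]


def split_with_seed_reactions(reactions, held_out_amine, n_seed, rng):
    """Hold out all reactions of one amine, then return n_seed of them to training.

    With n_seed=0 this is a plain leave-one-amine-out split.
    """
    known = [r for r in reactions if r["amine"] != held_out_amine]
    unseen = [r for r in reactions if r["amine"] == held_out_amine]
    rng.shuffle(unseen)  # pick the seed reactions at random
    train = known + unseen[:n_seed]
    test = unseen[n_seed:]
    return train, test


# Hypothetical toy grid: 3 amines x 4 acids, placeholder conversion values.
reactions = [
    {"amine": f"amine_{a}", "acid": f"acid_{c}", "conversion": 0.0}
    for a in range(1, 4)
    for c in range(1, 5)
]

train, test = split_with_seed_reactions(
    reactions, held_out_amine="amine_3", n_seed=2, rng=random.Random(0)
)
print(len(train), len(test))  # 10 training reactions, 2 test reactions
print(one_hot("solvent_B", ["solvent_A", "solvent_B", "solvent_C"]))  # [0, 1, 0]
```

Comparing the test-set R² at `n_seed=0` with the value at a small positive `n_seed`, for the same model and descriptors, is one way to quantify how much the added reactions help.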