Citation: Xinghai Li, Zhisen Wu, Lijing Zhang, Shengyang Tao. Machine Learning Enables the Prediction of Amide Bond Synthesis Based on Small Datasets[J]. Acta Physico-Chimica Sinica, 2025, 41(2): 100010. doi: 10.3866/PKU.WHXB202309041
Machine learning (ML) is showing clear advantages in chemical synthesis. However, traditional experimental methods produce data slowly, and this bottleneck impedes the wider adoption of ML. Literature data often lead to overly optimistic predictions, while collecting thousands of data points experimentally remains a substantial challenge. Using a small experimental dataset, we show that ML algorithms can reliably predict the conversion rate of amide bond synthesis. We collected hundreds of data points for 9 aromatic amines and 12 organic acids with various coupling reagents and solvents in a 96-well-plate high-throughput experimental setup. We then derived 76 molecular descriptors from quantum chemical calculations and used them as input features for training the ML model. Despite the small data volume, the random forest algorithm showed outstanding predictive performance (R² > 0.95). Using feature importance analysis, Shapley additive explanations (SHAP), and accumulated local effects (ALE), we examined the factors that most influence the reaction conversion rate. When predicting the conversion rate for aromatic amines unknown to the model, we found that adding a small amount of reaction data for the unknown molecule to the training set effectively improves predictive performance, even with a small dataset. By comparing models trained on different molecular representations, such as density functional theory (DFT) descriptors and one-hot encoding, we validated the efficacy of adjusting the training set to improve prediction results.
This study used a large number of chemically meaningful descriptors and obtained more effective predictions through multidimensional data analysis, offering valuable insights for ML-assisted chemical synthesis research on small datasets. In the near future, machine learning is poised to drive the intelligent development of organic chemistry.
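The evaluation ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes a reaction is stored as a plain dictionary, and every name in it (one_hot, r_squared, split_with_seed_reactions, the amine and acid labels, the placeholder conversion values) is hypothetical. It shows one-hot encoding of a reaction component, the R² score used to report model quality, and a split that holds out one amine and then moves a few of its reactions into the training set.

```python
import random


def one_hot(value, vocabulary):
    """Return a 0/1 list marking the position of `value` in `vocabulary`."""
    return [1 if value == item else 0 for item in vocabulary]


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def split_with_seed_reactions(reactions, held_out_amine, n_seed, rng):
    """Hold out every reaction of one amine, then move `n_seed` of them
    (chosen at random) into the training set. Returns (train, test)."""
    seen = [r for r in reactions if r["amine"] != held_out_amine]
    unseen = [r for r in reactions if r["amine"] == held_out_amine]
    rng.shuffle(unseen)  # shuffles the local copy only
    return seen + unseen[:n_seed], unseen[n_seed:]


# Hypothetical labels; conversions are placeholders, not measured data.
amines = ["amine_A", "amine_B", "amine_C"]
acids = ["acid_1", "acid_2"]
reactions = [
    {"amine": a, "acid": c, "conversion": 0.0}
    for a in amines
    for c in acids
]

# Hold out amine_C, but let the model see one of its two reactions.
train, test = split_with_seed_reactions(
    reactions, "amine_C", n_seed=1, rng=random.Random(0)
)

# One-hot features for a single reaction: amine block, then acid block.
features = one_hot("amine_B", amines) + one_hot("acid_2", acids)
```

Setting n_seed to 0 gives a strict held-out-amine test, so running the same model with n_seed at 0 and at a small positive value is one way to check whether a few reactions of the new molecule help. In practice the feature vectors would feed a regressor such as a random forest, and DFT-derived descriptors would replace or extend the one-hot blocks.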