Machine learning-assisted construction of C=O and pyridinic N active sites in sludge-based catalysts

Citation: Xu He, Wenjie Gao, Jinglei Xu, Zhanjun Cheng, Wenchao Peng, Beibei Yan, Guanyi Chen, Ning Li. Machine learning-assisted construction of C=O and pyridinic N active sites in sludge-based catalysts[J]. Chinese Chemical Letters, 2025, 36(12): 111019. doi: 10.1016/j.cclet.2025.111019

a. School of Environmental Science and Engineering/Tianjin Research Center for Safe Disposal of Organic Solid Waste and Energy Utilization Engineering, Tianjin University, Tianjin 300072, China
b. School of Chemical Engineering and Technology, Tianjin University, Tianjin 300050, China
c. School of Mechanical Engineering, Tianjin University of Commerce, Tianjin 300134, China
* Corresponding author.
E-mail address: liningec@tju.edu.cn (N. Li).
Received Date: 21 October 2024
Accepted Date: 27 February 2025
Revised Date: 13 January 2025
Available Online: 15 December 2025

Abstract: The type and quantity of active sites on a catalyst surface determine its catalytic activity. In this study, machine learning was employed to assist in the construction of C=O and pyridinic N active sites using sludge waste. Reactive descriptors, including C%, N%, O%, Fe%, pyrolysis temperature, heating rate, and pyrolysis time, were proposed. Decision tree, extra tree, extreme gradient boosting (XGB), automatic relevance determination, and Bayesian ridge regression models were constructed and optimized. Among these, the XGB model showed superior accuracy in predicting C=O sites on the catalyst surface. Additionally, an ensemble model combining extra trees and XGB was developed to predict pyridinic N, with an R² value as high as 0.80 and a minimum root mean square error (RMSE) of 0.1386. The ensemble model demonstrated a 17% improvement in accuracy compared to individual models. The models enable high-throughput screening of construction conditions for C=O and pyridinic N. The study found that a pyrolysis temperature of 500–800 ℃, a heating rate of 10–20 ℃/min, and a pyrolysis time of 120–200 min favor the generation of C=O active sites. For pyridinic N sites, a pyrolysis temperature between 400 ℃ and 600 ℃, a heating rate of 5–10 ℃/min, and a pyrolysis time of around 150 min are optimal. Experimental validation demonstrated that both models exhibit excellent predictive performance, with prediction errors below 10% in all cases. This research provides a method to assist in the construction of C=O and pyridinic N active sites, which is beneficial for guiding the design of sludge catalysts.

Sewage sludge (SS), a byproduct of wastewater treatment, has attracted widespread attention. The safe disposal of SS has become one of the key concerns for wastewater treatment plants. Currently, the global annual production of dry solid sewage sludge is approximately 53 million tons. However, more than 80% is not properly treated, posing a potential threat to the environment [1,2]. Therefore, effective measures are urgently needed to manage SS. Sludge is rich in organic matter and usually needs to be dehydrated before being burned to generate electricity or heat. The mineral composition of sludge is similar to that of clay, so sludge can be used as a raw material to produce building materials such as ceramsite, replacing some clay resources. Sludge is also rich in nitrogen, phosphorus, and potassium, and can be used as an organic fertilizer to improve soil quality [3]. For sustainability, resource utilization of SS has emerged as a preferred option [4]. Thermal treatment provides an efficient and environmentally friendly method for high-value recovery by converting sludge waste into functional biochar [5,6]. Utilizing SS as a raw material for preparing carbon-based catalysts is regarded as a novel approach for both sludge reduction and resource recovery [7–9].

Advanced oxidation processes (AOPs) are widely applied in pollutant removal and energy conversion. Carbon atoms with sp² hybridization and electron-rich functional groups, such as C–O and C=O, can act as interfacial sites for the activation of peroxymonosulfate (PMS). The sp² hybridized carbon serves as an efficient electron transport medium, enabling the activation of PMS via electron transfer to generate sulfate radicals (SO₄•⁻) and hydroxyl radicals (•OH) [10]. Pyridinic N acts as a Lewis basic site that transfers electrons to the electrophilic oxygen in the C-S₂O₈²⁻ complex, leading to cleavage of the O–O bond and generation of SO₄•⁻ and O₂•⁻ [11]. Additionally, the π-electron delocalization and abundant structural defects in the highly graphitized carbon lattice enhance unpaired electron migration and promote cleavage of the O–O bond in PMS, further facilitating the formation of SO₄•⁻ and •OH. The C=C bonds, owing to their freely delocalized π-electrons, can also activate PMS to produce SO₄•⁻ and •OH. Moreover, a high concentration of sp² hybridized carbon structures and C=O groups can efficiently transfer electrons to PMS, initiating the generation of SO₄•⁻, which subsequently reacts with H₂O or OH⁻ to form •OH. Singlet oxygen (¹O₂) can be generated through the reactions between SO₅•⁻ and H₂O [12].

Research indicates that SS is a promising feedstock for biochar production owing to its rich content of carbon and other elements [13–15]. SS-derived carbon-based materials feature complex pore structures, large specific surface area [16], abundant functional groups (such as hydroxyl, carboxyl, and amino groups), and excellent surface charge distribution [17]. The surface hydroxyl groups can bind to persulfate (PS) through hydrogen bonds, release electrons, and activate PS through redox reactions, eventually generating SO₄•⁻. Carbonyl (C=O) groups can also generate ¹O₂ through nucleophilic addition and interaction with PMS via peroxide intermediates. Amino groups can lead to negative binding energy or regions of high/low electron density, forming surface-bound reactive complexes [18]. SS-based carbon materials have been successfully used in redox processes for the removal of various pollutants, including dyes, heavy metals, herbicides, phenols, pharmaceuticals, and antibiotics [19–22]. It is worth noting that Fe in SS raw materials influences the formation of pyridinic N active sites [23]. During pyrolysis, iron oxides in the raw material initially induce the conversion of protein-derived nitrogen in the char into nitrogen-containing functional groups (such as pyrrolic N, quaternary N, and nitrogen oxides), thereby fixing nitrogen within the char. As the pyrolysis temperature increases, α-Fe forms and reacts with pyridinic N in the semi-coke to generate FeₓN. With a further increase in pyrolysis temperature, FeₓN decomposes and the nitrogen is released as N₂ [23]. Additionally, the presence of Fe in the raw materials favors the formation of C=O active sites [24]. Fe promotes the generation of structural defects on the surface of sludge-derived carbon, which serve as excellent sites for the formation of C=O. Therefore, Fe plays a crucial role in the construction of catalytic active sites. Moreover, the SS carbonization process under different pyrolysis parameters (such as heating rate, pyrolysis temperature, and atmosphere) involves complex chain reactions among various components. With current techniques, it is challenging to observe the changes of intermediates and functional groups during pyrolysis, which hinders elucidation of the mechanism of active site formation and, in turn, the precise regulation of active sites.

Machine learning (ML) models, which draw on big data analysis, can uncover hidden information and complex nonlinear relationships [25–27]. ML enables the establishment of complex correlations between the composition of sludge-derived carbon catalysts, preparation parameters, and the formation of active sites. ML has already been used to predict catalyst active sites [28]. For instance, Gao et al. employed traditional ML models to predict biochar-based C–C/C=C, C=O, and defect active sites [29]. Similarly, Wang et al. utilized ML techniques to enhance non-radical activation of persulfates using biochar [30]. However, Fe has a significant impact on the formation of C=O and pyridinic N active sites [23,24], a factor that has not yet been considered in existing ML models for active site prediction. Moreover, traditional models face certain challenges. For example, random forests have been shown to overfit on some noisy classification or regression problems. For data whose attributes have varying numbers of distinct values, attributes with more value partitions tend to exert a larger influence on random forests. Consequently, the attribute importance scores generated by random forests on high-dimensional sparse data may be unreliable [31,32]. Integrating multiple models can enhance prediction accuracy and stability [31]. Model ensembles are particularly effective in handling imbalanced data: when noise or other uncertainties are present, an ensemble can mitigate their effects by averaging the predictions of multiple models. Ensembles are especially beneficial for small datasets or highly complex models, where they can reduce the risk of overfitting [33]. Furthermore, ensemble models can process various data types, thereby improving overall generalization ability. Therefore, there is an urgent need to develop new models that can accurately predict the key active sites on the surface of Fe-containing sludge-derived carbon catalysts.

In this study, a combination of big data and artificial intelligence was utilized to explore descriptors capable of predicting active sites from the complex composition of SS and pyrolysis parameters. Descriptors that incorporate both intrinsic factors (e.g., raw material composition, including Fe) and extrinsic factors (e.g., preparation parameters) were proposed. These descriptors can accelerate the screening of SS-based carbon catalysts with excellent activity. Furthermore, the relationships between SS composition, pyrolysis parameters, and active sites were elucidated by employing Shapley additive explanations (SHAP) analysis and importance ranking. Through partial dependence plot analysis, the optimal pyrolysis time, temperature, and heating rate were determined for the generation of active sites in SS-based carbon catalysts. Two ML models were developed: an XGB model and an ensemble model combining XGB and Extra Tree. Based on the established relationships between the multidimensional factors and active sites, an intrinsic reactive descriptor was summarized. The reactive descriptor can reduce the trial-and-error time required to prepare SS-based carbon catalysts with targeted active sites. The proposed descriptors can also be extended to assist in constructing other active sites in SS-derived catalysts.

Using the filled data described above (Text S2 in Supporting information), we reconstructed the database and then performed Pearson correlation coefficient (PCC) analysis. As illustrated in Fig. 1, blue indicates a weaker linear relationship between two variables, while the red blocks along the diagonal represent autocorrelation, where the PCC value equals 1. First, no single variable exhibits a strong linear relationship with the target variables C=O and pyridinic N, indicating the need for a complex model. Additionally, most variables, except for pyrolysis temperature, are not highly correlated with one another. Since the pyrolysis temperature is reported in the literature alongside the other variables, there was no need for dimensionality reduction. As stated above, the study compared three preprocessing options: standardization, PCA, and no preprocessing. The XGB method was chosen as a representative model to compare their impact, with all other hyperparameters set to the same values. As shown in Table 1, standardization achieved both the highest R² and the lowest root mean square error (RMSE). This advantage likely arises because, once the variables are scaled to similar numerical ranges, the characteristics of each variable become more prominent than in the original data [34,35]. PCA can reduce the dimensionality of the input variables, but because it can obscure features, it risks reducing accuracy [36]. Therefore, the study selected standardization as the preprocessing method.
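The two quantities used in this step, the PCC and standardization, can be sketched in a few lines. The snippet below is a minimal illustration in plain Python, assuming the usual textbook definitions; the function names and the toy values are hypothetical and are not taken from the study's database.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def standardize(column):
    """Rescale a column to zero mean and unit (population) standard deviation."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
    return [(v - mean) / std for v in column]

# Hypothetical toy values, NOT from the paper's database.
temperature = [400.0, 500.0, 600.0, 700.0, 800.0]  # pyrolysis temperature, degC
heating_rate = [5.0, 10.0, 10.0, 15.0, 20.0]       # heating rate, degC/min

r = pearson(temperature, heating_rate)
z_temperature = standardize(temperature)
print(f"PCC = {r:.3f}")
print([round(z, 3) for z in z_temperature])
```

Note that the PCC is unchanged by standardization, so the correlation matrix itself does not depend on this preprocessing choice; standardization only changes the numerical ranges that the downstream model sees.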

Figure 1. Pearson correlation matrix between any two variables in the dataset.

Table 1. Effects of different pretreatment methods.

Preprocessing method    RMSE      R²
Original data           0.1543    0.60
Standardization         0.1412    0.62
PCA                     0.1774    0.56

Using the dataset as input, nine commonly used ML algorithms (SVM, K-neighbors, bagging, decision tree, extra tree, XGB, ARD regression, Bayesian ridge, and random forest) were evaluated and compared for their performance in predicting active sites of sludge-based catalysts (Table S1 in Supporting information) [37–39]. R² and RMSE values for the training and test datasets were calculated to assess prediction accuracy (Text S3 in Supporting information). As shown, three ML algorithms (decision tree, extra tree, and XGB) had training R² > 0.90 (Figs. 2a and c), indicating that these algorithms were well trained on the training dataset. Among the three, XGB had the highest validation R² values for C=O and pyridinic N, at 0.58 and 0.62, respectively (Figs. 2b and d). In addition, XGB was less affected by outliers and handled unbalanced data better. The lower RMSE values also indicated the advantages of the XGB algorithm over the others (Figs. 2e–h). Based on the screening results, the XGB algorithm was chosen for further optimization to construct the ML framework for predicting C=O and pyridinic N active sites in sludge-based catalysts.
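For reference, the two screening metrics can be written out explicitly. The sketch below uses the standard definitions of RMSE and R², which are assumed (not confirmed) to match those given in Text S3; the sample values are hypothetical.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical observed and predicted site fractions, for illustration only.
y_true = [0.10, 0.20, 0.30, 0.40]
y_pred = [0.12, 0.18, 0.33, 0.37]

print(f"RMSE = {rmse(y_true, y_pred):.4f}")
print(f"R2   = {r2_score(y_true, y_pred):.3f}")
```

A model that always predicts the mean of the observations scores R² = 0, so a large gap between training R² (> 0.90) and validation R² (about 0.6) is the usual sign that a model fits the training set more closely than it generalizes.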

Figure 2. R² values of C=O and pyridinic N predicted by different machine learning algorithms: (a) C=O training set, (b) pyridinic N training set, (c) C=O test set, (d) pyridinic N test set; RMSE values of different machine learning algorithms for predicting C=O and pyridinic N: (e) C=O training set, (f) pyridinic N training set, (g) C=O test set, (h) pyridinic N test set.

Bayesian optimization was used to optimize the ML-XGB framework to improve prediction accuracy [40,41]. Increasing the number of trees (the n_estimators parameter) typically improves model performance, because the model becomes more robust and less subject to randomness. However, too many trees increase computation time with limited performance improvement. The max_depth parameter sets the maximum depth of the trees, a crucial factor for model complexity. Increasing the maximum depth allows the model to learn more complex patterns but can lead to overfitting. The learning_rate parameter controls the step size of each boosting round: a lower learning rate makes the model learn data patterns more slowly, requiring more n_estimators for training but potentially enhancing final model performance. The subsample parameter trains each tree on a portion of the samples, which reduces variance by using only part of the data. The colsample_bytree parameter sets the proportion of features used by each tree, improving robustness through random feature selection for each tree.
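The five hyperparameters above define a search space that the optimizer explores. The sketch below shows only how such a space might be declared and sampled. The bounds are hypothetical placeholders (the study's actual ranges and optima are in Text S4 and Table S2), and plain random sampling is used here as a simple stand-in: a Bayesian optimizer would instead propose each new candidate from the scores of earlier ones.

```python
import random

# Hypothetical bounds for illustration only; not the ranges used in the study.
SEARCH_SPACE = {
    "n_estimators": (50, 500),       # integer: number of trees
    "max_depth": (2, 10),            # integer: maximum tree depth
    "learning_rate": (0.01, 0.3),    # float: shrinkage applied per boosting round
    "subsample": (0.5, 1.0),         # float: fraction of rows used per tree
    "colsample_bytree": (0.5, 1.0),  # float: fraction of features used per tree
}
INTEGER_PARAMS = {"n_estimators", "max_depth"}

def sample_config(rng):
    """Draw one candidate configuration uniformly from SEARCH_SPACE."""
    config = {}
    for name, (low, high) in SEARCH_SPACE.items():
        if name in INTEGER_PARAMS:
            config[name] = rng.randint(low, high)  # inclusive on both ends
        else:
            config[name] = rng.uniform(low, high)
    return config

rng = random.Random(0)  # fixed seed so the draw is reproducible
candidates = [sample_config(rng) for _ in range(100)]
print(candidates[0])
```

Each candidate dictionary uses the parameter names discussed in the text, so it could be passed as keyword arguments to a gradient-boosting regressor and scored by cross-validated R².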

After 100 iterations (Text S4 in Supporting information), the optimal hyperparameter values were obtained (Table S2 in Supporting information). The average validation R² of the ML-XGB framework increased from 0.58 to 0.79 for C=O active sites and from 0.62 to 0.70 for pyridinic N active sites, indicating that optimization further improved prediction accuracy. The variation in validation R² of the optimized ML-XGB framework was smaller than before optimization, indicating greater robustness to data variation. The decision tree, Extra Tree, ARD regression, and Bayesian ridge algorithms were optimized in the same way for comparison with the optimized ML-XGB framework. Among all optimized models, the ML-XGB framework had the highest validation R², confirming its superior predictive performance (Fig. 3). Ensemble learning, which uses multiple compatible learning algorithms or models for a single task, can yield better predictive performance. The XGB and Extra Tree models were therefore integrated, and further Bayesian optimization yielded the optimal parameters (Table S2). The average validation R² of the integrated model increased from 0.70 to 0.80 for pyridinic N active site prediction. However, the R² value decreased from 0.80 to 0.63 after Fe content was removed from the model input, underscoring the importance of Fe as a descriptor.
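The text does not state how the XGB and Extra Tree predictions were combined. One common option, assumed here purely for illustration, is a weighted average of the two models' outputs. The sketch below uses hypothetical predictions whose errors have opposite signs by construction, which is the situation where averaging helps most.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def blend(pred_a, pred_b, weight_a=0.5):
    """Weighted average of two prediction vectors (weight_a applies to pred_a)."""
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(pred_a, pred_b)]

# Hypothetical values for illustration only; not results from the study.
y_true = [0.20, 0.30, 0.25, 0.40]
pred_xgb = [0.25, 0.27, 0.29, 0.36]         # stand-in for XGB predictions
pred_extra_tree = [0.16, 0.34, 0.22, 0.43]  # stand-in for Extra Tree predictions

pred_ensemble = blend(pred_xgb, pred_extra_tree, weight_a=0.5)

print(f"XGB RMSE        = {rmse(y_true, pred_xgb):.4f}")
print(f"Extra Tree RMSE = {rmse(y_true, pred_extra_tree):.4f}")
print(f"Ensemble RMSE   = {rmse(y_true, pred_ensemble):.4f}")
```

When the two models err in the same direction, averaging gives little benefit, so the blending weight is itself a quantity that could be tuned alongside the other hyperparameters.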

Figure 3. R² of the machine learning algorithms after Bayesian optimization on the test dataset: (a) C=O and (b) pyridinic N active sites.

Since ML is often seen as a “black box”, model interpretation is used to check whether the framework has captured the relationship between structural features and reactivity [42,43]. The ML model exhibited high accuracy in predicting active sites on the SS carbon-based catalyst surface. To better guide the design of active sites, the importance of endogenous and exogenous factors and their correlation with active sites were evaluated from two perspectives. First, the constructed ML model was interpreted, and the importance of features was visualized, showing how each feature influences the active sites on the SS surface.

Feature importance and SHAP are among the most commonly used methods for explaining ML frameworks (Text S4 in Supporting information) [44–46]. These methods can show the direction and magnitude of each feature's contribution (e.g., positive or negative). As shown in Fig. 4a, for C=O active sites, the endogenous factor O% and the exogenous factor heating rate were the most influential features. Among the other factors, N%, pyrolysis temperature, and C% show moderate importance, while Fe% has relatively low importance. Pyrolysis time has the least impact on the model's predictions. Similarly, as illustrated in Fig. 4c, among all fingerprint factors, the O% factor has the highest correlation coefficient, followed by the holding time. The O% feature exhibits the largest positive change: higher values of O% (red dots) typically increase the content of C=O active sites. C% and pyrolysis temperature show a broad distribution around zero, indicating that these factors have a complex influence. Heating rate and pyrolysis time also have a broad distribution around zero. N% has the lowest SHAP values, with its points concentrated near the zero line, and thus makes the smallest contribution to the generation of C=O active sites.

Figure 4. Feature importance: (a) C=O, (b) pyridinic N; and SHAP values: (c) C=O, (d) pyridinic N.

For pyridinic N, as depicted in Fig. 4b, the endogenous factor C% and the exogenous factor pyrolysis temperature were the most critical features, accounting for 24.1% and 22.9% of the feature importance, respectively. Among the other factors, pyrolysis time, O%, Fe%, and heating rate hold moderate importance. Similarly, as shown in Fig. 4d, the endogenous factor C% and the exogenous factor pyrolysis temperature have the greatest impact on pyridinic N. For C%, higher values (red dots) generally increase the content of pyridinic N active sites. For pyrolysis temperature, the points lie mainly to the left of the zero line, indicating that higher temperatures are not conducive to the formation of pyridinic N. The influence of heating rate and pyrolysis time was relatively centered around zero, suggesting a balanced contribution. These insights not only clarify the key factors influencing the prediction of active sites but also provide valuable guidance for optimizing the design of SS carbon-based catalysts. By leveraging the model, researchers can save the time and cost of screening experimental conditions while enhancing catalyst performance in future studies.
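The SHAP values discussed above are based on Shapley values, which split a prediction among the input features. The sketch below computes exact Shapley values by brute-force enumeration of feature coalitions for a tiny hypothetical model. It illustrates the definition only; it is not the tree-specific algorithm a SHAP library would use, and it is not the study's model. Features outside a coalition are set to a baseline value, which is one common convention and an assumption here.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline).

    Features outside a coalition are set to their baseline value.
    The cost grows as 2**n, so this is only practical for a few features.
    """
    n = len(x)
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                without_i = [x[j] if j in coalition else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                values[i] += weight * (predict(with_i) - predict(without_i))
    return values

def toy_model(features):
    """Hypothetical model: one additive term plus one interaction term."""
    a, b, c = features
    return 0.5 * a + 0.1 * b * c

x = [2.0, 3.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print([round(v, 3) for v in phi])
```

In this toy case the additive term is credited entirely to the first feature, while the interaction term is shared equally between the other two, and the three values sum to the difference between the prediction and the baseline prediction. A sign convention of this kind is what lets a SHAP plot show whether a high feature value pushes the predicted site content up or down.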

The partial dependence plot (PDP) analysis reveals how the predicted response depends on individual predictors (Text S4). For the C=O functional group, the partial dependence on pyrolysis temperature starts to increase gradually at around 400 ℃ and then rises sharply from 500 ℃ onward, so higher temperatures increase the content of C=O active sites (Fig. 5a). The partial dependence on heating rate remains relatively stable with minimal fluctuations, indicating that heating rate has a relatively small impact on the predicted C=O content within the range studied (Fig. 5b). As for pyrolysis time, the partial dependence steadily increases over time and peaks between 120 and 200 min, where the content of C=O active sites is highest (Fig. 5c). Therefore, the heating rate has a relatively minor impact on the formation of C=O, while higher pyrolysis temperatures (500–800 ℃), a heating rate of 10–20 ℃/min, and longer pyrolysis times (120–200 min) were conducive to C=O formation.
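A one-dimensional partial dependence curve can be computed with a short loop: for each grid value of the feature of interest, that feature is overwritten in every row of the dataset and the model's predictions are averaged. The sketch below follows this standard recipe with a hypothetical stand-in model and toy rows; neither is taken from the study.

```python
def partial_dependence(predict, rows, feature_index, grid):
    """Average prediction over all rows with one feature fixed at each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)             # copy so the dataset is not mutated
            modified[feature_index] = value  # overwrite the feature of interest
            total += predict(modified)
        curve.append(total / len(rows))
    return curve

def toy_model(row):
    """Hypothetical stand-in: response rises with temperature above 500 degC."""
    temperature, heating_rate, time_min = row
    return 0.01 * max(temperature - 500.0, 0.0) + 0.05 * heating_rate

# Hypothetical rows: [temperature (degC), heating rate (degC/min), time (min)].
rows = [
    [450.0, 5.0, 60.0],
    [600.0, 10.0, 120.0],
    [750.0, 20.0, 180.0],
]
grid = [400.0, 500.0, 600.0, 700.0, 800.0]

pd_temperature = partial_dependence(toy_model, rows, 0, grid)
for temperature, value in zip(grid, pd_temperature):
    print(f"{temperature:.0f} degC -> {value:.3f}")
```

Because the other features keep their observed values in every row, the curve shows the average effect of the chosen feature, which is why a flat curve (as reported for heating rate in Fig. 5b) is read as a small marginal influence.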

Figure 5. Partial dependence plots of C=O and pyridinic N active sites: (a) C=O and pyrolysis temperature, (b) C=O and heating rate, (c) C=O and holding time, (d) pyridinic N and pyrolysis temperature, (e) pyridinic N and heating rate, (f) pyridinic N and holding time.

For pyridinic N, the partial dependence on pyrolysis temperature was highest around 400–600 ℃ and decreased as the temperature continued to rise (Fig. 5d). Temperatures above 600 ℃ generally reduced the content of pyridinic N active sites. The partial dependence on heating rate showed a moderate increase, peaking around 5–10 ℃/min, followed by a slight decrease or stabilization (Fig. 5e). Moderate heating rates were associated with a higher content of pyridinic N active sites, whereas very high heating rates did not increase it further. The partial dependence on pyrolysis time increased up to approximately 150 min, and then slightly decreased and stabilized (Fig. 5f). This indicated that a pyrolysis time of around 150 min maximized the content of pyridinic N active sites, with longer durations not contributing further significant improvements. Therefore, a pyrolysis temperature of approximately 400–600 ℃, a heating rate of 5–10 ℃/min, and a pyrolysis time of around 150 min were conducive to the formation of pyridinic N active sites.
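The reported windows can be turned into a simple screening filter over candidate pyrolysis conditions. The sketch below enumerates a hypothetical grid and keeps the conditions that fall inside the pyridinic N windows stated above. In the study, candidates were scored by the trained model; this window filter is only a stand-in, and the ±10 min tolerance used for "around 150 min" is an assumption.

```python
from itertools import product

# Windows reported in the text for pyridinic N active sites.
PYRIDINIC_N_WINDOW = {
    "temperature_c": (400, 600),
    "heating_rate_c_per_min": (5, 10),
    "time_min": (140, 160),  # "around 150 min"; the +/-10 min tolerance is assumed
}

# Hypothetical candidate grid for illustration.
temperatures = range(300, 901, 100)     # 300, 400, ..., 900 degC
heating_rates = (5, 10, 15, 20)         # degC/min
times = (60, 90, 120, 150, 180, 210)    # min

def in_window(condition, window):
    """True if every parameter of the condition lies inside its window."""
    return all(low <= condition[name] <= high for name, (low, high) in window.items())

candidates = [
    {"temperature_c": t, "heating_rate_c_per_min": h, "time_min": m}
    for t, h, m in product(temperatures, heating_rates, times)
]
selected = [c for c in candidates if in_window(c, PYRIDINIC_N_WINDOW)]

print(f"{len(selected)} of {len(candidates)} candidate conditions fall in the window")
```

Replacing `in_window` with a call to a trained predictor and ranking by predicted site content would turn the same loop into the model-based high-throughput screening described in the text.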

The ability of the developed ML models to predict the content of active sites was then validated experimentally. Four types of SC samples were prepared under different pyrolysis conditions (Text S1 in Supporting information). Details of the SC preparation and characterization are provided in Tables S3 and S4 (Supporting information). The chemical composition and active site distribution of the SC samples were analyzed through XPS measurements (Fig. S1 in Supporting information).

The high-resolution C 1s spectra were deconvoluted into four peaks at 284.7, 285.5, 286.5, and 288.8 eV, corresponding to C–C/C=C, C–N/C–O, C=O, and O–C=O, respectively [47]. Similarly, the high-resolution N 1s spectra were deconvoluted into three peaks at 398.5, 399.8, and 400.8 eV, corresponding to pyridinic N, pyrrolic N, and graphitic N, respectively [48]. The measured C=O content in the four samples was 9.51% (SC1), 9.82% (SC2), 8.19% (SC3), and 9.31% (SC4). The pyridinic N content was 23.90% (SC1), 29.91% (SC2), 21.22% (SC3), and 27.79% (SC4). The ML models were used to predict the active site content in the four samples. The prediction error for C=O sites was below 10% for all samples, with an average error of 5.1% (Fig. 6a). For pyridinic N, the prediction errors were 5.1% (SC1), 7.2% (SC2), 8.9% (SC3), and 4.6% (SC4), with an average error of 6.38% (Fig. 6b). The models were thus capable of accurately predicting the active sites in the sludge-derived biochar catalysts.
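The validation figures can be checked with simple arithmetic. The sketch below assumes that "prediction error" means the relative error with respect to the measured value, which the text does not state explicitly, and uses only the per-sample pyridinic N errors reported above.

```python
def relative_error_pct(measured, predicted):
    """Relative prediction error in percent (assumed definition)."""
    return abs(predicted - measured) / measured * 100.0

# Per-sample pyridinic N prediction errors as reported in the text (percent).
reported_errors_pct = {"SC1": 5.1, "SC2": 7.2, "SC3": 8.9, "SC4": 4.6}

mean_error = sum(reported_errors_pct.values()) / len(reported_errors_pct)
all_below_10 = all(error < 10.0 for error in reported_errors_pct.values())

print(f"mean error = {mean_error:.2f}%")
print(f"all below 10%: {all_below_10}")
```

The mean of the four rounded values comes out at about 6.45%, close to the reported average of 6.38%; the small gap presumably reflects rounding of the per-sample values, although that explanation is an assumption.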

Figure 6. Measured and ML predicted contents of active sites: (a) C=O and (b) pyridinic N.

This study developed an XGB model to predict the content of C=O active sites and proposed an ensemble machine learning model combining Extra Trees and XGB to predict pyridinic N. A new descriptor was proposed, incorporating intrinsic factors related to raw material composition (including iron) and extrinsic factors related to preparation parameters. The models, together with the descriptor, effectively predict the C=O (R² = 0.79, RMSE = 0.0812) and pyridinic N (R² = 0.80, RMSE = 0.1386) active sites on the surface of SS-derived carbon catalysts. Experimental validation revealed that the average prediction errors for C=O and pyridinic N active sites were 5.1% and 6.38%, respectively. In addition, high-throughput screening of the preparation parameters for C=O and pyridinic N active sites was achieved. For C=O active sites, O% and heating rate played the most significant roles, followed by N% and pyrolysis temperature. Meanwhile, C% and Fe% had moderate effects, with pyrolysis time contributing the least. The optimal pyrolysis conditions for C=O active sites were identified as a pyrolysis temperature of 500–800 ℃, a heating rate of 10–20 ℃/min, and a pyrolysis time of 120–200 min. For pyridinic N active sites, C% and pyrolysis temperature dominated, followed by N% and holding time. Meanwhile, O% and Fe% had moderate effects, with heating rate contributing the least. The optimal pyrolysis conditions for pyridinic N were determined to be a pyrolysis temperature of 400–600 ℃, a heating rate of 5–10 ℃/min, and a pyrolysis time of around 150 min. This work lays a solid foundation for the precise preparation of highly active sludge-derived carbon catalysts.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

CRediT authorship contribution statement

Xu He: Writing – review & editing, Writing – original draft, Methodology, Formal analysis, Conceptualization. Wenjie Gao: Writing – original draft, Formal analysis. Jinglei Xu: Writing – review & editing, Writing – original draft. Zhanjun Cheng: Writing – review & editing. Wenchao Peng: Writing – original draft. Beibei Yan: Writing – review & editing. Guanyi Chen: Writing – original draft, Conceptualization. Ning Li: Writing – review & editing, Writing – original draft, Formal analysis.

Acknowledgments

This work was supported by the Young Scientific and Technological Talents (Level Two) in Tianjin (No. QN20230214), the Climbing Program of Tianjin University (No. 2023XPD-0006), the National Natural Science Foundation of China (No. 52100156), and the National Engineering Research Center for Digital Construction and Evaluation Technology of Urban Rail Transit (No. 2023HJ02).

Supplementary materials

Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.cclet.2025.111019.
[1] Y. Duan, N. Gao, A.T. Sipra, et al., J. Hazard. Mater. 424 (2022) 127293. doi: 10.1016/j.jhazmat.2021.127293
[2] Y. Li, H. Yu, L. Liu, et al., J. Hazard. Mater. 420 (2021) 126655. doi: 10.1016/j.jhazmat.2021.126655
[3] W. Zheng, Y. Shao, S. Qin, Sustainability 16 (2024) 6710. doi: 10.3390/su16166710
[4] S.S.A. Syed-Hassan, Y. Wang, S. Hu, S. Su, et al., Renew. Sustain. Energy Rev. 80 (2017) 888–913. doi: 10.1016/j.rser.2017.05.262
[5] E. Agrafioti, G. Bouras, D. Kalderis, et al., J. Anal. Appl. Pyrolysis 101 (2013) 72–78. doi: 10.1016/j.jaap.2013.02.010
[6] J. Lu, Q. Lu, L. Di, et al., Chin. Chem. Lett. 34 (2023) 108357. doi: 10.1016/j.cclet.2023.108357
[7] J. Xu, Y. Yu, K. Ding, et al., Water Sci. Technol. 77 (2018) 1410–1417. doi: 10.2166/wst.2018.001
[8] Z.X. Xu, H. Song, P.J. Li, et al., J. Hazard. Mater. 398 (2020) 122833. doi: 10.1016/j.jhazmat.2020.122833
[9] W. Hu, J. Tan, G. Pan, et al., Sci. Total Environ. 728 (2020) 138853. doi: 10.1016/j.scitotenv.2020.138853
[10] Y. Wang, W. Peng, J. Wang, Appl. Catal. B 310 (2022) 121342. doi: 10.1016/j.apcatb.2022.121342
[11] P. Zhang, Y. Yang, X. Duan, ACS Catal. 11 (2021) 11129–11159. doi: 10.1021/acscatal.1c03099
[12] H. Zhang, Z. Yan, J. Wan, Colloids Surf. A Physicochem. Eng. Asp. 654 (2022) 130174. doi: 10.1016/j.colsurfa.2022.130174
[13] N. Li, J. Ye, H. Dai, Water Res. 235 (2023) 119926. doi: 10.1016/j.watres.2023.119926
[14] C. Zhang, N. Ding, Y. Pan, et al., Chin. Chem. Lett. 35 (2024) 109579. doi: 10.1016/j.cclet.2024.109579
[15] J. Wang, Z. Liao, J. Ifthikar, et al., Chemosphere 185 (2017) 754–763. doi: 10.1016/j.chemosphere.2017.07.084
[16] S. Singh, V. Kumar, D.S. Dhanjal, et al., J. Clean. Prod. 269 (2020) 122259. doi: 10.1016/j.jclepro.2020.122259
[17] X. Pei, X. Peng, X. Jia, et al., J. Hazard. Mater. 419 (2021) 126446. doi: 10.1016/j.jhazmat.2021.126446
[18] L. Kou, J. Wang, L. Zhao, et al., Chem. Eng. J. 411 (2021) 128459. doi: 10.1016/j.cej.2021.128459
[19] M.M. Mian, G. Liu, H. Zhou, et al., Sci. Total Environ. 744 (2020) 140862. doi: 10.1016/j.scitotenv.2020.140862
[20] Z. Liu, S. Singer, Y. Tong, et al., Renew. Sustain. Energy Rev. 90 (2018) 151174.
[21] Y. Liu, S. Yang, S. Liu, et al., J. Anal. Appl. Pyrolysis 182 (2024) 106696. doi: 10.1016/j.jaap.2024.106696
[22] Y. Liu, S. He, B. Huang, et al., J. Energy Chem. 70 (2022) 511–520. doi: 10.1016/j.jechem.2022.03.005
[23] J.H. Kim, J.K. Shin, H. Lee, et al., Water Res. 207 (2021) 117821. doi: 10.1016/j.watres.2021.117821
[24] S. Russo, M.D. Besmer, F. Blumensaat, et al., Water Res. 206 (2021) 117695. doi: 10.1016/j.watres.2021.117695
[25] Z. Li, S. Zou, Z. Wang, et al., Chin. Chem. Lett. 35 (2024) 110526.
[26] J. Li, X. Liu, H. Wang, et al., Chin. Chem. Lett. 35 (2024) 108596. doi: 10.1016/j.cclet.2023.108596
[27] W. Gao, N. Li, Z. Cheng, et al., Bioresour. Technol. 408 (2024) 131156. doi: 10.1016/j.biortech.2024.131156
[28] R. Wang, S. Zhang, H. Chen, et al., Environ. Sci. Technol. 57 (2023) 4050–4059. doi: 10.1021/acs.est.2c07073
[29] M.A. Ganaie, M. Hu, A.K. Malik, et al., Eng. Appl. Artif. Intell. 115 (2022) 105151. doi: 10.1016/j.engappai.2022.105151
[30] X. Dong, Z. Yu, W. Cao, et al., Front. Comput. Sci. 14 (2020) 241–258. doi: 10.1007/s11704-019-8208-z
[31] B. Deng, P. Chen, P. Xie, et al., Chem. Eng. Sci. 267 (2023) 118368. doi: 10.1016/j.ces.2022.118368
[32] D. Wu, D. Zhang, S. Liu, et al., Chem. Eng. J. 399 (2020) 125878. doi: 10.1016/j.cej.2020.125878
[33] A. Smith, A. Keane, J.A. Dumesic, et al., Appl. Catal. B 263 (2020) 118257. doi: 10.1016/j.apcatb.2019.118257
[34] F. Rodrigues, N. Ortelli, M. Bierlaire, et al., IEEE Trans. Intell. Transp. Syst. 23 (2022) 3126–3136. doi: 10.1109/tits.2020.3031965
[35] S. Mangalathu, S.H. Hwang, J.S. Jeon, et al., Eng. Struct. 219 (2020) 110927. doi: 10.1016/j.engstruct.2020.110927
[36] A. Altmann, L. Toloşi, O. Sander, et al., Bioinformatics 26 (2010) 1340–1347. doi: 10.1093/bioinformatics/btq134
[37] F. Zhang, X. Yang, Remote Sens. Environ. 251 (2020) 112105. doi: 10.1016/j.rse.2020.112105
[38] G. Louppe, L. Wehenkel, A. Sutera, P. Geurts, Understanding variable importances in forests of randomized trees, in: B. Schölkopf, J. Platt (Eds.), Advances in Neural Information Processing Systems, MIT Press, Cambridge, 2013, pp. 312–320.
[39] P. Antwi, J. Li, J. Meng, Bioresour. Technol. 257 (2018) 102–112. doi: 10.1016/j.biortech.2018.02.071
[40] O.B. Ayodele, H.S. Auta, N.M. Nor, Ind. Eng. Chem. Res. 23 (2022) 3126–3136.
[41] F.L. Gewers, G.R. Ferreira, H.F. De Arruda, ACM Comput. Surv. 23 (2022) 3126–3136.
[42] S. Zhong, Y. Zhang, H. Zhang, Environ. Sci. Technol. 56 (2022) 681–692. doi: 10.1021/acs.est.1c04883
[43] Q. Lu, S. Tian, L. Wei, Sci. Total Environ. 856 (2023) 159171. doi: 10.1016/j.scitotenv.2022.159171
[44] R. Huang, C. Ma, J. Ma, Water Res. 205 (2021) 117666. doi: 10.1016/j.watres.2021.117666
[45] X. Wang, Y. Jin, S. Schmitt, ACM Comput. Surv. 55 (2023) 1–36.
[46] Z. Li, Comput. Environ. Urban Syst. 96 (2022) 101845. doi: 10.1016/j.compenvurbsys.2022.101845
[47] Y. Yu, N. Li, C. Wang, J. Colloid Interface Sci. 619 (2022) 267–279. doi: 10.3390/en16010267
[48] R. Andrade, M. Vieira, M. Silva, Chem. Eng. J. 23 (2022) 3126–3136.