Citation:
WANG Zhi-Ming, HAN Na, YUAN Zhe-Ming, WU Zhao-Hua. Feature Selection for High-Dimensional Data Based on Ridge Regression and SVM and Its Application in Peptide QSAR Modeling[J]. Acta Physico-Chimica Sinica, 2013, 29(03): 498-507.
doi: 10.3866/PKU.WHXB201301042

Absolute weight values estimated from training data by ridge regression (RR) can reflect the significance of the corresponding features. Based on RR and support vector machine (SVM), a new feature selection algorithm for high-dimensional data is proposed. Examples from bitter tasting thresholds (BTT) and cytotoxic T lymphocyte (CTL) epitopes are presented. All 531 physicochemical property parameters were used to describe each residue of a peptide, giving 1062 and 4779 descriptors for BTT and CTL, respectively. Each data set was divided into training and test sets, and weight estimates for all training set descriptors were generated by RR. Features were then selected step by step in descending order of absolute weight until the mean square error (MSE) of leave-one-out cross validation (LOOCV) increased significantly. The procedure was repeated on the smaller training data sets obtained from the previous step, so the reserved features were obtained over multiple elimination rounds. The new method selected 7 and 18 descriptors for BTT and CTL, respectively. A quantitative structure-activity relationship (QSAR) model based on support vector regression (SVR) was established on the extracted data with the reserved descriptors and was then used to predict the test data. The fitting, LOOCV, and external prediction accuracies were significantly improved relative to values reported in the literature. Because of its fast computation, clear physicochemical meaning, and ease of interpretation, the new method is widely applicable to regression forecasting of high-dimensional data, such as QSAR modeling of peptides or proteins.
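The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: the function names (`solve`, `ridge_fit`, `loocv_mse`, `select_features`), the penalty `lam`, and the relative stopping tolerance `tol` are hypothetical; ridge regression stands in for the SVR model when computing the LOOCV error, so that the example needs only the Python standard library; and only a single selection round is shown, not the multiple elimination rounds. The data are synthetic, with the response depending on columns 0 and 3 only.

```python
import random

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                for k in range(c, n + 1):
                    M[r][k] -= f * M[c][k]
    return [M[i][n] / M[i][i] for i in range(n)]

def ridge_fit(X, y, lam=1.0):
    """Ridge weights w = (X^T X + lam*I)^{-1} X^T y, without an intercept."""
    d = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(d)]
    return solve(A, b)

def loocv_mse(X, y, cols, lam=1.0):
    """LOOCV mean square error using only the columns in `cols`.
    Assumption: ridge is used here in place of the SVR named in the abstract."""
    Xs = [[row[c] for c in cols] for row in X]
    err = 0.0
    for i in range(len(y)):
        w = ridge_fit(Xs[:i] + Xs[i + 1:], y[:i] + y[i + 1:], lam)
        pred = sum(wj * xj for wj, xj in zip(w, Xs[i]))
        err += (y[i] - pred) ** 2
    return err / len(y)

def select_features(X, y, lam=1.0, tol=0.05):
    """Add features in descending order of |ridge weight|; stop once the
    LOOCV MSE rises more than `tol` (relative) above the best value so far."""
    w = ridge_fit(X, y, lam)
    order = sorted(range(len(w)), key=lambda j: -abs(w[j]))
    kept, best = [], float("inf")
    for j in order:
        mse = loocv_mse(X, y, kept + [j], lam)
        if kept and mse > best * (1 + tol):
            break
        kept.append(j)
        best = min(best, mse)
    return kept

# Synthetic training data: 30 samples, 6 descriptors, 2 of them informative.
random.seed(0)
n, d = 30, 6
X = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
y = [3 * r[0] - 2 * r[3] + random.gauss(0, 0.1) for r in X]
kept = select_features(X, y)
print(kept)
```

On data of this kind the two informative columns receive the largest absolute weights and are therefore picked first; whether any further columns are kept depends on the tolerance chosen for a "significant" increase in LOOCV error, which the abstract does not quantify.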