Journal of Shandong Normal University (Natural Science)

2016, Issue 04, Vol. 31; No. 136, pp. 60-65

A Study of Data Normalization for SVM Training

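1. School of Information Science and Engineering, Shandong Normal University; 2. Shandong Provincial Key Laboratory for Novel Distributed Computer Software Technology, Shandong Normal University; 3. Office of Laboratory and Equipment Management, Shandong Normal University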

Abstract:

Data normalization is an essential preprocessing step when training a support vector machine (SVM). Commonly used methods include scaling to [-1, +1] and standardizing to N(0, 1), but we found no studies in the existing literature on the scientific basis for these methods. This paper investigates empirically why data should be normalized and how normalizing versus not normalizing affects training efficiency and the predictive ability of the model. Using standard data sets, we trained SVMs on the original unnormalized data, on data normalized by different methods, on artificially inverse-normalized data, and on arbitrarily selected attribute columns, and recorded how the objective function value changes with the iteration count, the training time, and the model's test and k-fold cross-validation (k-CV) performance. The experiments show that normalization methods that confine data values to a conventional range, such as [-0.5, +0.5] through [-5, +5] or N(0, 1) through N(0, 5), all yield the best predictive models with the shortest training time. This work provides a scientific basis for data normalization in SVMs and in machine learning algorithms in general.

Keywords: support vector machine (SVM); SMO; data normalization; data preprocessing; cross-validation
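The two families of normalization named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names `scale_to_range` and `standardize`, the sample column `raw`, and the handling of constant columns are assumptions made for illustration. The abstract does not say whether the second parameter of N(0, 5) is a standard deviation or a variance, so the sketch treats it as a standard deviation and exposes it as the hypothetical parameter `target_std`.

```python
from statistics import mean, pstdev


def scale_to_range(column, lo=-1.0, hi=1.0):
    """Linearly map a feature column onto [lo, hi] (min-max scaling)."""
    c_min, c_max = min(column), max(column)
    if c_max == c_min:
        # Assumption: a constant column is mapped to the midpoint of the range.
        return [(lo + hi) / 2.0 for _ in column]
    span = c_max - c_min
    return [lo + (hi - lo) * (x - c_min) / span for x in column]


def standardize(column, target_std=1.0):
    """Shift and rescale a column to mean 0 and standard deviation target_std."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        # Assumption: a constant column is mapped to all zeros.
        return [0.0 for _ in column]
    return [target_std * (x - mu) / sigma for x in column]


# Hypothetical feature column with a large raw scale.
raw = [120.0, 150.0, 180.0, 210.0, 240.0]
scaled = scale_to_range(raw)      # values now lie in [-1, +1]
standardized = standardize(raw)   # mean 0, population standard deviation 1
```

Each function works on one attribute column at a time, so a data set would be normalized by applying it to every column separately. Passing `lo=-5.0, hi=5.0` or `target_std=5.0` gives the wider ends of the ranges mentioned in the abstract, under the interpretation stated above.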

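Basic information:

CLC number: TP181

Citation:

Tang Rongzhi, Duan Huichuan, Sun Haitao. A study of data normalization for SVM training [J]. Journal of Shandong Normal University (Natural Science), 2016, 31(04): 60-65.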
