Identification of WeChat Spam Articles with a Support Vector Machine Based on an Improved Genetic Algorithm
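Cite this article: Liang Kuoyang. Identification of WeChat spam articles with a support vector machine based on an improved genetic algorithm[J]. Computing Technology and Automation, 2015, (4): 137-141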
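Author and affiliation
Liang Kuoyang (梁阔洋) (School of Computer and Information Technology, Northeast Petroleum University, Daqing, Heilongjiang 163318)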
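Chinese abstract (translated): In recent years, WeChat has developed and spread rapidly and has become one of the must-have applications on smart mobile devices. At the same time, large amounts of fraudulent messages and spam advertising have appeared on it, causing users considerable trouble. This paper extracts WeChat article samples from Sogou WeChat Search, treats WeChat spam article recognition as a text classification problem, trains a classification model on the samples with a support vector machine (SVM), and uses an improved genetic algorithm to optimize the SVM parameters. The paper describes in detail how the improved genetic algorithm is applied to the SVM. Compared with a conventional SVM, optimizing the SVM parameters with the improved genetic algorithm improves both model accuracy and optimization efficiency. Finally, the classification model is evaluated on a test set of 15,000 WeChat articles; the results show that the method reaches an accuracy of 94.7% and identifies WeChat spam articles with high accuracy.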
Chinese keywords (translated): support vector machine; genetic algorithm; feature selection; parameter optimization; spam articles
 
Recognition of Spam in WeChat Based on a Support Vector Machine with an Improved Genetic Algorithm
Abstract: In recent years, with the rapid development and spread of WeChat, it has become one of the essential applications on smart mobile devices. At the same time, a large number of fraudulent messages and spam advertisements have appeared on WeChat, causing users considerable trouble. Using WeChat articles extracted from Sogou WeChat Search as samples, this paper treats the recognition of WeChat spam as a text classification problem, trains a classification model on the samples with a support vector machine (SVM), and applies an improved genetic algorithm to optimize the SVM parameters. The paper describes in detail how the improved genetic algorithm is applied to the SVM. Compared with a traditional SVM, the SVM whose parameters are optimized by the improved genetic algorithm achieves higher model accuracy and better optimization efficiency. Finally, the classification model is evaluated on a test set of 15,000 WeChat articles. The results show that the method reaches an accuracy of 94.7% in recognizing spam articles on WeChat.
Keywords: support vector machine; genetic algorithm; feature selection; parameter optimization; spam
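The abstract does not describe the specific improvements to the genetic algorithm, the chromosome encoding, or the SVM solver, so the following is only a minimal Python sketch of the general idea under stated assumptions: a plain real-coded GA (binary tournament selection, arithmetic crossover, Gaussian mutation, elitism) searches log2(C) and log2(gamma) of an RBF-kernel SVM, scoring each candidate by 3-fold cross-validated accuracy. Everything here is an assumption rather than the paper's method: the helper names (`rbf_gram`, `pegasos_train`, `predict`, `cv_accuracy`, `genetic_search`, `BOUNDS`) are hypothetical, the SVM is trained with kernel Pegasos (a stochastic sub-gradient solver substituted so the example needs only the standard library), and the search ranges, GA settings, and 2-D toy data stand in for real text feature vectors of WeChat articles.

```python
import math
import random

# Hypothetical search space: log2(C) and log2(gamma), using LIBSVM-style grid ranges.
BOUNDS = [(-5.0, 15.0), (-15.0, 3.0)]


def rbf_gram(X, gamma):
    """Full RBF Gram matrix: K[i][j] = exp(-gamma * ||x_i - x_j||^2)."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(xi, xj)))
             for xj in X] for xi in X]


def pegasos_train(K, y, idx, C, T=100, seed=0):
    """Kernel Pegasos (stochastic sub-gradient SVM solver, no bias term).

    Trains on the sample indices in idx and returns per-sample counts alpha.
    lambda = 1 / (n * C) maps Pegasos' regulariser onto the soft-margin C.
    """
    rng = random.Random(seed)  # fixed seed: same (C, gamma) always gives the same model
    lam = 1.0 / (len(idx) * C)
    alpha = {i: 0 for i in idx}
    for t in range(1, T + 1):
        i = rng.choice(idx)
        s = sum(a * y[j] * K[j][i] for j, a in alpha.items() if a)
        if y[i] * s / (lam * t) < 1:  # margin violated: add sample i to the expansion
            alpha[i] += 1
    return alpha


def predict(K, y, alpha, i):
    """Sign of the kernel expansion evaluated at sample i."""
    s = sum(a * y[j] * K[j][i] for j, a in alpha.items() if a)
    return 1 if s >= 0 else -1


def cv_accuracy(X, y, C, gamma, k=3):
    """k-fold cross-validated accuracy, used as the GA fitness."""
    K = rbf_gram(X, gamma)
    n = len(X)
    correct = 0
    for f in range(k):
        test = [i for i in range(n) if i % k == f]
        train = [i for i in range(n) if i % k != f]
        alpha = pegasos_train(K, y, train, C)
        correct += sum(predict(K, y, alpha, i) == y[i] for i in test)
    return correct / n


def genetic_search(X, y, pop_size=10, generations=10, p_mut=0.3, seed=1):
    """Plain real-coded GA over (log2 C, log2 gamma); returns (C, gamma, fitness, history)."""
    rng = random.Random(seed)
    cache = {}

    def fitness(ind):
        key = tuple(ind)
        if key not in cache:
            cache[key] = cv_accuracy(X, y, 2 ** ind[0], 2 ** ind[1])
        return cache[key]

    pop = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(pop_size)]
    history = []  # best fitness of each generation
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        nxt = [ranked[0]]  # elitism: the best individual survives unchanged
        while len(nxt) < pop_size:
            a = max(rng.sample(ranked, 2), key=fitness)  # binary tournament selection
            b = max(rng.sample(ranked, 2), key=fitness)
            w = rng.random()
            child = [w * ga + (1 - w) * gb for ga, gb in zip(a, b)]  # arithmetic crossover
            for d, (lo, hi) in enumerate(BOUNDS):
                if rng.random() < p_mut:  # Gaussian mutation
                    child[d] += rng.gauss(0, 0.1 * (hi - lo))
                child[d] = min(hi, max(lo, child[d]))  # keep genes inside the search box
            nxt.append(child)
        pop = nxt
    best = max(pop, key=fitness)
    return 2 ** best[0], 2 ** best[1], fitness(best), history


# Toy 2-D data standing in for text feature vectors: two separated clusters.
pos = [(2.0 + 0.2 * (i % 4), 2.0 + 0.2 * (i // 4)) for i in range(12)]
X = pos + [(-u, -v) for u, v in pos]
y = [1] * 12 + [-1] * 12

if __name__ == "__main__":
    C, gamma, acc, history = genetic_search(X, y)
    print(f"C={C:.4g}  gamma={gamma:.4g}  3-fold CV accuracy={acc:.3f}")
```

Because the elite individual is carried over unchanged and fitness values are cached, the best cross-validated accuracy never decreases from one generation to the next. In a real pipeline the toy points would be replaced by vectorized article text (for example TF-IDF features), and an established SVM library would normally replace the illustrative Pegasos solver.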