Research on Improvement of TF-IDF Algorithm in Text Classification
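Cite this article: Wu Zongzhuo. Research on Improvement of TF-IDF Algorithm in Text Classification [J]. Computing Technology and Automation (计算技术与自动化), 2022, (2): 84-86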
Author affiliation
Wu Zongzhuo (Shaanxi Institute of Technology (陕西国防工业职业技术学院), Xi'an, Shaanxi 710300)
Abstract: A central problem in text categorization is how to improve classification accuracy. To address it, this paper proposes TF-IDF-IF, a new term-weighting method based on TF-IDF. The method introduces a new parameter that represents in-class characteristics and is used to compute the frequency of a term across the documents of a class. In the experiments, the chi-square (CHI) statistic is used as the feature selection method to select 1000 features from the data set; the TF-IDF, TF-IDF-CF, LTC, and TFC weighting methods are then each evaluated with several commonly used classifiers, including Naive Bayes, Bayesian networks, KNN, and SVM. The experimental results indicate that the proposed method performs well.
Keywords: text categorization; feature selection; chi-square (CHI) statistic; TF-IDF; categorization accuracy
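The abstract does not state the weighting formula, so the following is only a minimal sketch of the general idea of adding an in-class factor to TF-IDF. It assumes, purely for illustration, that the extra factor is the fraction of documents in a document's own class that contain the term; the function name `tfidf_with_class_factor`, the toy corpus, and this particular factor are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tfidf_with_class_factor(docs, labels):
    """Return one {term: weight} dict per document.

    weight = tf * log(N / df) * (class_df / class_size)

    The last factor is an ASSUMED in-class term for illustration:
    the share of documents in the document's class that contain the term.
    """
    n_docs = len(docs)
    doc_sets = [set(doc) for doc in docs]

    # Document frequency of each term over the whole corpus.
    df = Counter(term for terms in doc_sets for term in terms)

    # Number of documents per class, and per-class document frequency.
    class_size = Counter(labels)
    class_df = Counter()
    for terms, label in zip(doc_sets, labels):
        for term in terms:
            class_df[(label, term)] += 1

    weights = []
    for doc, label in zip(docs, labels):
        tf = Counter(doc)  # raw term counts in this document
        row = {}
        for term, count in tf.items():
            idf = math.log(n_docs / df[term])
            in_class = class_df[(label, term)] / class_size[label]
            row[term] = count * idf * in_class
        weights.append(row)
    return weights


# Toy corpus: each document is a list of tokens with a class label.
docs = [
    ["goal", "match", "team"],
    ["team", "coach", "goal", "goal"],
    ["stock", "market", "team"],
    ["market", "price", "stock"],
]
labels = ["sport", "sport", "finance", "finance"]

weights = tfidf_with_class_factor(docs, labels)
for label, row in zip(labels, weights):
    print(label, {term: round(w, 3) for term, w in sorted(row.items())})
```

In this toy corpus, "team" appears in both sport documents but in only one of the two finance documents, so under the assumed factor it receives a larger weight in a sport document than in the finance document, even though its IDF is the same in both.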