Abstract: A major issue in text categorization is how to improve classification accuracy. To this end, a new term weighting method based on TF-IDF, called TF-IDF-CF, is proposed. The method introduces a new parameter that represents in-class characteristics by measuring the frequency of a term across the documents within a class. In the experiments, the chi-square (CHI) statistical feature selection method is used to select 1000 features from the data set, and the TF-IDF, TF-IDF-CF, LTC, and TFC weighting methods are then each evaluated with commonly used classifiers: Naive Bayes, Bayesian networks, KNN, and SVM. The experimental results show that the proposed method achieves good results.
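
The in-class parameter described above can be illustrated with a minimal sketch. The abstract does not state the weighting formula, so the form used here is an assumption: a log-scaled term frequency, a smoothed IDF, and a class-frequency factor equal to the fraction of documents in the document's own class that contain the term. The function name `tf_idf_cf` and the toy corpus are hypothetical and serve only as illustration.

```python
import math
from collections import Counter, defaultdict


def tf_idf_cf(docs, labels):
    """Weight each term of each document by TF * IDF * CF.

    docs   -- list of token lists, one per document
    labels -- class label of each document (same order as docs)

    Assumed form (not taken from the abstract):
        log(tf + 1) * log((N + 1) / df) * (class_df / class_size)
    """
    n_docs = len(docs)
    doc_freq = Counter()                    # documents containing each term
    class_doc_freq = defaultdict(Counter)   # per class: documents containing each term
    class_size = Counter(labels)            # documents per class

    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            doc_freq[term] += 1
            class_doc_freq[label][term] += 1

    weights = []
    for tokens, label in zip(docs, labels):
        row = {}
        for term, count in Counter(tokens).items():
            tf = math.log(count + 1)
            idf = math.log((n_docs + 1) / doc_freq[term])
            # Class frequency: share of same-class documents containing the term.
            cf = class_doc_freq[label][term] / class_size[label]
            row[term] = tf * idf * cf
        weights.append(row)
    return weights


if __name__ == "__main__":
    docs = [["ball", "goal", "ball"], ["ball", "team"],
            ["vote", "law"], ["vote", "ball"]]
    labels = ["sport", "sport", "politics", "politics"]
    for label, row in zip(labels, tf_idf_cf(docs, labels)):
        print(label, row)
```

In this toy corpus, "ball" occurs in every "sport" document but in only half of the "politics" documents, so for the same raw count its weight is halved in the "politics" document. Because the class-frequency factor requires a class label, this sketch applies only to labeled training documents.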