Improved k-prototypes Clustering Algorithm Based on Average Difference Degree
Received: 2017-11-20  Revised: 2017-12-20
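Author  Affiliation  Postal code
Shi Hongyan (石鸿雁)  Shenyang University of Technology  110870
Xu Mingming (徐明明)*  Shenyang University of Technology  110870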
Funding: National Natural Science Foundation of China (General Program, Key Program, Major Program)
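Abstract (translated from the Chinese): The k-prototypes clustering algorithm selects its initial cluster centers at random, which makes its clustering results unstable, and most existing clustering algorithms for mixed-attribute data give clusterings of limited quality. To address these problems, an improved k-prototypes clustering algorithm based on average difference degree is proposed. The algorithm uses the average difference degree to select the initial cluster centers, which avoids the randomness of that selection. To further improve the efficiency of the algorithm, information entropy is used to determine the attribute weights of the numerical data. Because the traditional measure for categorical attributes cannot fully capture the difference between a data object and a cluster, that measure is improved, and a measure for mixed-attribute data is given. Finally, simulation experiments on real data sets show that the improved algorithm achieves relatively high accuracy and handles mixed-attribute data effectively.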
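Keywords (translated from the Chinese): clustering  initial clustering center  mixed attribute data  average difference degree  information entropy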
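
For context, the baseline that the abstract sets out to improve can be sketched as follows. This is the commonly cited standard k-prototypes dissimilarity (squared Euclidean distance on numerical attributes plus a weighted count of categorical mismatches), not the improved measure proposed in the paper, whose formula the abstract does not state. The function name `kprototypes_dissimilarity` and the default value of `gamma` are illustrative assumptions.

```python
from typing import Hashable, Sequence


def kprototypes_dissimilarity(
    x_num: Sequence[float],
    x_cat: Sequence[Hashable],
    proto_num: Sequence[float],
    proto_cat: Sequence[Hashable],
    gamma: float = 1.0,  # assumed trade-off weight for the categorical part
) -> float:
    """Standard (baseline) k-prototypes dissimilarity between an object and a prototype."""
    if len(x_num) != len(proto_num) or len(x_cat) != len(proto_cat):
        raise ValueError("object and prototype must have the same attributes")
    # Numerical part: squared Euclidean distance.
    numeric_part = sum((a - b) ** 2 for a, b in zip(x_num, proto_num))
    # Categorical part: simple matching, one unit per mismatching attribute.
    categorical_part = sum(1 for a, b in zip(x_cat, proto_cat) if a != b)
    return numeric_part + gamma * categorical_part
```

Simple matching only records whether a categorical value equals the prototype's value, so it ignores how values are distributed inside the cluster. That is one reading of why the abstract says the traditional measure does not fully reflect the difference between an object and a cluster.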
 
Improved K-prototypes Clustering Algorithm Based on Average Difference Degree
Abstract: The k-prototypes clustering algorithm selects its initial cluster centers at random, which makes its results unstable, and most existing clustering algorithms for mixed-attribute data do not achieve the desired accuracy. To address these problems, an improved k-prototypes algorithm based on average difference degree is proposed. It uses the average difference degree to select the initial cluster centers, which avoids the randomness of that selection. To further improve the efficiency of the algorithm, the weights of the numerical attributes are determined using information entropy. Because the traditional measure for categorical attributes cannot fully reflect the difference between a data object and a cluster, that measure is improved and a measure for mixed-attribute data is given. Simulation experiments on real data sets show that the improved algorithm achieves better accuracy and handles mixed numerical and categorical data effectively.
Keywords: clustering  initial clustering center  mixed attribute data  average difference degree  information entropy
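
The abstract says that information entropy is used to weight the numerical attributes but does not give the formula. A minimal sketch of one common approach, the entropy weight method, is shown below. It is an assumption that the paper uses this form; the function name `entropy_weights`, the min-max scaling step, and the uniform fallback are illustrative choices.

```python
import math
from typing import List, Sequence


def entropy_weights(rows: Sequence[Sequence[float]]) -> List[float]:
    """Entropy weight method: attributes with lower entropy get larger weights."""
    n = len(rows)
    if n < 2:
        raise ValueError("need at least two objects to compute entropy weights")
    m = len(rows[0])
    diversities = []
    for j in range(m):
        column = [row[j] for row in rows]
        lo, hi = min(column), max(column)
        if hi == lo:
            # A constant attribute cannot separate objects, so it gets no weight.
            diversities.append(0.0)
            continue
        # Min-max scale to [0, 1], then turn the column into a distribution.
        scaled = [(v - lo) / (hi - lo) for v in column]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        # Normalized Shannon entropy in [0, 1]; terms with p == 0 contribute 0.
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        diversities.append(1.0 - entropy)
    total_diversity = sum(diversities)
    if total_diversity == 0:
        # Every attribute is constant: fall back to uniform weights.
        return [1.0 / m] * m
    return [d / total_diversity for d in diversities]
```

The returned weights sum to 1 and could scale each numerical attribute's contribution inside a mixed-attribute distance, so attributes whose values vary more informatively across objects count for more.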