An Improved CLIQUE Algorithm and Its Parallel Implementation
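Cite this article: Lin Peng, Chen Xi, Long Pengfei, Fu Ming. An Improved CLIQUE Algorithm and Its Parallel Implementation [J]. Computing Technology and Automation (计算技术与自动化), 2018, (4): 49-54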
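Authors and affiliations
Lin Peng (1,2), Chen Xi (1,2), Long Pengfei (1,2), Fu Ming (1,2) (1. Hunan Provincial Key Laboratory of Intelligent Processing of Big Data on Transportation, Changsha University of Science and Technology, Changsha, Hunan 410114; 2. School of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha, Hunan 410114)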
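Abstract (translated from Chinese): CLIQUE is an efficient clustering algorithm, but its clustering results suffer from serrated cluster boundaries, and its efficiency degrades sharply as data size and dimensionality grow. To address these problems, an improved CLIQUE algorithm is proposed. The algorithm first applies a boundary-correcting method and a grid-sliding method to scan the boundaries of dense regions and the sparse regions, recovering dense grid cells that were pruned and improving the quality of the grid partition. The improved algorithm is then parallelized in a distributed fashion on MapReduce, and its performance is verified experimentally. The experimental results show that the improved parallel algorithm raises clustering accuracy by 17% to 26%, effectively reduces the running time on massive data, and scales well.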
Keywords (translated from Chinese): boundary-correcting method; grid-sliding method; CLIQUE algorithm; MapReduce
 
An Improved CLIQUE Algorithm and Its Parallelization
Abstract: CLIQUE is an efficient clustering algorithm, but its clustering results have serrated boundaries, and its efficiency degrades greatly as data size and dimensionality increase. This paper proposes an improved CLIQUE algorithm. The algorithm first uses a boundary-correcting method and a grid-sliding method to scan the borders of dense regions and the sparse regions and to retrieve pruned dense grid cells, which improves the quality of the grid partition. The improved algorithm is then parallelized on top of MapReduce. A series of experiments tests its clustering accuracy, processing time, speedup, and scalability. The results show that clustering accuracy improves by 17% to 26%, and that the parallel algorithm effectively reduces runtime on massive data and scales well.
Keywords: boundary-correcting method; grid-sliding method; CLIQUE; MapReduce
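
For readers unfamiliar with grid-based clustering, the baseline step that the abstract builds on can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it shows only grid-density counting in a single fixed space, written as a map phase and a reduce phase in plain Python, followed by joining face-adjacent dense cells. CLIQUE's bottom-up subspace search and the paper's boundary-correcting and grid-sliding steps are not reproduced, because the abstract does not specify them. All names and parameters (`map_point`, `reduce_counts`, `dense_cells`, `connect`, `cell_width`, `density_threshold`) are hypothetical.

```python
from collections import Counter, deque


def map_point(point, cell_width):
    """Map phase (illustrative): emit (cell_index, 1) for the cell holding the point."""
    cell = tuple(int(x // cell_width) for x in point)
    return cell, 1


def reduce_counts(pairs):
    """Reduce phase (illustrative): sum the counts emitted for each cell."""
    counts = Counter()
    for cell, n in pairs:
        counts[cell] += n
    return counts


def dense_cells(points, cell_width, density_threshold):
    """Return the cells that hold at least density_threshold points."""
    counts = reduce_counts(map_point(p, cell_width) for p in points)
    return {cell for cell, n in counts.items() if n >= density_threshold}


def connect(dense):
    """Group dense cells that share a face into clusters (breadth-first search)."""
    unvisited = set(dense)
    clusters = []
    while unvisited:
        start = unvisited.pop()
        cluster, queue = {start}, deque([start])
        while queue:
            cell = queue.popleft()
            for axis in range(len(cell)):
                for step in (-1, 1):
                    neighbour = cell[:axis] + (cell[axis] + step,) + cell[axis + 1:]
                    if neighbour in unvisited:
                        unvisited.remove(neighbour)
                        cluster.add(neighbour)
                        queue.append(neighbour)
        clusters.append(cluster)
    return clusters


# Hypothetical toy data: two adjacent dense cells, one isolated dense cell, one sparse cell.
points = [(0.1, 0.1), (0.2, 0.3), (1.1, 0.2), (1.4, 0.4),
          (5.5, 5.5), (5.6, 5.7), (9.0, 0.0)]
dense = dense_cells(points, cell_width=1.0, density_threshold=2)
clusters = connect(dense)
```

Because a cell is kept or discarded as a whole, cluster edges follow the grid lines, which is one way to picture the serrated boundaries the abstract mentions. A cell just under the threshold, such as the one containing (9.0, 0.0) here, is dropped entirely.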