Outlier Mining Algorithm for Data-intensive Computing Environments
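Cite this article: Chen Yali, Zhang Longbo, Zhang Shusen. Outlier Mining Algorithm for Data-intensive Computing Environments [J]. Computing Technology and Automation, 2015, (2): 74-77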
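Authors and affiliation
Chen Yali, Zhang Longbo, Zhang Shusen (School of Computer Science and Technology, Shandong University of Technology, Zibo, Shandong 255000)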
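Abstract (translated from the Chinese): In data-intensive computing environments, the massive volume, high dimensionality, and distributed storage of data pose new challenges for the design and implementation of data mining algorithms. Based on the MapReduce model, this paper proposes an outlier mining algorithm that combines a grid technique with a density-based method. The algorithm has two steps: the Map phase uses the grid technique to delete the large amount of normal data that cannot be outliers and sends the representative-point information to the master node; the Reduce phase uses a density-based clustering method with an improved selection of core objects, so that outliers of arbitrary shape can be mined. Experimental results show that the method mines outliers effectively in data-intensive computing environments.
Keywords (translated from the Chinese): outlier detection  grid  MapReduce  MR_DBScan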
 
Outlier Mining Algorithm for Data-intensive Computing Environments
Abstract: The characteristics of data in data-intensive computing environments, such as huge volume, high dimensionality, and distributed storage, bring new challenges to the design of outlier mining algorithms. This paper proposes an outlier mining method based on MapReduce that combines a grid technique with a density-based approach. The method has two steps. The Map phase uses the grid technique to delete a large number of normal data points and then sends the representative information to the master node. The Reduce phase applies a density-based clustering algorithm with a simplified selection of core objects, so that outliers of arbitrary shape can be detected. Experimental results show that the algorithm is effective for mining outliers in data-intensive computing environments.
Keywords: outlier detection  grid  MapReduce  MR_DBScan
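
The two-phase idea in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' MR_DBScan: the function names `map_phase` and `reduce_phase`, the parameters `cell_size`, `dense_threshold`, `eps`, and `min_pts`, and the choice of a cell centroid with a point count as the "representative" are all assumptions made for illustration. The paper's modified core-object selection is not reproduced; a plain weighted DBSCAN-style core test stands in for it.

```python
from collections import defaultdict
from math import dist, floor


def map_phase(points, cell_size, dense_threshold):
    """Grid pruning on one data partition (hypothetical helper).

    Points that fall in dense grid cells are treated as normal data and
    replaced by one representative per cell; the rest stay as candidates.
    """
    cells = defaultdict(list)
    for p in points:
        key = tuple(floor(x / cell_size) for x in p)
        cells[key].append(p)

    representatives = []  # (centroid, weight) for each dense cell
    candidates = []       # points in sparse cells, kept for the density step
    for members in cells.values():
        if len(members) >= dense_threshold:
            centroid = tuple(sum(c) / len(members) for c in zip(*members))
            representatives.append((centroid, len(members)))
        else:
            candidates.extend(members)
    return representatives, candidates


def reduce_phase(map_outputs, eps, min_pts):
    """Weighted DBSCAN-style noise test over the merged map outputs.

    An item is a core object if the total weight within eps of it
    (itself included) reaches min_pts. A candidate is reported as an
    outlier if it is not within eps of any core object.
    """
    items = []  # (point, weight, is_candidate)
    for representatives, candidates in map_outputs:
        items += [(c, w, False) for c, w in representatives]
        items += [(p, 1, True) for p in candidates]

    def neighbourhood_weight(q):
        return sum(w for p, w, _ in items if dist(p, q) <= eps)

    cores = [p for p, _, _ in items if neighbourhood_weight(p) >= min_pts]
    return [p for p, _, is_cand in items
            if is_cand and all(dist(p, c) > eps for c in cores)]
```

In a real MapReduce job each call to `map_phase` would run on one data split and only the small `(representatives, candidates)` pair would travel to the reducer, which is where the saving from grid pruning would come from in this sketch.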