Research on a Deduplication Method Based on Bloom Filter
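Cite this article: Zhao Yanhong, Li Hongqi, Zhu Liping, Zhan Kunlin. Research on a Deduplication Method Based on Bloom Filter [J]. Computing Technology and Automation (计算技术与自动化), 2016, (1): 95-100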
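Authors and affiliations
Zhao Yanhong, Li Hongqi, Zhu Liping, Zhan Kunlin (1. Beijing Key Laboratory of Petroleum Data Mining, China University of Petroleum (Beijing), Beijing 102249; 2. Network Media Division, Tencent Technology (Beijing), Beijing 100080)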
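Chinese abstract: In a personalized news recommendation system, article deduplication is an important module that prevents the same article from being recommended repeatedly. With a massive user base, the traditional queue-based deduplication method consumes a large amount of memory. Bloom Filter is a randomized data structure with very high space efficiency, suited to scenarios that tolerate a certain false positive rate. Based on Bloom Filter, this paper designs a double Bloom Filter bit array structure and a Bloom Filter bit array chain structure. Experiments show that the deduplication method based on the Bloom Filter bit array chain not only greatly reduces the program's server memory requirement but also offers good flexibility and extensibility.
Chinese keywords: information overload; personalized recommendation system; Bloom Filter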
 
Research on Duplicated News Deletion Method Based on Bloom Filter
Abstract: In a personalized news recommendation system, duplicate-news removal is an important module that prevents the same article from being recommended to a user repeatedly. With a very large number of users, the traditional queue-based removal method consumes a great deal of memory. A Bloom Filter is a randomized data structure with high space efficiency, suited to situations that tolerate a certain false positive rate. In this paper, based on the Bloom Filter, we design in turn a double bit vector structure and a bit vector list structure for duplicate-news removal. The experimental results show that the method based on the bit vector list structure not only greatly reduces the memory requirement but also offers good flexibility and extensibility.
Keywords: information overload; personalized recommendation system; Bloom Filter
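
To make the idea in the abstract concrete, the following is a minimal Python sketch of a plain Bloom Filter used to skip articles a user has already been shown. It is an illustration only: it does not reproduce the paper's double bit vector or bit vector list structures, whose details the abstract does not give. The names `BloomFilter`, `filter_unseen`, `num_bits` and `num_hashes`, the choice of SHA-256, and the double-hashing scheme are all assumptions made for this example.

```python
import hashlib


class BloomFilter:
    """Plain Bloom Filter over string keys (illustrative, not the paper's structure)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        if num_bits <= 0 or num_hashes <= 0:
            raise ValueError("num_bits and num_hashes must be positive")
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One bit per slot, packed eight to a byte.
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Assumed scheme: derive num_hashes positions from two 64-bit
        # halves of a single SHA-256 digest (double hashing).
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key: str) -> bool:
        # True means "possibly seen"; False means "definitely not seen".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


def filter_unseen(seen: BloomFilter, candidate_ids):
    """Return candidates not yet recorded in `seen`, recording each one that passes."""
    fresh = []
    for article_id in candidate_ids:
        if article_id not in seen:
            seen.add(article_id)
            fresh.append(article_id)
    return fresh
```

In this sketch each user would hold one `BloomFilter` instead of a queue of article IDs. A false positive means an unseen article is occasionally withheld, while an article that was already recorded is never recommended again, which matches the abstract's remark that the structure suits scenarios tolerating some false positive rate.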