An HDFS-Based Storage Optimization Scheme for Small Files
Cite this article: Zhang Xiaoli, Hua Yahui. An HDFS-Based Storage Optimization Scheme for Small Files [J]. Computing Technology and Automation, 2017, (3): 134-138
Author affiliation
Zhang Xiaoli, Hua Yahui (School of Computer Science, Xi'an Aeronautical University, Xi'an, Shaanxi 710077)
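Chinese abstract: The Hadoop Distributed File System (HDFS) performs well for big-data storage and is suited to processing and storing large files, but its performance drops markedly when it handles massive numbers of small files, because too many small files drive up memory consumption across the whole system. To improve the efficiency with which HDFS handles small files, this paper modifies the HDFS storage scheme and proposes a storage optimization scheme for massive numbers of small files. Small files are classified according to the correlation among them; files in the same class are then merged and uploaded together, and an index file is generated. On reads, a client-side caching mechanism is used to improve access efficiency. Experimental results show that, under rapid data growth, the scheme effectively improves small-file access efficiency, reduces system memory overhead, and improves the performance of HDFS on massive numbers of small files.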
Chinese keywords: Hadoop  HDFS  small file  cache
 
A Small Files Optimized Schema Based on HDFS
Abstract: The Hadoop Distributed File System (HDFS) performs well for big-data storage and is suitable for processing and storing large files, but its performance drops significantly when it handles massive numbers of small files, because too many small files consume an excessive amount of memory. To improve the efficiency of processing small files in HDFS, this paper improves the HDFS storage scheme and proposes an optimization scheme for storing massive numbers of small files. Small files are first classified according to their correlation; each set of correlated files is then merged into one large file and stored in HDFS, and an index file is generated. Reads use a client-side caching mechanism to improve access efficiency. The experimental results show that, as the number of small files grows rapidly, the proposed scheme effectively improves storage and access efficiency, reduces memory consumption, and improves the performance of HDFS in processing massive numbers of small files.
Keywords: Hadoop  HDFS  small file  cache
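
The merge-index-cache pipeline described in the abstract can be illustrated with a minimal in-memory sketch. This is not the authors' implementation and does not call any HDFS API: the names `merge_by_group` and `CachedReader` are hypothetical, grouping by file extension is only a stand-in for the paper's correlation-based classification (which the abstract does not specify), and the LRU eviction policy is an assumed choice for the client-side cache.

```python
from collections import OrderedDict, defaultdict
import os


def merge_by_group(files, group_key=lambda name: os.path.splitext(name)[1]):
    """Merge small files per group into one blob plus an index.

    files: dict mapping file name -> bytes.
    group_key: assumed stand-in for the paper's correlation classifier.
    Returns dict mapping group -> (blob, index), where index maps
    file name -> (offset, length) inside the blob.
    """
    grouped = defaultdict(list)
    for name in sorted(files):
        grouped[group_key(name)].append(name)

    merged = {}
    for group, names in grouped.items():
        blob = bytearray()
        index = {}
        for name in names:
            data = files[name]
            index[name] = (len(blob), len(data))  # record position before appending
            blob += data
        merged[group] = (bytes(blob), index)
    return merged


class CachedReader:
    """Reads small files out of merged blobs through an LRU client cache."""

    def __init__(self, merged, capacity=2):
        self.merged = merged
        self.capacity = capacity
        self.cache = OrderedDict()  # file name -> bytes, least recently used first
        self.hits = 0
        self.misses = 0
        # Reverse lookup: which merged blob holds each small file.
        self.locate = {
            name: group
            for group, (_, index) in merged.items()
            for name in index
        }

    def read(self, name):
        if name in self.cache:
            self.cache.move_to_end(name)  # mark as most recently used
            self.hits += 1
            return self.cache[name]
        self.misses += 1
        blob, index = self.merged[self.locate[name]]
        offset, length = index[name]
        data = blob[offset:offset + length]
        self.cache[name] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used entry
        return data


if __name__ == "__main__":
    files = {"a.txt": b"alpha", "b.txt": b"beta", "c.log": b"gamma"}
    reader = CachedReader(merge_by_group(files))
    print(reader.read("b.txt"), reader.read("b.txt"), reader.hits, reader.misses)
```

In this sketch the system tracks one merged object per group instead of one object per small file, which is the source of the memory saving the abstract claims; repeated reads of the same small file are served from the client cache without touching the merged blob.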