A Data Generation Method for Streaming Big Data System Testing (一种适用于流式大数据系统测试的数据生成方法)
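Cite this article: Cao Xufeng, Jiang Guohua. A Data Generation Method for Streaming Big Data System Testing [J]. 计算技术与自动化 (Computing Technology and Automation), 2017, (3): 139-145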
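Author affiliation
Cao Xufeng (曹旭峰), Jiang Guohua (江国华) (College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu 210016)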
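Chinese abstract: When testing streaming big data systems, the more realistic the test data set, the more credible the resulting test report. However, large volumes of real streaming data are not easy to obtain, so a method is needed that can generate large amounts of data matching the characteristics of real scenarios. These characteristics include correlation between data attributes, temporal correlation of the data, and changes in the flow rate of the data stream, among others. In a streaming big data environment, temporal correlation and flow-rate variation are especially important. This paper proposes a data generation method for streaming big data system testing. It takes a data set from a real scenario as seed data, uses the maximal information coefficient to describe the correlation between attributes of the seed data, and improves the Prim algorithm to group the set of attribute columns, raising generation efficiency while keeping attribute columns as strongly correlated as possible. It then proposes a time series model selection strategy to preserve the temporal correlation of the generated data, and a double-layer sliding window method to control the output rate of the data stream. Finally, the paper compares the generation efficiency of the proposed method with that of other streaming data generation methods.
Chinese keywords: streaming big data generation  nonlinear correlation  temporal correlation  flow rate control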
 
A Data Generation Method for Streaming Big Data System Testing
Abstract: In testing streaming big data systems, the more realistic the test data set, the more credible the resulting test report. However, real data is not easy to obtain, so a method is needed to generate large volumes of data that carry the features of real scenarios. These features include correlation between data attributes, temporal correlation of the data, and the rate of the data stream. In a streaming big data environment, temporal correlation and stream rate are especially important. In this paper, we present a method for streaming big data generation that uses real-scenario streaming data as seed data. It uses the maximal information coefficient to describe the correlation between data attributes and puts forward a c-prim algorithm to partition the attributes into groups, improving efficiency while keeping the attributes within a group strongly correlated. According to the characteristics of each attribute group, it uses different time series models to ensure that the generated data retains temporal correlation. A double sliding window method is proposed to control the degree of parallelism and the output speed of the streaming data. Finally, the paper compares the generation efficiency of the proposed method with that of other streaming data generation methods.
Keywords: streaming data generation  nonlinear correlation  temporal sequence correlation  velocity control
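
The abstract mentions grouping attribute columns with a Prim-based algorithm driven by pairwise attribute correlation, but gives no details. The following is only a minimal sketch of one way such a grouping could work, not the authors' algorithm: it grows a maximum spanning tree with plain Prim's algorithm and starts a new group whenever the strongest available link falls below a cutoff. The function name `group_attributes`, the `threshold` parameter, and the sample matrix are all hypothetical; in the paper's setting the matrix entries would come from the maximal information coefficient, which is not computed here.

```python
from typing import Dict, List


def group_attributes(corr: List[List[float]], threshold: float) -> List[List[int]]:
    """Group attribute indices using a Prim-style maximum spanning tree.

    corr is a symmetric matrix of pairwise correlation scores (assumed
    to be precomputed). An attribute joins its tree parent's group only
    if the connecting score is >= threshold (an illustrative rule).
    """
    n = len(corr)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [-1.0] * n        # strongest known link from each node into the tree
    parent = [-1] * n        # tree node that provides that strongest link
    label = list(range(n))   # group label; a group is named by its first node

    # Start the tree at attribute 0.
    in_tree[0] = True
    for j in range(1, n):
        best[j] = corr[0][j]
        parent[j] = 0

    for _ in range(n - 1):
        # Prim step: add the outside node with the strongest link to the tree.
        u = max((j for j in range(n) if not in_tree[j]), key=lambda j: best[j])
        in_tree[u] = True
        # A strong link keeps u in its parent's group; a weak one starts a new group.
        if best[u] >= threshold:
            label[u] = label[parent[u]]
        # Update the strongest links of the remaining outside nodes.
        for j in range(n):
            if not in_tree[j] and corr[u][j] > best[j]:
                best[j] = corr[u][j]
                parent[j] = u

    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(label[i], []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    # Made-up scores: attributes 0 and 1 are strongly related, as are 2 and 3.
    sample = [
        [1.0, 0.9, 0.1, 0.2],
        [0.9, 1.0, 0.15, 0.1],
        [0.1, 0.15, 1.0, 0.8],
        [0.2, 0.1, 0.8, 1.0],
    ]
    print(group_attributes(sample, threshold=0.5))
```

Each resulting group could then be handled by its own generator, which is where the efficiency gain described in the abstract would come from; how the paper actually balances group size against correlation strength is not stated in the abstract.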