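Cite this article: Dong Jun. Research on dimensionality reduction of high-dimensional data with the ST-SNE algorithm for data sets [J]. 计算技术与自动化 (Computing Technology and Automation), 2018, (4): 116-122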
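Dong Jun (Department of Information and Intelligent Engineering, Ningxia Vocational Technical College of Finance and Economics, Yinchuan, Ningxia 750021)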
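Chinese abstract: In fields such as data mining and machine learning, dimensionality reduction is an effective means of tackling the difficulties of analyzing and processing high-dimensional data. This paper studies the t-SNE dimensionality reduction algorithm in depth and improves the way t-SNE computes the similarity of sample points in high-dimensional space. t-SNE measures the similarity of sample points directly by their Euclidean distance in high-dimensional space, but in high-dimensional space the Euclidean distance cannot faithfully reflect the similarity relations among samples that lie on a non-linear manifold. Using the neighborhood structure of the sample points in high-dimensional space, the paper proposes measuring similarity with a second-order neighbor distance and, on that basis, proposes a stochastic neighbor embedding algorithm based on the second-order neighbor distance (Second Order t-SNE, ST-SNE). Comparative experiments were carried out on several data sets, including MNIST, USPS, and COIL-20. The results show that the improved algorithm raises the classification accuracy and the visualization quality of the dimensionality-reduction results.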
Chinese keywords: dimensionality reduction; second-order neighbor distance; ST-SNE
Dimensionality Reduction of High-dimensional Data with the ST-SNE Algorithm for Data Sets
Abstract: In data mining and machine learning, dimensionality reduction is an effective way to address the problems of high-dimensional data analysis. This paper studies the t-SNE algorithm and improves the way it evaluates the similarity between data points in high-dimensional space. t-SNE measures the similarity of data points directly by their Euclidean distance in high-dimensional space. However, the Euclidean distance cannot faithfully reflect the structure of data that lie on non-linear manifolds in high-dimensional space. This paper uses the neighborhood structure of the data points in high-dimensional space and proposes a second-order neighbor distance to measure their similarity. Based on this distance, the paper proposes Second Order t-SNE (ST-SNE). ST-SNE is compared with t-SNE in experiments on MNIST, USPS, COIL-20, and other data sets. The experiments show that ST-SNE improves the classification accuracy and the visualization quality of the dimensionality-reduction results.
Keywords: dimensionality reduction; second-order neighbor distance; ST-SNE
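For orientation, the baseline step that the abstract says ST-SNE modifies can be sketched as follows: standard t-SNE turns pairwise Euclidean distances into Gaussian similarities. This is a minimal illustration only. The function names are hypothetical, and a single fixed bandwidth `sigma` is assumed, whereas t-SNE itself tunes a per-point bandwidth to match a target perplexity. The second-order neighbor distance is not defined in the abstract, so it is not implemented here; presumably it would replace the distance matrix passed to the similarity step.

```python
import math

def pairwise_sq_euclidean(points):
    """Return the matrix of squared Euclidean distances between all points."""
    n = len(points)
    return [[sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
             for j in range(n)] for i in range(n)]

def conditional_similarities(sq_dists, sigma=1.0):
    """Row-normalised Gaussian similarities p_{j|i} from squared distances.

    Assumption: one fixed bandwidth `sigma` for every point; t-SNE itself
    chooses a per-point bandwidth by matching a target perplexity.
    """
    n = len(sq_dists)
    P = []
    for i in range(n):
        # A point is never its own neighbour, so the diagonal weight is 0.
        weights = [0.0 if i == j else math.exp(-sq_dists[i][j] / (2.0 * sigma ** 2))
                   for j in range(n)]
        total = sum(weights)
        P.append([w / total for w in weights])
    return P

def symmetrize(P):
    """Joint similarities p_ij = (p_{j|i} + p_{i|j}) / (2n), as in t-SNE."""
    n = len(P)
    return [[(P[i][j] + P[j][i]) / (2.0 * n) for j in range(n)] for i in range(n)]

# Toy data (illustrative only): two tight pairs that are far from each other.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
P = symmetrize(conditional_similarities(pairwise_sq_euclidean(points)))
```

Because `conditional_similarities` only needs a matrix of squared distances, any other distance (for example a neighbor-based one) could be supplied in place of `pairwise_sq_euclidean` without changing the rest of the sketch.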
