Human Behavior Recognition Based on Dual-stream Network Fusion and Spatio-temporal Convolution

QIN Yue; SHI Yuexiang

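Cite this article: QIN Yue, SHI Yuexiang. Human Behavior Recognition Based on Dual-stream Network Fusion and Spatio-temporal Convolution[J]. Computing Technology and Automation, 2021, (2): 140-147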

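Authors	Affiliation
QIN Yue, SHI Yuexiang	(School of Computer Science and School of Cyberspace Security, Xiangtan University, Xiangtan, Hunan 411105, China)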

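Abstract (translated from the Chinese): Noise in video makes it difficult to extract feature information well, which leads to inaccurate action recognition. To address this problem, a human behavior recognition network based on a spatio-temporal convolutional neural network is proposed. Long videos are divided into segments, and the RGB frames and the computed optical flow maps are fed into two separate convolutional neural networks (CNNs). A weighted-sum fusion algorithm then merges the extracted temporal-domain and spatial-domain features into spatio-temporal features. The resulting mid-level semantic information is passed to R(2+1)D convolutions, with ResNet used to improve network performance, and behavior recognition is finally performed at the softmax layer. Experiments on the UCF-101 and HMDB-51 datasets achieve accuracies of 92.1% and 66.1%, respectively. The experiments show that the proposed dual-stream fusion and spatio-temporal convolutional network model helps improve the accuracy of video behavior recognition.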

Keywords (translated from the Chinese): deep learning; spatio-temporal convolutional network; dual-stream fusion network; R(2+1)D
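
The weighted-sum fusion step named in the abstract can be illustrated with a minimal sketch. The abstract does not give the fusion weights or the feature shapes, so the function name `fuse_weighted`, the weights 0.6 and 0.4, and the three-element feature vectors below are all hypothetical and chosen only for illustration. In the paper the fused features are passed on to the R(2+1)D stage, which this sketch does not model.

```python
def fuse_weighted(spatial, temporal, w_spatial=0.5, w_temporal=0.5):
    """Element-wise weighted sum of two equal-length feature vectors.

    `spatial` stands in for features from the RGB stream and `temporal`
    for features from the optical-flow stream (both hypothetical here).
    """
    if len(spatial) != len(temporal):
        raise ValueError("feature vectors must have the same length")
    return [w_spatial * s + w_temporal * t for s, t in zip(spatial, temporal)]


# Hypothetical features for one video segment; the real features would be
# CNN feature maps, not three-element lists.
spatial_feat = [2.0, 0.5, 0.1]
temporal_feat = [1.0, 1.5, 0.3]

# Assumed weights: the abstract does not report the values used.
fused = fuse_weighted(spatial_feat, temporal_feat, w_spatial=0.6, w_temporal=0.4)
print(fused)
```

Because the fusion is element-wise, both streams must produce features of the same shape before they are combined.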

Human Behavior Recognition Based on Dual-stream Network Fusion and Spatio-temporal Convolution

Abstract: Noise in video hinders feature extraction and leads to inaccurate action recognition. To address this problem, this paper proposes a human behavior recognition network based on spatio-temporal convolutional neural networks. A long video is divided into segments, and the RGB frames and the computed optical flow maps are fed into two convolutional neural networks (CNNs). The extracted temporal-domain and spatial-domain features are then fused into spatio-temporal features by a weighted-sum fusion algorithm. The resulting mid-level semantic information is passed to R(2+1)D convolutions, with ResNet used to improve network performance, and behavior recognition is performed at the softmax layer. Experiments on the UCF-101 and HMDB-51 datasets yield accuracies of 92.1% and 66.1%, respectively. The results show that the proposed dual-stream fusion and spatio-temporal convolutional network model helps improve the accuracy of video behavior recognition.

Keywords: deep learning; spatio-temporal convolutional network; two-stream convolutional networks; R(2+1)D
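
As background on the R(2+1)D convolution mentioned above, the following sketch compares the weight count of a full 3D convolution with its (2+1)D factorization into a 1 x d x d spatial convolution followed by a t x 1 x 1 temporal convolution. It assumes the commonly cited R(2+1)D formulation, in which the intermediate channel count is chosen so that both variants have roughly the same number of weights. The abstract does not state the channel sizes used in this paper, so the 64-channel example and all function names are hypothetical, and biases are ignored.

```python
def conv3d_params(n_in, n_out, t=3, d=3):
    """Weights in a full t x d x d 3D convolution (biases ignored)."""
    return n_in * n_out * t * d * d


def mid_channels(n_in, n_out, t=3, d=3):
    """Intermediate channels that roughly match the 3D parameter budget."""
    return (t * d * d * n_in * n_out) // (d * d * n_in + t * n_out)


def conv2plus1d_params(n_in, n_out, t=3, d=3):
    """Weights in a 1 x d x d spatial conv followed by a t x 1 x 1 temporal conv."""
    m = mid_channels(n_in, n_out, t, d)
    spatial = n_in * m * d * d   # 2D spatial convolution
    temporal = m * n_out * t     # 1D temporal convolution
    return spatial + temporal


# Hypothetical layer with 64 input and 64 output channels.
full = conv3d_params(64, 64)
m = mid_channels(64, 64)
factored = conv2plus1d_params(64, 64)
print(full, m, factored)
```

With a matched parameter budget, the factorized form adds an extra nonlinearity between the spatial and temporal convolutions, which is the usual motivation given for R(2+1)D over plain 3D convolution.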
