End-to-End Speech Recognition Based on Recurrent Neural Networks
Submitted: 2019-01-14  Revised: 2019-01-14
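Authors (affiliation, postal code):
Wang Zilong (王子龙), State Grid Corporation of China, 100031
Li Junfeng (李俊峰), Marketing Department, State Grid Corporation of China
Zhang Shaowei* (张劭韡), Customer Service Center, State Grid Corporation of China, 300306
Wang Hongyan (王宏岩), Beijing China-Power Information Technology Co., Ltd.
Wang Sijie (王思杰), Customer Service Center, State Grid Corporation of China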
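Abstract: This paper presents a speech recognition system that directly transcribes audio data into text. It combines a recurrent neural network (RNN) architecture based on deep bidirectional long short-term memory (LSTM) with the connectionist temporal classification (CTC) objective function. A modification of the objective function is introduced that trains the network to minimize the expectation of an arbitrary transcription loss function. This allows the word error rate to be optimized directly, even without a lexicon or a language model. With no linguistic information, the system achieves a word error rate (WER) of 27.3% on the Wall Street Journal corpus; with only a dictionary of allowed words, it achieves 21.9%; and with a trigram language model, 8.2%. Combining the proposed method with a baseline system further reduces the error rate to 6.7%.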
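The CTC objective scores a transcription by summing over every frame-level label sequence (path) that reduces to it under a many-to-one map: merge consecutive repeated labels, then delete the blank symbol. The minimal Python sketch below shows that map and uses it for best-path (greedy) decoding, which takes the most probable label at each frame. The blank index 0, the three-letter alphabet and the frame probabilities are hypothetical; the dictionary- and trigram-constrained results above would require a search over candidate transcriptions rather than this greedy pass.

```python
from typing import Dict, List, Sequence

BLANK = 0  # assumption: label index 0 is reserved for the CTC blank


def ctc_collapse(path: Sequence[int], blank: int = BLANK) -> List[int]:
    """CTC many-to-one map: merge consecutive repeats, then drop blanks."""
    out: List[int] = []
    prev = None
    for label in path:
        # Emit a label only when it differs from the previous frame and is
        # not the blank; a blank between two repeats keeps both copies.
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out


def greedy_decode(frame_probs: Sequence[Sequence[float]], blank: int = BLANK) -> List[int]:
    """Best-path decoding: most probable label per frame, then collapse."""
    best_path = [max(range(len(frame)), key=lambda k: frame[k]) for frame in frame_probs]
    return ctc_collapse(best_path, blank)


# Hypothetical toy alphabet and per-frame output probabilities (5 frames).
ALPHABET: Dict[int, str] = {1: "c", 2: "a", 3: "t"}
example_frames = [
    [0.1, 0.7, 0.1, 0.1],    # "c"
    [0.2, 0.6, 0.1, 0.1],    # "c" again (repeat, merged)
    [0.8, 0.1, 0.05, 0.05],  # blank
    [0.1, 0.1, 0.7, 0.1],    # "a"
    [0.1, 0.1, 0.1, 0.7],    # "t"
]
labels = greedy_decode(example_frames)
print("".join(ALPHABET[k] for k in labels))  # cat
```

The blank is what lets CTC emit doubled symbols: the path `1 1 0 1` collapses to `1 1`, whereas `1 1 1` collapses to a single `1`.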
Keywords: recurrent neural network; speech recognition; long short-term memory; connectionist temporal classification; word error rate
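The word error rates reported above count word-level substitutions, deletions and insertions against the reference transcript, divided by the number of reference words. The Python sketch below computes WER with a word-level Levenshtein distance and, to illustrate what minimizing the expected transcription loss means, evaluates the expected WER over a small list of candidate transcriptions with probabilities. The candidate list and its probabilities are hypothetical, and the abstract does not say how the expectation is approximated during training; the sketch simply enumerates a fixed set of candidates.

```python
from typing import List, Sequence, Tuple


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of word substitutions, deletions and insertions."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution (free if words match)
            )
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N, where N is the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(ref, hyp) / len(ref)


def expected_wer(reference: str, candidates: List[Tuple[str, float]]) -> float:
    """Probability-weighted mean WER over candidates (weights renormalized)."""
    total = sum(p for _, p in candidates)
    return sum(p * word_error_rate(reference, hyp) for hyp, p in candidates) / total


ref = "the cat sat on the mat"
candidates = [
    ("the cat sat on the mat", 0.5),  # exact match: WER 0
    ("the cat sat on a mat", 0.3),    # one substitution: WER 1/6
    ("the cat on the mat", 0.2),      # one deletion: WER 1/6
]
print(round(word_error_rate(ref, "the cat sat on a mat"), 4))  # 0.1667
print(round(expected_wer(ref, candidates), 4))  # 0.0833
```

Unlike the CTC log-likelihood, which only rewards the exact target, an expected-loss objective of this kind gives partial credit to near-miss transcriptions in proportion to how few word errors they contain.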
 