Research on an Automatic Proofreading Method for Chinese Real-Word Errors in the Oilfield Domain
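Cite this article: Wang Hui1, Marius Petrescu2, Pan Junhui1, Wang Haochang1, Zhang Qiang1, Zhang Yan1. Research on Chinese Real-word Error Automatic Proofreading For Oilfield [J]. 计算技术与自动化, 2021, (1): 140-143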
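Authors and affiliations
Wang Hui1, Marius Petrescu2, Pan Junhui1, Wang Haochang1, Zhang Qiang1, Zhang Yan1
1. School of Computer and Information Technology, Northeast Petroleum University, Daqing, Heilongjiang 163318, China
2. Petroleum-Gas University of Ploiești, Ploiești 100680, Romania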
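Abstract (translated from the Chinese): Automatic proofreading of Chinese real-word errors is an important basic research topic in natural language understanding. During oilfield digitization, the Chinese real-word errors produced by image recognition and manual entry directly affect the accuracy of later comprehensive data analysis. After analyzing the causes of Chinese real-word errors and statistical language models, this paper proposes an automatic proofreading method for Chinese real-word errors in the oilfield domain. The method first builds confusion sets for the general domain and the oilfield domain, then introduces synonym sets to enrich the knowledge base. After the corpus is segmented into words, it statistically analyzes the relationships between the target word, its confusion words, and the synonyms of the surrounding words, and corrects real-word errors automatically. Experiments show that the proposed method can effectively proofread Chinese real-word errors in the oilfield domain.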
Keywords (translated from the Chinese): real-word error; N-gram; automatic text proofreading; knowledge base construction
 
Research on Chinese Real-word Error Automatic Proofreading For Oilfield
Abstract: Automatic proofreading of Chinese real-word errors is an important basic research issue in NLP. In the process of oilfield digitization, Chinese real-word errors introduced by image recognition and manual input directly affect the accuracy of later comprehensive data analysis. This paper analyzes the causes of Chinese real-word errors and statistical language models, and proposes an automatic proofreading method for Chinese real-word errors in the oilfield domain. First, confusion sets for the general domain and the oilfield domain are constructed; the knowledge base is then enriched with synonym sets. After word segmentation, the relationships between the target word, its confusion words, and the synonyms of the surrounding words are analyzed statistically, and real-word errors are corrected automatically. Experimental results show that the proposed method can effectively proofread Chinese real-word errors in the oilfield domain.
Keywords: real-word error; N-gram; automatic text proofreading; knowledge base construction
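
The pipeline outlined in the abstract (confusion sets, synonym enrichment, word segmentation, then statistical comparison of the target word against its confusion words in context) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the confusion set, the synonym table, the three-sentence pre-segmented corpus, and the function names are hypothetical, and raw bigram counts stand in for the paper's statistical model, which the abstract does not specify.

```python
from collections import Counter

# Hypothetical toy knowledge base (not taken from the paper).
CONFUSION_SETS = [{"原油", "原由"}]      # homophones: "crude oil" vs. "reason"
SYNONYMS = {"产出": {"产量"}}            # "output" is treated as a synonym of "yield"
CORPUS = [                               # already word-segmented training sentences
    ["提高", "原油", "产量"],
    ["原油", "产量", "下降"],
    ["说明", "事故", "原由"],
]


def count_bigrams(sentences):
    """Count adjacent word pairs over the segmented corpus."""
    counts = Counter()
    for tokens in sentences:
        counts.update(zip(tokens, tokens[1:]))
    return counts


BIGRAMS = count_bigrams(CORPUS)


def expand(word):
    """Return the word together with its known synonyms."""
    return {word} | SYNONYMS.get(word, set())


def context_score(candidate, left, right):
    """Sum bigram counts of the candidate with its neighbours and their synonyms."""
    score = 0
    if left is not None:
        score += sum(BIGRAMS[(w, candidate)] for w in expand(left))
    if right is not None:
        score += sum(BIGRAMS[(candidate, w)] for w in expand(right))
    return score


def proofread(tokens):
    """Replace a word by a confusion-set member only if that member scores strictly higher."""
    corrected = list(tokens)
    for i, word in enumerate(tokens):
        left = tokens[i - 1] if i > 0 else None
        right = tokens[i + 1] if i + 1 < len(tokens) else None
        for confusion_set in CONFUSION_SETS:
            if word not in confusion_set:
                continue
            # sorted() makes tie-breaking deterministic
            best = max(sorted(confusion_set),
                       key=lambda c: context_score(c, left, right))
            if context_score(best, left, right) > context_score(word, left, right):
                corrected[i] = best
    return corrected


print(proofread(["提高", "原由", "产出"]))
```

In this toy setting the pair 原油/产出 never occurs in the corpus, so the synonym expansion of 产出 to 产量 is what lets the context support 原油 over 原由. A word is left unchanged unless an alternative scores strictly higher, which keeps correct usages such as 事故 原由 intact.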