Citation: ZHANG Xiyue, HU Jinqiu, ZHANG Laibin, DONG Shaohua, XU Kangkai. Textual generalization method of accident risk factors in oil & gas storage and transportation enterprises based on CW-AGNES[J]. Oil & Gas Storage and Transportation, 2021, 40(11): 1242-1249. DOI: 10.6047/j.issn.1000-8241.2021.11.006

Textual generalization method of accident risk factors in oil & gas storage and transportation enterprises based on CW-AGNES

    Abstract: Textual generalization of accident risk factors is an important step in building a knowledge graph of accident risk factor evolution for oil & gas storage and transportation enterprises. When applied to the risk factor texts accumulated during these enterprises' production processes, existing event text generalization methods have limited semantic representation and suffer from word segmentation errors. To address these problems, and in view of the complex and variable language of safety management texts, a textual generalization method of accident risk factors based on Char-Word Feature Based AGNES (CW-AGNES) was proposed. First, character feature vectors and bigram word feature vectors for oil & gas storage and transportation enterprise accidents were obtained with the Word2Vec method. Next, the accident risk factor texts were vectorized with the pre-trained word vector model. Then, the character and word features of the texts were incorporated into the agglomerative nesting (AGNES) hierarchical clustering method, which reduces the errors introduced by word segmentation while retaining the semantic information of words, thereby generalizing the risk factor texts. Finally, the CW-AGNES method was applied to real safety management texts from oil & gas storage and transportation enterprises and compared with other generalization methods. The results show that CW-AGNES achieves a better generalization effect, improving the quantitative evaluation indicators AMI, ARI, V-measure and FMI by 2.44%–5.74%. The proposed method can therefore support research on constructing accident risk knowledge graphs in the oil & gas storage and transportation field.
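The abstract describes the CW-AGNES pipeline only at a high level. The Python sketch below illustrates the general idea under stated assumptions: each risk factor text is represented by the mean of its character vectors concatenated with the mean of its overlapping character-bigram vectors (so no word segmenter is required), and the texts are then grouped bottom-up by AGNES with average linkage and cosine distance. The embedding tables `CHAR_VECS` and `BIGRAM_VECS`, the concatenation scheme, the linkage and distance choices, and the example texts are all hypothetical stand-ins rather than the authors' implementation; in practice the vectors would come from a trained Word2Vec model.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]
DIM = 2

# HYPOTHETICAL toy embeddings: stand-ins for Word2Vec-trained character and
# character-bigram vectors (real models would have far more dimensions).
CHAR_VECS: Dict[str, Vector] = {
    "阀": (1.0, 0.0), "门": (1.0, 0.1), "泄": (0.9, 0.2), "渗": (0.8, 0.3), "漏": (1.0, 0.2),
    "管": (0.0, 1.0), "道": (0.1, 1.0), "线": (0.2, 0.9), "腐": (0.1, 0.9), "蚀": (0.0, 1.0),
}
BIGRAM_VECS: Dict[str, Vector] = {
    "阀门": (1.0, 0.0), "泄漏": (1.0, 0.3), "渗漏": (0.9, 0.3),
    "管道": (0.0, 1.0), "管线": (0.1, 1.0), "腐蚀": (0.1, 0.9),
}


def mean_vector(keys: Sequence[str], table: Dict[str, Vector], dim: int) -> List[float]:
    """Average the vectors of keys present in `table`; unknown keys are skipped."""
    found = [table[k] for k in keys if k in table]
    if not found:
        return [0.0] * dim
    return [sum(v[i] for v in found) / len(found) for i in range(dim)]


def char_word_features(text: str) -> List[float]:
    """Concatenate the mean character vector and mean bigram vector (assumed scheme).

    Single characters and overlapping two-character bigrams need no word segmenter.
    """
    chars = list(text)
    bigrams = [text[i:i + 2] for i in range(len(text) - 1)]
    return mean_vector(chars, CHAR_VECS, DIM) + mean_vector(bigrams, BIGRAM_VECS, DIM)


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat empty representations as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def agnes(vectors: Sequence[Sequence[float]], n_clusters: int) -> List[List[int]]:
    """Agglomerative nesting: start from singletons and repeatedly merge the two
    clusters with the smallest average pairwise distance (average linkage)."""
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    dist = [[cosine_distance(a, b) for b in vectors] for a in vectors]
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pair_sum = sum(dist[p][q] for p in clusters[i] for q in clusters[j])
                d = pair_sum / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])  # merge cluster j into cluster i
        del clusters[j]                  # j > i, so index i stays valid
    return sorted(sorted(c) for c in clusters)


# Hypothetical risk factor texts: valve leakage, valve seepage,
# pipe corrosion, pipeline corrosion.
texts = ["阀门泄漏", "阀门渗漏", "管道腐蚀", "管线腐蚀"]
features = [char_word_features(t) for t in texts]
print(agnes(features, n_clusters=2))  # → [[0, 1], [2, 3]]
```

With these toy vectors the two valve texts and the two pipeline-corrosion texts end up in separate clusters. Average linkage, cosine distance and a fixed cluster count are common defaults for AGNES on text vectors; the paper may use a different linkage, distance or stopping rule.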
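The AMI, ARI, V-measure and FMI indicators cited in the results are external clustering metrics: each compares the predicted cluster labels with a reference (ground-truth) partition of the same texts. As a hedged illustration, the dependency-free Python sketch below computes ARI, FMI and V-measure from pair counts and the class/cluster contingency table; the function names are hypothetical, and AMI is omitted because its expected-mutual-information correction is considerably longer. For real use, scikit-learn's `sklearn.metrics` module provides `adjusted_mutual_info_score`, `adjusted_rand_score`, `v_measure_score` and `fowlkes_mallows_score`.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, Sequence, Tuple


def _pairs(n: float) -> float:
    """Number of unordered pairs that can be drawn from n items."""
    return n * (n - 1) / 2


def _pair_counts(labels_true: Sequence[Hashable],
                 labels_pred: Sequence[Hashable]) -> Tuple[float, float, float, float]:
    """Return pair counts: (same class and same cluster, same class, same cluster, all)."""
    cells = Counter(zip(labels_true, labels_pred))
    same_both = sum(_pairs(c) for c in cells.values())
    same_class = sum(_pairs(c) for c in Counter(labels_true).values())
    same_cluster = sum(_pairs(c) for c in Counter(labels_pred).values())
    return same_both, same_class, same_cluster, _pairs(len(labels_true))


def adjusted_rand_index(labels_true, labels_pred) -> float:
    """Rand index corrected for chance agreement (1.0 = identical partitions)."""
    same_both, same_class, same_cluster, total = _pair_counts(labels_true, labels_pred)
    if total == 0:
        return 1.0
    expected = same_class * same_cluster / total
    max_index = (same_class + same_cluster) / 2
    if max_index == expected:  # degenerate partitions, e.g. all singletons
        return 1.0
    return (same_both - expected) / (max_index - expected)


def fowlkes_mallows_index(labels_true, labels_pred) -> float:
    """Geometric mean of pairwise precision and recall: TP / sqrt((TP+FP)(TP+FN))."""
    same_both, same_class, same_cluster, _ = _pair_counts(labels_true, labels_pred)
    if same_class == 0 or same_cluster == 0:
        return 0.0
    return same_both / math.sqrt(same_class * same_cluster)


def _entropy(counts: Iterable[int], n: int) -> float:
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def v_measure(labels_true, labels_pred, beta: float = 1.0) -> float:
    """Weighted harmonic mean of homogeneity and completeness."""
    n = len(labels_true)
    cells = Counter(zip(labels_true, labels_pred))
    class_sizes = Counter(labels_true)
    cluster_sizes = Counter(labels_pred)
    h_class = _entropy(class_sizes.values(), n)
    h_cluster = _entropy(cluster_sizes.values(), n)
    # Conditional entropies H(C|K) and H(K|C) from the contingency table
    h_class_given_cluster = -sum((c / n) * math.log(c / cluster_sizes[k])
                                 for (_, k), c in cells.items())
    h_cluster_given_class = -sum((c / n) * math.log(c / class_sizes[t])
                                 for (t, _), c in cells.items())
    homogeneity = 1.0 if h_class == 0 else 1.0 - h_class_given_cluster / h_class
    completeness = 1.0 if h_cluster == 0 else 1.0 - h_cluster_given_class / h_cluster
    if homogeneity + completeness == 0:
        return 0.0
    return (1 + beta) * homogeneity * completeness / (beta * homogeneity + completeness)


reference = [0, 0, 0, 1, 1, 1]  # hypothetical ground-truth risk factor classes
predicted = [0, 0, 1, 1, 2, 2]  # hypothetical clustering output
print(round(adjusted_rand_index(reference, predicted), 4))    # → 0.2424
print(round(fowlkes_mallows_index(reference, predicted), 4))  # → 0.4714
print(round(v_measure(reference, predicted), 4))
```

All three scores reach 1.0 for a perfect clustering regardless of how the cluster IDs are numbered, which is why they suit comparing generalization methods whose cluster labels are arbitrary.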

     
