Generation and verification method of pipeline transportation data based on integration of simulation and DBSCAN algorithm

Zhang Xinru; Hou Lei; Xu Lei; Huang Yanan; Bai Xiaozhong; Man Jianfeng; Liu Jinhai; Gu Wenyuan

doi:10.6047/j.issn.1000-8241.2022.02.003

Abstract

In oil and gas pipeline systems, machine learning models trained on pipeline transportation data sets perform poorly. Three factors limit the data: they are highly confidential, data acquisition technology is still imperfect, and abnormal operating conditions occur only rarely. To address this, a crude oil pipeline was taken as a case study. Its transportation energy consumption was analyzed, and the power consumption of its oil pump units was simulated with the Pipeline Studio TLNET software to expand the training data set. Pipeline simulation samples have no ground-truth values for comparison and are high-dimensional with correlated features. To handle this, a DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm based on the Mahalanobis distance is proposed; it evaluates the reliability of the simulation samples and identifies abnormal simulated data. Machine learning models were trained on simulation samples and field data samples. The results show that adding simulation samples from which the abnormal data have been removed improves the fitting capability of the model. The method thus offers a new approach to generating and verifying simulation samples of pipeline transportation data.
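The abstract names the core technique but gives no implementation details. That technique is DBSCAN clustering in which the distance between samples x_i and x_j is the Mahalanobis distance d(x_i, x_j) = sqrt((x_i - x_j)^T S^-1 (x_i - x_j)), where S is the feature covariance matrix. The following is a minimal Python/NumPy sketch of that idea under stated assumptions, not the authors' implementation.

The assumptions are as follows. S is estimated from the samples themselves. Samples that DBSCAN labels as noise are treated as "abnormal simulated data." The function names (`mahalanobis_matrix`, `dbscan`, `flag_abnormal`), the parameter values `eps=1.0` and `min_pts=5`, and the synthetic two-feature data are all hypothetical. The paper's reliability score is not reproduced here.

```python
import numpy as np


def mahalanobis_matrix(X):
    """Pairwise Mahalanobis distances between the rows of X (n samples x d features)."""
    # Assumption: the covariance is estimated from the samples themselves; the
    # pseudo-inverse keeps this usable when features are nearly collinear.
    VI = np.linalg.pinv(np.cov(X, rowvar=False))
    diff = X[:, None, :] - X[None, :, :]               # shape (n, n, d)
    d2 = np.einsum("ijk,kl,ijl->ij", diff, VI, diff)   # (x_i - x_j)^T S^-1 (x_i - x_j)
    return np.sqrt(np.maximum(d2, 0.0))                # clip tiny negative round-off


def dbscan(D, eps, min_pts):
    """Plain DBSCAN on a precomputed distance matrix D; label -1 marks noise."""
    n = D.shape[0]
    labels = np.full(n, -1)
    visited = np.zeros(n, dtype=bool)
    # eps-neighbourhoods; each point counts itself, as in scikit-learn's convention
    neighbors = [np.flatnonzero(D[i] <= eps) for i in range(n)]
    cluster = 0
    for i in range(n):
        if visited[i]:
            continue
        visited[i] = True
        if len(neighbors[i]) < min_pts:
            continue  # not a core point: noise unless a later cluster reaches it
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border or core point joins the cluster
            if visited[j]:
                continue
            visited[j] = True
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # j is a core point: keep expanding
        cluster += 1
    return labels


def flag_abnormal(X, eps=1.0, min_pts=5):
    """Hypothetical helper: True for samples that DBSCAN labels as noise."""
    return dbscan(mahalanobis_matrix(X), eps, min_pts) == -1


# Synthetic demo (not data from the paper): two strongly correlated features,
# a hypothetical stand-in for pipeline operating variables, plus one injected
# sample that breaks the correlation.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(scale=0.1, size=200)
X = np.vstack([np.column_stack([x, y]), [1.0, -2.0]])
is_abnormal = flag_abnormal(X)
print(is_abnormal[-1])  # flag for the injected sample
```

The Mahalanobis distance whitens the feature space, so `eps` is measured in roughly standard-deviation units. This lets one threshold apply across features with different scales and correlations. The dense O(n^2) distance matrix limits this sketch to modest sample counts.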