Abstract:
Because of high data confidentiality, imperfect data acquisition technology, and the rarity of abnormal operating conditions in oil and gas pipeline systems, machine learning models cannot achieve the desired training performance on the available pipeline transportation datasets. Herein, the energy consumption of pipeline transportation was analyzed for a crude oil pipeline, and the power consumption of its oil pump sets was simulated with Pipeline Studio TLNET to expand the dataset. Simulated pipeline transportation samples have no real values against which they can be checked, and they also have correlated features and high dimensionality. To handle these characteristics, a Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm based on the Mahalanobis distance was proposed to evaluate the reliability of the simulation samples and identify abnormal ones. Case studies show that removing the abnormal simulation samples and then adding the remaining ones to the training set improves the fitting capability of the model. Overall, the results provide a new approach for generating and verifying simulated pipeline transportation data.
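The core idea of the screening step can be illustrated with a minimal sketch of DBSCAN driven by Mahalanobis distance, assuming NumPy is available. The Mahalanobis metric accounts for correlated features by whitening with the inverse covariance, so a sample that breaks the usual correlation structure ends up far from every other sample and is labeled as noise. The function names, the three hypothetical features, the `eps` and `min_samples` values, and the choice to estimate the covariance from the whole sample set (with a pseudo-inverse for robustness) are illustrative assumptions, not the paper's reported settings.

```python
import numpy as np


def mahalanobis_matrix(X):
    """Pairwise Mahalanobis distances using the sample covariance of X (assumed choice)."""
    cov = np.cov(X, rowvar=False)
    vi = np.linalg.pinv(cov)  # pseudo-inverse guards against a singular covariance
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.einsum("ijk,kl,ijl->ij", diff, vi, diff)
    return np.sqrt(np.maximum(d2, 0.0))


def dbscan_mahalanobis(X, eps, min_samples):
    """Return DBSCAN cluster labels; -1 marks noise (candidate abnormal samples)."""
    n = X.shape[0]
    dist = mahalanobis_matrix(X)
    # Neighborhood of each point, including the point itself.
    neighbors = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    core = np.array([len(nb) >= min_samples for nb in neighbors])
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not core[i]:
            continue
        # Grow a new cluster from an unlabeled core point.
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            if not core[p]:
                continue  # border points join the cluster but do not expand it
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    stack.append(q)
        cluster += 1
    return labels


# Hypothetical usage: three correlated features standing in for simulated samples.
rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.8, 0.6],
                [0.8, 1.0, 0.7],
                [0.6, 0.7, 1.0]])
X = rng.multivariate_normal(np.zeros(3), cov, size=200)
# Inject one sample that violates the correlation between the first two features.
X = np.vstack([X, [[4.0, -4.0, 0.0]]])

labels = dbscan_mahalanobis(X, eps=1.5, min_samples=5)
clean = X[labels != -1]  # samples kept for the training set
```

With Euclidean distance the injected point, about 5.7 units from the origin, might not look exceptional; under the Mahalanobis metric its distance to every other sample is large because the first two features are strongly positively correlated, so it is labeled noise and filtered out.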