Citation: CAI Wei, SUN Guangyu, YANG Fei, et al. Adaptive k-means clustering algorithm based on grid and domain centroid weight[J]. Oil & Gas Storage and Transportation, 2025, 44(5): 1−9.

Adaptive k-means clustering algorithm based on grid and domain centroid weight

    Abstract:
    Objective The digital twin of a natural gas pipeline, built from its physical entities, continuously generates large volumes of data, which poses significant challenges for data mining in terms of computational cost, efficiency, and scalability. The k-means clustering algorithm is one of the most widely used data mining methods, but it requires the number of clusters k to be specified in advance, and the choice of initial centroids determines both its computational efficiency and whether it becomes trapped in a local optimum. As a result, the traditional k-means algorithm often takes several minutes or longer to process large datasets, and its results are not unique, so it must be run multiple times to select the best result.
    Methods To meet the real-time data mining requirements of pipeline digital twins, an adaptive k-means clustering algorithm based on grid and domain centroid weight (GDCW-AKM) was proposed. The algorithm operates as follows: (1) Divide the dataset into grid cells of equal size, represent each cell by its center position, and assign the number of data points in the cell to that center as its weight, which yields a sample set and a weight set. (2) Divide the sample set again into multiple domains, take the centroid of each domain as a sample and the sum of the weights within the domain as its new weight, which yields a new sample set and weight set. (3) Assign a density ρ and a distance δ to each sample, let $r=\delta\sqrt{\rho}$, and sort the samples in descending order of r. (4) Set several candidate values of k and, for each, select the k samples with the largest r as the initial centroids. The k and initial centroids whose clustering result attains the maximum variance ratio criterion are taken as the optimal number of clusters and the optimal initial centroids, and that clustering result is the final result.
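Steps (1) and (3) of the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define ρ and δ, so the sketch assumes density-peaks-style definitions (ρ is the total weight within a hypothetical radius `cutoff`, δ is the distance to the nearest sample of higher density). The function names `grid_compress` and `rank_by_r` and the parameters `cell` and `cutoff` are illustrative only, and the domain step (2) is omitted for brevity.

```python
from collections import Counter
from math import dist, sqrt


def grid_compress(points, cell):
    """Step (1): replace each occupied grid cell by its center and point count."""
    # Key every point by the integer index of the cell it falls in.
    counts = Counter(tuple(int(c // cell) for c in p) for p in points)
    centers = [tuple((i + 0.5) * cell for i in idx) for idx in counts]
    weights = list(counts.values())
    return centers, weights


def rank_by_r(samples, weights, cutoff):
    """Step (3): rank samples by r = delta * sqrt(rho), largest first.

    Assumed definitions (not stated in the abstract):
      rho_i   = total weight of samples within `cutoff` of sample i
      delta_i = distance to the nearest sample of higher density, or the
                largest distance to any sample for the densest one
    """
    n = len(samples)
    rho = [
        sum(w for s, w in zip(samples, weights) if dist(samples[i], s) <= cutoff)
        for i in range(n)
    ]
    delta = []
    for i in range(n):
        # Equal densities are ordered by index so only one sample is "densest".
        higher = [
            dist(samples[i], samples[j])
            for j in range(n)
            if rho[j] > rho[i] or (rho[j] == rho[i] and j < i)
        ]
        if higher:
            delta.append(min(higher))
        else:
            delta.append(max(dist(samples[i], s) for s in samples))
    r = [d * sqrt(p) for d, p in zip(delta, rho)]
    order = sorted(range(n), key=lambda i: r[i], reverse=True)
    return order, r
```

Under these assumptions, samples that are both heavy and far from any heavier sample rise to the top of the ranking, which is what makes them plausible initial centroids in step (4).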
    Results The GDCW-AKM algorithm was applied to a station of Shandong Natural Gas Pipeline Co., Ltd., and its performance was evaluated in terms of runtime and several internal evaluation indexes. The results show that the algorithm correctly identified the number of operating conditions in the historical operation data of the station pipelines and accurately assigned newly added data to the corresponding operating condition. While retaining more than 99% of the clustering accuracy of the k-means algorithm, it substantially improved the efficiency of subsequent computation: at a data volume of $20\times10^{4}$, efficiency increased nearly 12-fold, and the gain grew with the data volume.
    Conclusion The GDCW-AKM algorithm fully meets the efficiency and accuracy requirements of station pipeline digital twins and is well suited to wide application at stations.
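Step (4) of the Methods, choosing k by the variance ratio criterion, can be sketched as follows. This is a hedged illustration rather than the published implementation: it assumes a weighted form of the variance ratio (Calinski-Harabasz) criterion in which each sample counts as many times as its weight, and the names `weighted_kmeans`, `variance_ratio`, and `choose_k` are hypothetical. The `ranked` argument stands for sample indices already sorted by descending r.

```python
from math import dist


def weighted_kmeans(samples, weights, init, iters=100):
    """Lloyd iterations in which each sample pulls centroids by its weight."""
    centroids = [tuple(c) for c in init]
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(len(centroids)), key=lambda c: dist(s, centroids[c]))
            for s in samples
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(len(centroids)):
            members = [(s, w) for s, w, l in zip(samples, weights, labels) if l == c]
            if members:  # an empty cluster keeps its previous centroid
                total = sum(w for _, w in members)
                dim = len(members[0][0])
                centroids[c] = tuple(
                    sum(w * s[d] for s, w in members) / total for d in range(dim)
                )
    return labels, centroids


def variance_ratio(samples, weights, labels, centroids):
    """Weighted variance ratio criterion (assumed form, n = total weight)."""
    n, k, dim = sum(weights), len(centroids), len(samples[0])
    mean = tuple(
        sum(w * s[d] for s, w in zip(samples, weights)) / n for d in range(dim)
    )
    within = sum(
        w * dist(s, centroids[l]) ** 2 for s, w, l in zip(samples, weights, labels)
    )
    between = sum(w * dist(centroids[l], mean) ** 2 for w, l in zip(weights, labels))
    if within == 0:
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))


def choose_k(samples, weights, ranked, k_values):
    """Try each k with the top-k ranked samples as initial centroids."""
    best = None
    for k in k_values:
        init = [samples[i] for i in ranked[:k]]
        labels, centroids = weighted_kmeans(samples, weights, init)
        score = variance_ratio(samples, weights, labels, centroids)
        if best is None or score > best[0]:
            best = (score, k, labels, centroids)
    return best
```

Because the initial centroids are fixed by the ranking, each candidate k needs only a single deterministic run, which is consistent with the abstract's point that repeated random restarts are no longer required.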

     
