快捷導(dǎo)航

Label?Propagation算法原理示例解析

更新時(shí)間：2023年02月01日 10:53:49 作者：zhiyong_will

這篇文章主要為大家介紹了Label?Propagation算法原理示例解析，有需要的朋友可以借鑒參考下，希望能夠有所幫助，祝大家多多進(jìn)步，早日升職加薪

1. 概述

對(duì)于社區(qū)，沒(méi)有一個(gè)明確的定義，有很多對(duì)社區(qū)的定義，如社區(qū)是指在一個(gè)網(wǎng)絡(luò)中，有一組節(jié)點(diǎn)，它們彼此都相似，而組內(nèi)的節(jié)點(diǎn)與網(wǎng)絡(luò)中的其他節(jié)點(diǎn)則不相似。更為一般的可以表述為：社區(qū)是指網(wǎng)絡(luò)中節(jié)點(diǎn)的集合，這些節(jié)點(diǎn)內(nèi)部連接較為緊密而外部連接較為稀疏。

基于上述的形象的表示，出現(xiàn)了很多的社區(qū)劃分算法，如Fast Unfolding算法[1]，F(xiàn)ast Unfolding算法是基于模塊度的算法，模塊度相當(dāng)于對(duì)上述社區(qū)的形象描述的一種抽象表示，成為優(yōu)化的主要目標(biāo)。但是模塊度的計(jì)算相對(duì)較為復(fù)雜，這也成為限制基于模塊度算法在大規(guī)模圖數(shù)據(jù)上的應(yīng)用。

Label Propagation算法[2]是一種基于標(biāo)簽傳播的局部社區(qū)劃分算法，相比較而言其簡(jiǎn)單的計(jì)算過(guò)程，能夠在大規(guī)模的圖數(shù)據(jù)上應(yīng)用。

2. Label Propagation算法

2.1. Label Propagation算法概述

Label Propagation算法是一種基于標(biāo)簽傳播的局部社區(qū)劃分算法。對(duì)于網(wǎng)絡(luò)中的每一個(gè)節(jié)點(diǎn)，在初始階段，Label Propagation算法對(duì)每一個(gè)節(jié)點(diǎn)一個(gè)唯一的標(biāo)簽，在每一個(gè)迭代的過(guò)程中，每一個(gè)節(jié)點(diǎn)根據(jù)與其相連的節(jié)點(diǎn)所屬的標(biāo)簽改變自己的標(biāo)簽，更改的原則是選擇與其相連的節(jié)點(diǎn)中所屬標(biāo)簽最多的社區(qū)標(biāo)簽為自己的社區(qū)標(biāo)簽，這便是標(biāo)簽傳播的含義。隨著社區(qū)標(biāo)簽的不斷傳播，最終緊密連接的節(jié)點(diǎn)將有共同的標(biāo)簽。

Label Propagation算法最大的優(yōu)點(diǎn)是其算法過(guò)程比較簡(jiǎn)單，想比較于優(yōu)化模塊度的過(guò)程，算法速度非?？臁abel Propagation算法利用網(wǎng)絡(luò)的結(jié)構(gòu)指導(dǎo)標(biāo)簽的傳播過(guò)程，在這個(gè)過(guò)程中無(wú)需優(yōu)化任何函數(shù)。在算法開(kāi)始前我們不必要知道社區(qū)的個(gè)數(shù)，隨著算法的迭代，在最終的過(guò)程中，算法將自己決定社區(qū)的個(gè)數(shù)。

2.2. Label Propagation算法原理

對(duì)于Label Propagation算法，假設(shè)對(duì)于節(jié)點(diǎn)x，其鄰居節(jié)點(diǎn)為x1,x2,?,xk，對(duì)于每一個(gè)節(jié)點(diǎn)，都有其對(duì)應(yīng)的標(biāo)簽，標(biāo)簽代表的是該節(jié)點(diǎn)所屬的社區(qū)。在算法迭代的過(guò)程中，節(jié)點(diǎn)x根據(jù)其鄰居節(jié)點(diǎn)更新其所屬的社區(qū)。更新的規(guī)則是選擇節(jié)點(diǎn)x的鄰居節(jié)點(diǎn)中，所屬社區(qū)最多的節(jié)點(diǎn)對(duì)應(yīng)的社區(qū)為其新的社區(qū)。

上述便是Label Propagation算法的核心概念。在初始節(jié)點(diǎn)，令每一個(gè)節(jié)點(diǎn)都屬于唯一的社區(qū)，當(dāng)社區(qū)的標(biāo)簽在節(jié)點(diǎn)間傳播的過(guò)程中，緊密相連的節(jié)點(diǎn)迅速地取得一致的標(biāo)簽。具體過(guò)程如下圖所示：

這樣的過(guò)程不斷地持續(xù)下去，直到所有可能聚集到一起的節(jié)點(diǎn)都具有了相同的社區(qū)標(biāo)簽。在傳播過(guò)程的最終，具有相同社區(qū)標(biāo)簽的節(jié)點(diǎn)被劃到相同的社區(qū)中稱(chēng)為一個(gè)個(gè)獨(dú)立的社區(qū)。

在標(biāo)簽傳播的過(guò)程中，節(jié)點(diǎn)的標(biāo)簽的更新過(guò)程可以分為兩種，即：

同步更新
異步更新

同步更新是指對(duì)于節(jié)點(diǎn)xxx，在第ttt代時(shí)，根據(jù)其所有鄰居節(jié)點(diǎn)在第t−1代時(shí)的社區(qū)標(biāo)簽對(duì)其標(biāo)簽進(jìn)行更新。即：

在上圖中，兩邊的標(biāo)簽會(huì)在社區(qū)標(biāo)簽aaa和社區(qū)標(biāo)簽bbb不停地震蕩。

這樣的停止條件可以使得最終能夠獲得強(qiáng)壯的社區(qū)(Strong Community)，但是社區(qū)并不是唯一的。對(duì)于Strong Community，其要求對(duì)于每一個(gè)節(jié)點(diǎn)，在其社區(qū)內(nèi)部的鄰居節(jié)點(diǎn)嚴(yán)格大于社區(qū)外部的鄰居節(jié)點(diǎn)，然而Label Propagation算法能夠保證對(duì)于每一個(gè)節(jié)點(diǎn)，在其所屬的社區(qū)內(nèi)有足夠多的鄰居節(jié)點(diǎn)。

3.3. Label Propagation算法過(guò)程

3. 實(shí)驗(yàn)

3.1. 數(shù)據(jù)描述

實(shí)驗(yàn)過(guò)程中使用的數(shù)據(jù)為參考[1]中使用的數(shù)據(jù)，其結(jié)構(gòu)如下所示：

其在Fast Unfolding算法的劃分結(jié)果為：

社區(qū)1：節(jié)點(diǎn)0，1，2，3，4，5，6，7
社區(qū)2：節(jié)點(diǎn)8，9，10，11，12，13，14，15

3.2. 實(shí)驗(yàn)代碼

#####################################
# Author:zhaozhiyong
# Date:20151205
# Fun:Label Propagation
# Fix:已經(jīng)用Python3重新修改
#####################################
import random
def loadData(filePath):
    f = open(filePath)
    vector_dict = {}
    edge_dict = {}
    for line in f.readlines():
        lines = line.strip().split("\t")
        for i in range(2):
            if lines[i] not in vector_dict:
                #put the vector into the vector_dict
                vector_dict[lines[i]] = int(lines[i])
                #put the edges into the edge_dict
                edge_list = []
                if len(lines) == 3:
                    edge_list.append(lines[1-i]+":"+lines[2])
                else:
                    edge_list.append(lines[1-i]+":"+"1")
                edge_dict[lines[i]] = edge_list
            else:
                edge_list = edge_dict[lines[i]]
                if len(lines) == 3:
                    edge_list.append(lines[1-i]+":"+lines[2])
                else:
                    edge_list.append(lines[1-i]+":"+"1")
                edge_dict[lines[i]] = edge_list
    return vector_dict, edge_dict
def get_max_community_label(vector_dict, adjacency_node_list):
    label_dict = {}
    # generate the label_dict
    for node in adjacency_node_list:
        node_id_weight = node.strip().split(":")
        node_id = node_id_weight[0]
        node_weight = int(node_id_weight[1])
        if vector_dict[node_id] not in label_dict:
            label_dict[vector_dict[node_id]] = node_weight
        else:
            label_dict[vector_dict[node_id]] += node_weight
    # find the max label
    sort_list = sorted(label_dict.items(), key = lambda d: d[1], reverse=True)
    return sort_list[0][0]
def check(vector_dict, edge_dict):
    for node in vector_dict.keys():
        adjacency_node_list = edge_dict[node]
        node_label = vector_dict[node]
        label_check = {}
        for ad_node in adjacency_node_list:
            node_id_weight = ad_node.strip().split(":")
            node_id = node_id_weight[0]
            if vector_dict[node_id] not in label_check:
                label_check[vector_dict[node_id]] = 1
            else:
                label_check[vector_dict[node_id]] += 1
        sort_list = sorted(label_check.items(), key = lambda d: d[1], reverse=True)
        if node_label == sort_list[0][0]:
            continue
        else:
            return 0
    return 1    
def label_propagation(vector_dict, edge_dict):
    #initial, let every vector belongs to a community
    t = 0
    #for every node in a random order
    while True:
        if (check(vector_dict, edge_dict) == 0):
            t = t+1
            print("----------------------------------------")
            print("iteration: ", t)
            for node in vector_dict.keys():
                adjacency_node_list = edge_dict[node]
                vector_dict[node] = get_max_community_label(vector_dict, adjacency_node_list)
            print(vector_dict)
        else:
            break
    return vector_dict
if __name__ == "__main__":
    vector_dict, edge_dict=loadData("./cd_data.txt")
    dict_key_list = list(vector_dict.keys())
    random.shuffle(dict_key_list)
    new_vector_dict = {}
    for key in dict_key_list:
        new_vector_dict[key] = vector_dict.get(key)
    print("original community: ", new_vector_dict)
    vec_new = label_propagation(new_vector_dict, edge_dict)
    print("---------------------------------------------------------")
    print("the final result: ")
    community_dict = {}
    for key in vec_new.keys():
        if vec_new[key] not in community_dict:
            community_dict[vec_new[key]] = []
        community_dict[vec_new[key]].append(str(key))
    for key in community_dict.keys():
        print("community-" + str(key) + " ---> " + str(community_dict[key]))

3.3. 代碼解析

上述的實(shí)驗(yàn)代碼中主要包括4個(gè)部分其一為loadData()函數(shù)，其目的是從文件中讀入數(shù)據(jù)，取得點(diǎn)的信息，邊的信息；

第二個(gè)是label_propagation()函數(shù)，其目的是Label Propagation算法的整個(gè)迭代過(guò)程，期中會(huì)調(diào)用兩個(gè)函數(shù)，即get_max_community_label()函數(shù)和check()函數(shù)，get_max_community_label()函數(shù)的目的是在對(duì)每個(gè)節(jié)點(diǎn)遍歷其鄰居節(jié)點(diǎn)的過(guò)程中，選擇出最大的社區(qū)標(biāo)簽，check()函數(shù)的目的是判斷算法是否迭代結(jié)束。

4. 實(shí)驗(yàn)結(jié)果

參考文獻(xiàn)

[1] Blondel V D, Guillaume J L, Lambiotte R, et al. Fast unfolding of communities in large networks[J]. Journal of statistical mechanics: theory and experiment, 2008, 2008(10): P10008.

[2] Raghavan U N, Albert R, Kumara S. Near linear time algorithm to detect community structures in large-scale networks[J]. Physical review E, 2007, 76(3): 036106.

以上就是Label Propagation算法原理示例解析的詳細(xì)內(nèi)容，更多關(guān)于Label Propagation算法的資料請(qǐng)關(guān)注腳本之家其它相關(guān)文章！

您可能感興趣的文章: