
Visualizing Social Networks with Python

Updated: 2022-06-01 17:09:46  Author: Python编程学习圈

This article shows how to visualize a social network in Python using a few third-party libraries, working through an example built on LinkedIn connection data. It is meant as a hands-on exercise and should be a useful reference for anyone who wants to try the same thing.

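We use many social media platforms every day, such as WeChat, Weibo, and Douyin. On these platforms we follow certain KOLs (key opinion leaders), and our friends and family follow us in turn and become our followers. As follows and followers accumulate over time, this web of relationships keeps growing, and a lot of information spreads outward through it. Analyzing these social networks can therefore be an important input to the decisions we make.

Today we will use a few third-party Python libraries to visualize a social network.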

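Data source

The data for this example is social-connection data from LinkedIn. The author studied in the United States and used LinkedIn to submit résumés and contact colleagues while looking for internships and jobs abroad, and over time built up a network of connections. That connection data was downloaded and then read with the pandas module.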

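Because the data contains private information, it cannot be shared here. If you have a LinkedIn account, you can export a CSV of your own connections through the "Get a copy of your data" option in the settings. Alternatively, you can generate a fake dataset yourself with the same header. Judging from the cleaning code below, the columns used are First Name, Last Name, Email Address, Company, Position, and Connected On.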
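For readers without an export at hand, the following is a minimal sketch of generating such a fake file with the standard library only. The column names are inferred from the cleaning code later in the article, the three leading note lines exist only so that `skiprows=3` lands on the header row, and the function name `make_fake_connections` and all sample values are made up for illustration.

```python
import csv
import random
from datetime import date, timedelta

# Column names inferred from the cleaning code in this article (assumption).
HEADER = ["First Name", "Last Name", "Email Address", "Company", "Position", "Connected On"]

# Made-up sample values, purely for illustration.
COMPANIES = ["Company A", "Company B", "Company C", "Company D"]
POSITIONS = ["Software Engineer", "Data Scientist", "Data Analyst", "Recruiter"]
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def make_fake_connections(path, n_rows=200, seed=0):
    """Write a fake Connections.csv to `path` and return the data rows written."""
    rng = random.Random(seed)
    start = date(2019, 1, 1)
    rows = []
    for i in range(n_rows):
        day = start + timedelta(days=rng.randrange(900))
        # Build strings like "15 Aug 2021" by hand so the result is locale-independent.
        connected_on = f"{day.day:02d} {MONTHS[day.month - 1]} {day.year}"
        # Leave company or position empty now and then, so dropna() has work to do.
        company = rng.choice(COMPANIES) if rng.random() > 0.05 else ""
        position = rng.choice(POSITIONS) if rng.random() > 0.05 else ""
        rows.append([f"First{i}", f"Last{i}", f"user{i}@example.com",
                     company, position, connected_on])

    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        # Three placeholder lines before the header, matching skiprows=3.
        writer.writerow(["Notes:"])
        writer.writerow(["This file is synthetic."])
        writer.writerow(["The header row follows."])
        writer.writerow(HEADER)
        writer.writerows(rows)
    return rows
```

Calling `make_fake_connections("Connections.csv")` should produce a file that the `pd.read_csv("Connections.csv", skiprows=3)` call in the next section can read.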

Reading and cleaning the data

First, import the modules we need:

import pandas as pd
import janitor  # pyjanitor: adds clean_names() and to_datetime() to DataFrames
import datetime

from IPython.display import display, HTML
from pyvis import network as net
import networkx as nx

Read the dataset:

# skip the first three lines of the file, which precede the header row
df_ori = pd.read_csv("Connections.csv", skiprows=3)
df_ori.head()

Next we clean the data. The idea is to drop rows with missing values and to convert the "Connected On" column, which holds dates but is stored as strings, into a proper datetime type.

df = (
    df_ori
    .clean_names()  # lowercase the column names and replace spaces with underscores
    .drop(columns=['first_name', 'last_name', 'email_address'])  # drop these three columns
    .dropna(subset=['company', 'position'])  # drop rows where company or position is missing
    .to_datetime('connected_on', format='%d %b %Y')  # parse strings such as "15 Aug 2021"
)
df.head()

Output:

                    company            position connected_on
0                xxxxxxxxxx  Talent Acquisition   2021-08-15
1               xxxxxxxxxxxx   Associate Partner   2021-08-14
2                      xxxxx                獵頭顧問   2021-08-14
3  xxxxxxxxxxxxxxxxxxxxxxxxx          Consultant   2021-07-26
4    xxxxxxxxxxxxxxxxxxxxxx     Account Manager   2021-07-19
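The format string `'%d %b %Y'` used in the cleaning step follows the standard strftime directives: day of month, abbreviated month name, four-digit year. A minimal standard-library sketch of the same parsing follows. The helper name `parse_connected_on` and the sample string are illustrative, and `%b` assumes English month abbreviations, which is Python's default locale behavior.

```python
from datetime import datetime

FORMAT = "%d %b %Y"  # day of month, abbreviated month name, four-digit year


def parse_connected_on(text):
    """Parse a string such as '15 Aug 2021' into a date object."""
    return datetime.strptime(text.strip(), FORMAT).date()


sample = "15 Aug 2021"  # illustrative value, not taken from the real data
parsed = parse_connected_on(sample)
print(parsed.isoformat())  # 2021-08-15
```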

Analyzing and visualizing the data

Let's see which companies these connections work at:

df['company'].value_counts().head(10).plot(kind="barh").invert_yaxis()

Output: [figure: horizontal bar chart of the ten most common companies]

Next, let's see which positions are most common in my network:

df['position'].value_counts().head(10).plot(kind="barh").invert_yaxis()

Output: [figure: horizontal bar chart of the ten most common positions]

Next we draw the social network itself. Before that, two terms need explaining. Every social network consists of:

Nodes: each participant in the social network
Edges: the relationships between participants and how strong those relationships are
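To make the two terms concrete, here is a minimal plain-Python sketch of a small weighted network stored as an adjacency dictionary. The names and weights are made up for illustration, and the helper `build_adjacency` is hypothetical; the rest of the article uses networkx for this.

```python
# Nodes: every participant in the network (hypothetical names).
nodes = {"me", "Alice", "Bob", "Carol"}

# Edges: undirected pairs of participants; the weight stands in for closeness.
edges = [
    ("me", "Alice", 3),
    ("me", "Bob", 1),
    ("Alice", "Carol", 2),
]


def build_adjacency(nodes, edges):
    """Return {node: {neighbor: weight}} for an undirected weighted graph."""
    adjacency = {node: {} for node in nodes}
    for a, b, weight in edges:
        adjacency[a][b] = weight
        adjacency[b][a] = weight  # undirected: store both directions
    return adjacency


adjacency = build_adjacency(nodes, edges)
# Degree: how many neighbors each node has.
degree = {node: len(neighbors) for node, neighbors in adjacency.items()}
print(degree["me"])  # 2
```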

Let's first draw a simple social network, using mainly the networkx and pyvis modules:

g = nx.Graph()
g.add_node(0, label="root")  # initialize yourself as the central node
g.add_node(1, label="Company 1", size=10, title="info1")
g.add_node(2, label="Company 2", size=40, title="info2")
g.add_node(3, label="Company 3", size=60, title="info3")

We created four nodes and gave each one a label. The size parameter sets how large a node is drawn, and title is the text pyvis shows when you hover over it. We then connect the nodes:

g.add_edge(0, 1)
g.add_edge(0, 2)
g.add_edge(0, 3)

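The result looks like this: [figure: the root node connected to three company nodes of increasing size]

We start by visualizing the network by the companies my connections belong to. First, count and rank the companies: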

df_company = df['company'].value_counts().reset_index()
df_company.columns = ['company', 'count']
df_company = df_company.sort_values(by="count", ascending=False)
df_company.head(10)

Output:

                            company  count
0                            Amazon     xx
1                            Google     xx
2                          Facebook     xx
3   Stevens Institute of Technology     xx
4                         Microsoft     xx
5              JPMorgan Chase & Co.     xx
6         Amazon Web Services (AWS)     xx
9                             Apple      x
10                    Goldman Sachs      x
8                            Oracle      x
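The network drawn next iterates over a reduced version of this table (`df_company_reduced`). The logic of counting, ranking, and then keeping only the frequent entries can be sketched with the standard library alone. The function name `count_and_reduce`, the sample company list, and the threshold of 2 are made-up assumptions for illustration, not values from the article.

```python
from collections import Counter


def count_and_reduce(values, min_count):
    """Count occurrences, rank by count (highest first), keep counts >= min_count."""
    counts = Counter(values)
    ranked = counts.most_common()  # list of (value, count), highest count first
    return [(value, count) for value, count in ranked if count >= min_count]


# Made-up company column, for illustration only.
companies = ["A Corp", "B Ltd", "A Corp", "C Inc", "A Corp", "B Ltd"]
reduced = count_and_reduce(companies, min_count=2)
print(reduced)  # [('A Corp', 3), ('B Ltd', 2)]
```

Thresholding on the count keeps the graph readable: companies that appear only once or twice would otherwise add many tiny nodes around the center.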

Then we draw the network graph:

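# Assumption: the article never shows how df_company_reduced is built;
# keeping companies with at least 5 connections is an illustrative choice
df_company_reduced = df_company[df_company['count'] >= 5]

# instantiate the network
g = nx.Graph()
g.add_node('myself')  # place yourself at the center of the network

# iterate over every row of the dataset
for _, row in df_company_reduced.iterrows():

    # assign the company name and its count to new variables
    company = row['company']
    count = int(row['count'])  # plain int, so the node size can be serialized

    title = f"<b>{company}</b> – {count}"
    positions = set(df[df['company'] == company]['position'])
    positions = ''.join('<li>{}</li>'.format(x) for x in positions)

    position_list = f"<ul>{positions}</ul>"
    hover_info = title + position_list

    g.add_node(company, size=count*2, title=hover_info, color='#3449eb')
    g.add_edge('myself', company, color='grey')  # link to the central node added above

# generate the network graph; notebook=True because the result is displayed inline
nt = net.Network(height='700px', width='700px', bgcolor="black", font_color='white', notebook=True)
nt.from_nx(g)
nt.hrepulsion()

nt.show('company_graph.html')
display(HTML(filename='company_graph.html'))  # load the file, not the literal string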

Output: [figure: interactive network graph with company nodes arranged around the central node]
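The hover text in the loop above is plain HTML assembled from a company name, its count, and the distinct positions held there. Names such as "JPMorgan Chase & Co." contain characters that are special in HTML, so escaping them may be worthwhile. The following is a minimal sketch of that idea. The helper name `build_hover_info` is hypothetical, the sample values are made up, and the escaping and sorting steps are additions not present in the article's code.

```python
from html import escape


def build_hover_info(name, count, items):
    """Return '<b>name</b> – count<ul><li>...</li></ul>' with all text HTML-escaped."""
    title = f"<b>{escape(name)}</b> – {count}"
    # Deduplicate and sort so the list order is stable between runs.
    list_items = "".join(f"<li>{escape(item)}</li>" for item in sorted(set(items)))
    return f"{title}<ul>{list_items}</ul>"


hover = build_hover_info("JPMorgan Chase & Co.", 3, ["Analyst", "Associate", "Analyst"])
print(hover)
```

Sorting the set also avoids the arbitrary ordering that iterating over a bare `set` gives, which makes the generated HTML reproducible.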

In the same way, let's visualize the distribution of positions among my connections.

First, count and rank them:

df_position = df['position'].value_counts().reset_index()
df_position.columns = ['position', 'count']
df_position = df_position.sort_values(by="count", ascending=False)
df_position.head(10)

Output:

                           position  count
0                 Software Engineer     xx
1                    Data Scientist     xx
2          Senior Software Engineer     xx
3                      Data Analyst     xx
4             Senior Data Scientist     xx
5     Software Development Engineer     xx
6  Software Development Engineer II     xx
7                           Founder     xx
8                     Data Engineer     xx
9                  Business Analyst     xx

Then draw the network graph:

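# Assumption: the article never shows how df_position_reduced is built;
# keeping positions with at least 5 connections is an illustrative choice
df_position_reduced = df_position[df_position['count'] >= 5]

g = nx.Graph()
g.add_node('myself')  # place yourself at the center of the network

for _, row in df_position_reduced.iterrows():

    # assign the position name and its count to new variables
    position = row['position']
    count = int(row['count'])  # plain int, so the node size can be serialized

    title = f"<b>{position}</b> – {count}"
    # as written, this set holds only the position itself;
    # select ['company'] instead to list the employers for this position
    positions = set(df[df['position'] == position]['position'])
    positions = ''.join('<li>{}</li>'.format(x) for x in positions)

    position_list = f"<ul>{positions}</ul>"
    hover_info = title + position_list

    g.add_node(position, size=count*2, title=hover_info, color='#3449eb')
    g.add_edge('myself', position, color='grey')  # link to the central node added above

# generate the network graph; notebook=True because it is meant to be viewed inline
nt = net.Network(height='700px', width='700px', bgcolor="black", font_color='white', notebook=True)
nt.from_nx(g)
nt.hrepulsion()
nt.show('position_graph.html')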

Output: [figure: interactive network graph with position nodes arranged around the central node]
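Both graphs set `size=count*2`, so a single very common company or position can dwarf the rest. One possible alternative, offered here as an assumption and not something the article does, is to rescale the counts linearly into a fixed size range. The function name `scale_sizes` and the range 10 to 60 are illustrative choices.

```python
def scale_sizes(counts, min_size=10.0, max_size=60.0):
    """Map each count linearly onto [min_size, max_size]; counts must be non-empty."""
    low, high = min(counts), max(counts)
    if low == high:
        # All counts are equal: use the midpoint to avoid dividing by zero.
        return [(min_size + max_size) / 2 for _ in counts]
    span = high - low
    return [min_size + (c - low) / span * (max_size - min_size) for c in counts]


sizes = scale_sizes([5, 10, 30, 55])
print(sizes)
```

The resulting values could be passed as the `size` argument of `g.add_node` in place of `count*2`.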