Drawing Word Clouds in Python from Word-Frequency Data (xlsx and csv Files): The Complete Process (wordcloud)

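Updated: 2024-06-25 10:36:22  Author: 十八只兔

This article shows how to draw a word cloud in Python from word-frequency data (xlsx or csv files). wordcloud is a Python library for generating word clouds that is powerful yet simple to use. The article walks through the code in detail and may be a useful reference for anyone who needs it.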

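I. Preface

This article describes how to use Python to draw a word cloud from word-frequency data (an xlsx or csv file). Besides word clouds of a regular shape (a rectangle, for example), you can also specify the shape of the word cloud.

II. Installing and importing the required libraries

1. Installing the libraries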

pip install jieba
pip install matplotlib
pip install wordcloud
pip install numpy
pip install Pillow
pip install pandas
pip install openpyxl  # needed by pandas.read_excel for .xlsx files

2. Importing the libraries

import jieba
import matplotlib.pyplot as plt
from wordcloud import WordCloud
import numpy as np
from PIL import Image  # image processing
import pandas as pd

III. Data processing

1. Reading the file

The dataset used in this article is an Excel file (with the .xlsx extension) containing two fields: the keyword and its frequency.
The Excel file is handled as follows:

import pandas as pd
df=pd.read_excel("data-test.xlsx")  # read the data from the Excel file
print(df)

The data read in looks like this: [figure: the printed DataFrame]

Reading only the first N rows of the file

# Get only the first 5 rows
df_new=df.head(5)
print(df_new)

The result is as follows: [figure: the first five rows of the data]
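The title also mentions csv files, which the walkthrough above does not show. As a minimal sketch, assuming a csv with the same two columns (the column names, the sample rows, and the file name data-test.csv in the comment are made up for illustration), pandas.read_csv can stand in for read_excel. An in-memory buffer replaces the file so the snippet runs on its own:

```python
import io
import pandas as pd

# Hypothetical sample data standing in for a real csv file with two columns
csv_text = "關鍵詞,詞頻\npython,30\npandas,12\nwordcloud,7\n"

# read_csv accepts a path or a file-like object;
# for a real file you would pass its path, e.g. "data-test.csv" (hypothetical name)
df_csv = pd.read_csv(io.StringIO(csv_text))

# Same preview as in the Excel case
print(df_csv.head(5))
```

From here on the DataFrame can be processed in the same way as one read from an Excel file.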

2. Converting the data format

After the Excel file has been read, the data needs to be converted to a dict:

# Create a DataFrame whose index is the keyword column of df
data = pd.DataFrame(index=df['關鍵詞'])
# Initialise the frequency column to 0 so that it is an int column; the values are filled in below
data['詞頻']=0
# Copy the frequencies from the Excel data into data
for i in range(0,len(df)):
    data.iloc[i,0]=df.iloc[i,1]
# Sort the frequencies in descending order
data = data['詞頻'].sort_values(ascending = False)
# Convert to a dict
data = dict(data)
print(data)

The result is as follows: [figure: the printed dict]
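The same keyword-to-frequency dict can also be built more compactly. The sketch below assumes the two columns are named 關鍵詞 and 詞頻 (an assumption; the code above addresses the frequency column by position) and uses a small made-up DataFrame in place of the Excel data:

```python
import pandas as pd

# Made-up sample rows; in the article df comes from pd.read_excel
df = pd.DataFrame({"關鍵詞": ["pandas", "python", "wordcloud"],
                   "詞頻": [12, 30, 7]})

# Sort by frequency in descending order
df_sorted = df.sort_values("詞頻", ascending=False)

# Pair each keyword with its frequency, converted to plain Python types
freq = {str(k): int(v) for k, v in zip(df_sorted["關鍵詞"], df_sorted["詞頻"])}

print(freq)
```

Converting the values with int() avoids carrying NumPy integer types into later steps; whether that matters depends on the library versions in use.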

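IV. Drawing the word cloud

Because the Excel file already provides the keywords and their frequencies, there is no need to segment the text with jieba before drawing the word cloud.

1. Drawing a basic word cloud

Code for the word cloud: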

import matplotlib.pyplot as plt
from wordcloud import WordCloud

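# The keywords contain Chinese, so a font that supports Chinese must be set; otherwise the text is garbled
# Raw string so the backslashes are not read as escape sequences; adjust the path to a font on your system
font_path = r"C:\Windows\Fonts\Microsoft YaHei UI\msyh.ttc"
# Set the word cloud parameters
wc=WordCloud(
             font_path=font_path,
             width=400,height=400,
             scale=2,mode="RGBA",
             background_color='white')
# Build the word cloud from the dict
wc=wc.generate_from_frequencies(data)
# Save the word cloud
wc.to_file('詞云圖1.png')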
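The font path above is specific to one Windows machine. As a hedged sketch, a small helper can return the first font file that is actually present. The name pick_font is hypothetical, and the candidate paths are examples that may not exist on your system:

```python
import os

def pick_font(candidates):
    """Return the first path in candidates that exists as a file, else None."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None

# Example locations of fonts with Chinese glyphs; these are assumptions, adjust for your system
candidates = [
    r"C:\Windows\Fonts\msyh.ttc",
    "/System/Library/Fonts/PingFang.ttc",
    "/usr/share/fonts/truetype/wqy/wqy-microhei.ttc",
]

chosen = pick_font(candidates)
print(chosen)  # None means none of the candidates was found
```

The returned path can then be passed as font_path to WordCloud; if it is None, a font has to be located manually.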

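Code for displaying the image:

# Display the image
plt.imshow(wc,interpolation="bilinear")
plt.axis("off")  # hide the axes
# Save before calling plt.show(); saving afterwards can write a blank image
plt.savefig("詞云圖2.png")
# Show the figure
plt.show()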

The result is as follows: [figure: the generated word cloud]
Complete code:

import pandas as pd
df=pd.read_excel("data-test.xlsx")  # read the data from the Excel file
print(df)

# Get only the first 5 rows
df_new=df.head(5)
print(df_new)

# Create a DataFrame whose index is the keyword column of df
data = pd.DataFrame(index=df['關鍵詞'])
# Initialise the frequency column to 0 so that it is an int column; the values are filled in below
data['詞頻']=0
# Copy the frequencies from the Excel data into data
for i in range(0,len(df)):
    data.iloc[i,0]=df.iloc[i,1]
# Sort the frequencies in descending order
data = data['詞頻'].sort_values(ascending = False)
# Convert to a dict
data = dict(data)
print(data)

# Generate the word cloud
import matplotlib.pyplot as plt
from wordcloud import WordCloud
# The keywords contain Chinese, so a font that supports Chinese must be set; otherwise the text is garbled
# Raw string so the backslashes are not read as escape sequences; adjust the path to a font on your system
font_path = r"C:\Windows\Fonts\Microsoft YaHei UI\msyh.ttc"
# Set the word cloud parameters
wc=WordCloud(
             font_path=font_path,
             width=500,
             height=500,
             scale=2,
             mode="RGBA",
             background_color='white')

# Build the word cloud from the dict
wc=wc.generate_from_frequencies(data)
# Save the word cloud image
wc.to_file('詞云圖1.png')
# Display the image
plt.imshow(wc,interpolation="bilinear")
# Hide the axes
plt.axis("off")
# Save before calling plt.show(); saving afterwards can write a blank image
plt.savefig("詞云圖2.png")
# Show the figure
plt.show()

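2. Drawing a word cloud with a specified shape

(1) Preparing the background image

Take the background image below as an example: [figure: the background image]
(Note: the background of the image must be white, and the image must not carry a watermark, otherwise the watermark will also be treated as part of the background image!)

(2) Processing the background image

The image has to be converted to an array so that it can be used as the shape of the word cloud.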

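# Generate the word cloud
import matplotlib.pyplot as plt
from wordcloud import WordCloud
import numpy as np # NumPy, for array handling
from PIL import Image # image library, used to read the background image
img = Image.open('image path') # load the background image; replace 'image path' with the path to your image
img_array = np.array(img)    # convert the image to an array so it can be used as the word cloud shape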

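After converting the image to an array, the result is as follows: [figure: the printed array]

(3) Generating the word cloud with the specified shape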
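As an illustration of what such an array contains, the sketch below builds a tiny synthetic image instead of loading a file (the size and colours are made up) and inspects the result:

```python
import numpy as np
from PIL import Image

# Synthetic 6x4 white RGB image with a black 2x2 block, standing in for a real picture
img = Image.new("RGB", (6, 4), "white")
img.paste((0, 0, 0), (2, 1, 4, 3))  # box is (left, upper, right, lower)

img_array = np.array(img)

print(img_array.shape)      # (height, width, channels)
print(img_array.dtype)
print(img_array[:, :, 0])   # red channel: 255 is the white background, 0 is the shape
```

In a mask, the pure white entries are the ones wordcloud leaves empty, and words are drawn on everything else.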

wc=WordCloud(mask=img_array,
             font_path=font_path,
             width=500,
             height=500,
             scale=2,
             contour_color='purple',contour_width=3,
             max_font_size=80,max_words=100,
             background_color='white')

The result is as follows: [figure: the word cloud in the shape of the background image]
Complete code:

import pandas as pd
df=pd.read_excel("data-test.xlsx")  # read the data from the Excel file
print(df)
print("====================================================")

# Get only the first 5 rows
# df_new=df.head(5)
# print(df_new)
print("====================================================")

# Create a DataFrame whose index is the keyword column of df
data = pd.DataFrame(index=df['關鍵詞'])
# Initialise the frequency column to 0 so that it is an int column; the values are filled in below
data['詞頻']=0
# Copy the frequencies from the Excel data into data
for i in range(0,len(df)):
    data.iloc[i,0]=df.iloc[i,1]
# Sort the frequencies in descending order
data = data['詞頻'].sort_values(ascending = False)
# Convert to a string instead of a dict (workaround discussed in section V)
# data = dict(data)
data = str(data)
print(data)
# print(type(data))

print("====================================================")

# Generate the word cloud
import matplotlib.pyplot as plt
from wordcloud import WordCloud
import numpy as np # NumPy, for array handling
from PIL import Image # image library, used to read the background image


img = Image.open('grape.jpg') # load the background image
img_array = np.array(img)    # convert the image to an array so it can be used as the word cloud shape


# The keywords contain Chinese, so a font that supports Chinese must be set; otherwise the text is garbled
# Raw string so the backslashes are not read as escape sequences; adjust the path to a font on your system
font_path = r"C:\Windows\Fonts\Microsoft YaHei UI\msyh.ttc"
# Set the word cloud parameters
wc=WordCloud(mask=img_array,
             font_path=font_path,
             width=500,
             height=500,
             scale=2,
             contour_color='purple',contour_width=3,
             max_font_size=80,max_words=100,
             background_color='white')

# Build the word cloud from the string (data is text here, not a dict)
wc=wc.generate(data)
# wc=wc.generate_from_frequencies(data)
# Save the word cloud image
wc.to_file('詞云圖1.png')
# Display the image
plt.imshow(wc,interpolation="bilinear")
# Hide the axes
plt.axis("off")
# Save before calling plt.show(); saving afterwards can write a blank image
plt.savefig("詞云圖2.png")
# Show the figure
plt.show()

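V. To be improved

1. A data-type error when specifying the word cloud shape

Initially the data used to generate the word cloud was a dict, but an error was raised when the shape was specified, so the data was converted to a string, after which the image displayed normally. Note that WordCloud.generate() counts the words it finds in the string itself, so with this workaround the frequencies from the file no longer determine the word sizes:

# Convert to a string instead of a dict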
# data = dict(data)
data = str(data)
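The original error message is not shown, so its cause is uncertain. One possibility (an assumption) is a mismatch between the data type and the method: WordCloud.generate() expects a text string, while WordCloud.generate_from_frequencies() expects a dict. The sketch below, with made-up data, shows what each conversion yields. str() produces a printed table, whereas a dict of plain Python values keeps the frequencies and is the form generate_from_frequencies() takes:

```python
import pandas as pd

# Made-up frequencies standing in for the sorted Series from the article
series = pd.Series({"python": 30, "pandas": 12, "wordcloud": 7}, name="詞頻")

# str() gives the printed form of the Series, including a "Name: ..., dtype: ..." footer
as_text = str(series)

# A dict of keyword -> int keeps the frequencies as numbers
as_dict = {str(k): int(v) for k, v in series.items()}

print(type(as_text).__name__)
print(as_text)
print(as_dict)
```

If the dict form is kept, the shaped word cloud would be built with wc.generate_from_frequencies(as_dict) rather than wc.generate(...), which is worth trying before falling back to the string workaround.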

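2. Outline extraction from the image needs improvement

When specifying a shape, the requirements on the background image are fairly strict: the background has to be white, and if the outline of the image is not smooth the extraction works poorly. The extraction of the outline from the background image therefore still needs improvement.

img = Image.open('grape.jpg') # load the background image
img_array = np.array(img)    # convert the image to an array so it can be used as the word cloud shape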
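One way to make the shape less sensitive to an off-white background is to clean the mask before use. wordcloud treats pure white (255) mask entries as masked out, so near-white pixels, such as JPEG compression noise, still count as part of the shape. The sketch below uses a synthetic grayscale array and a threshold of 240; the helper name whiten_background, the data, and the threshold are illustrative assumptions, not values from the original article:

```python
import numpy as np

def whiten_background(mask, threshold=240):
    """Return a copy of mask in which every pixel brighter than threshold is set to 255."""
    cleaned = mask.copy()
    cleaned[cleaned > threshold] = 255
    return cleaned

# Synthetic grayscale mask: off-white background (250) around a dark shape (20)
mask = np.full((4, 6), 250, dtype=np.uint8)
mask[1:3, 2:4] = 20

cleaned = whiten_background(mask)
print(cleaned)
```

For a real picture, the image could first be converted to grayscale with Image.open(...).convert("L") before np.array is applied, so that the threshold acts on one brightness value per pixel. A suitable threshold depends on the image and needs some experimentation.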

