
Sentiment Analysis in Python with snownlp, Using jieba for Word Segmentation

Updated: 2023-01-30 09:19:13 Author: Sir 老王

Summary: this article combines jieba word segmentation with snownlp to decide whether a piece of Chinese text expresses a positive or a negative sentiment.

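Sentiment analysis is a computer science and technology term that was officially published in 2018.

It judges from the content of a text whether the meaning it conveys is positive or negative, and it can also be used to tell whether the wording is commendatory or derogatory.

A typical application is analyzing large volumes of e-commerce reviews, for example computing the share of positive or negative reviews.

The sentiment analysis module used here is snownlp. To improve the accuracy of the analysis, word segmentation is handled by the jieba module.

Both Python modules are third-party packages rather than part of the standard library, so they can be installed with pip.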

pip install jieba

pip install snownlp

jieba is a powerful Chinese word segmentation library that covers most segmentation needs; here it assists snownlp's sentiment analysis.

# Import the jieba module under the alias ja.
import jieba as ja

# Import the SnowNLP class from the snownlp package.
from snownlp import SnowNLP

To help avoid version conflicts, here is the Python version used in this article.

Python interpreter version: 3.6.8
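When comparing your own environment against the one above, it can help to print the interpreter and package versions. The following is a minimal sketch, not part of the original article: the helper name installed_version is hypothetical, and it assumes Python 3.8 or later, because importlib.metadata is not available in the 3.6.8 interpreter used here.

```python
import sys
from importlib import metadata  # standard library since Python 3.8


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Interpreter version, e.g. "3.6.8" in the article's environment.
python_version = sys.version.split()[0]
print("python", python_version)

# Versions of the two third-party packages used in this article.
for name in ("jieba", "snownlp"):
    print(name, installed_version(name))
```

A value of None simply means the package is not installed in the current environment.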

Next, create the text to be analyzed. The goal is to determine whether this text expresses a positive or a negative sentiment.

# The review text to analyze.
analysis_text = '这个实在是太好用了，我非常的喜欢，下次一定还会购买的！'

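With the input sentence defined, the next step is word segmentation. A separate segmentation step is needed because snownlp's own Chinese word segmentation is not very reliable.

For example, if snownlp segments '不好看' ("not good-looking") by itself, it will most likely split it into the two words '不' and '好看'.

A phrase that clearly carries a negative sentiment may then be classified as positive, which is why jieba is used for segmentation first.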
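The effect described above can be illustrated with a toy example. This is only a sketch: the polarity values in TOY_POLARITY are hypothetical numbers chosen for illustration, not values taken from snownlp or jieba, and naive_polarity is a made-up helper.

```python
# Toy polarity lexicon: hypothetical values for illustration only,
# NOT taken from snownlp or any real sentiment model.
TOY_POLARITY = {
    "不好看": -1,  # "not good-looking": negative as a whole phrase
    "好看": 1,     # "good-looking": positive on its own
    "不": 0,       # "not": carries no polarity by itself in this toy lexicon
}


def naive_polarity(tokens):
    """Sum the per-token polarity, treating unknown tokens as neutral."""
    return sum(TOY_POLARITY.get(token, 0) for token in tokens)


# Segmentation that splits the negation away from the adjective.
split_result = naive_polarity(["不", "好看"])
# Segmentation that keeps the negated phrase together.
whole_result = naive_polarity(["不好看"])

print(split_result)  # 1  (looks positive)
print(whole_result)  # -1 (negative, as intended)
```

When tokens are scored one at a time, the negation is lost as soon as the segmenter separates '不' from '好看', so the quality of the segmentation directly affects the sentiment result.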

# Segment analysis_text with jieba and collect the tokens into a list.
analysis_list = list(ja.cut(analysis_text))

# Print the segmented tokens.
print(analysis_list)

# ['这个', '实在', '是', '太', '好', '用', '了', '，', '我', '非常', '的', '喜欢', '，', '下次', '一定', '还会', '购买', '的', '！']

Judging from the result above, the segmentation is fairly fine-grained: no token is longer than two characters.

With jieba's cut() function the text has been split into words; the next step is to extract the main keywords.

For sentiment analysis we usually extract adjective-like keywords, because adjectives carry the emotion that the text expresses.

# Import jieba's part-of-speech tagging module under the alias seg.
import jieba.posseg as seg

# Build a list of (word, flag) tuples, where flag is the part-of-speech tag.
analysis_words = [(word.word, word.flag) for word in seg.cut(analysis_text)]

# Print the tagged words.
print(analysis_words)

# [('这个', 'r'), ('实在', 'v'), ('是', 'v'), ('太', 'd'), ('好用', 'v'), ('了', 'ul'), ('，', 'x'), ('我', 'r'), ('非常', 'd'), ('的', 'uj'), ('喜欢', 'v'), ('，', 'x'), ('下次', 't'), ('一定', 'd'), ('还', 'd'), ('会', 'v'), ('购买', 'v'), ('的', 'uj'), ('！', 'x')]

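The list comprehension above pairs each segmented word with its part-of-speech flag.

jieba's part-of-speech tag table maps short flags to word classes; for example, the flag a stands for adjective. The filter below keeps adjectives (a), adverbs (d), and verbs (v).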

# Keep only the tuples whose flag is adjective (a), adverb (d), or verb (v).
keywords = [x for x in analysis_words if x[1] in ['a', 'd', 'v']]

# Print the filtered (word, flag) tuples.
print(keywords)

# [('实在', 'v'), ('是', 'v'), ('太', 'd'), ('好用', 'v'), ('非常', 'd'), ('喜欢', 'v'), ('一定', 'd'), ('还', 'd'), ('会', 'v'), ('购买', 'v')]
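The filtering step can also be wrapped in a small reusable function. This is a sketch that needs neither jieba nor snownlp: the function name filter_by_flags is hypothetical, and the tagged list is copied from the output shown earlier in this article rather than recomputed.

```python
def filter_by_flags(tagged_words, keep_flags):
    """Return the (word, flag) pairs whose flag is in keep_flags."""
    keep = set(keep_flags)  # set membership is faster than scanning a list
    return [(word, flag) for word, flag in tagged_words if flag in keep]


# (word, flag) pairs copied from the posseg output shown in the article.
tagged = [
    ('这个', 'r'), ('实在', 'v'), ('是', 'v'), ('太', 'd'), ('好用', 'v'),
    ('了', 'ul'), ('，', 'x'), ('我', 'r'), ('非常', 'd'), ('的', 'uj'),
    ('喜欢', 'v'), ('，', 'x'), ('下次', 't'), ('一定', 'd'), ('还', 'd'),
    ('会', 'v'), ('购买', 'v'), ('的', 'uj'), ('！', 'x'),
]

selected = filter_by_flags(tagged, {'a', 'd', 'v'})
print(len(selected))  # 10
print(selected[0])    # ('实在', 'v')
```

Note that in this sample no word is tagged a: '好用' is tagged as a verb, so the verb and adverb flags are what actually keep the sentiment-bearing words.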

Once the keywords have been selected by their part-of-speech flags, the flags can be dropped so that only the words themselves remain.

# Keep only the word from each (word, flag) tuple.
keywords = [x[0] for x in keywords]

# Print the final keyword list.
print(keywords)

# ['实在', '是', '太', '好用', '非常', '喜欢', '一定', '还', '会', '购买']

At this point the segmentation work is done. Next, snownlp is used directly to score the sentiment of each keyword.

# Counter for keywords scored as positive.
pos_num = 0

# Counter for keywords scored as negative.
neg_num = 0

# Score every keyword with snownlp.
for word in keywords:
    # Wrap the word in a SnowNLP object to access its sentiment score.
    sl = SnowNLP(word)
    # A score above 0.5 counts as positive; anything else counts as negative.
    if sl.sentiments > 0.5:
        pos_num += 1
    else:
        neg_num += 1
    # Print the word together with its sentiment score.
    print(word, str(sl.sentiments))

Below are the sentiment scores for each keyword extracted from the original text. Each score lies between 0 and 1; the closer it is to 1, the more positive the sentiment.

# 实在 0.3047790802524796
# 是 0.5262327818078083
# 太 0.34387502381406
# 好用 0.6558628208940429
# 非常 0.5262327818078083
# 喜欢 0.6994590939824207
# 一定 0.5262327818078083
# 还 0.5746682977321914
# 会 0.5539033457249072
# 购买 0.6502590673575129

To turn the per-keyword scores into a single overall verdict, we can also count the positive and negative keywords.

# Print the number of positive keywords.
print('正面情绪关键词数量：{}'.format(pos_num))

# Print the number of negative keywords.
print('负面情绪关键词数量：{}'.format(neg_num))

# Print the share of positive keywords among all scored keywords.
print('正面情绪所占比例：{}'.format(pos_num / (pos_num + neg_num)))

# 正面情绪关键词数量：8
# 负面情绪关键词数量：2
# 正面情绪所占比例：0.8
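The counting step can be checked without running snownlp by reusing the scores printed earlier in this article. The sketch below is an illustration only: the function name summarize is hypothetical, and the mean score is an extra statistic that the original article does not compute.

```python
# Per-keyword scores copied from the output shown in this article.
scores = {
    '实在': 0.3047790802524796,
    '是': 0.5262327818078083,
    '太': 0.34387502381406,
    '好用': 0.6558628208940429,
    '非常': 0.5262327818078083,
    '喜欢': 0.6994590939824207,
    '一定': 0.5262327818078083,
    '还': 0.5746682977321914,
    '会': 0.5539033457249072,
    '购买': 0.6502590673575129,
}


def summarize(word_scores, threshold=0.5):
    """Count positive/negative words and compute the positive ratio and mean score."""
    values = list(word_scores.values())
    pos = sum(1 for v in values if v > threshold)
    neg = len(values) - pos
    ratio = pos / len(values) if values else 0.0  # avoid dividing by zero
    mean = sum(values) / len(values) if values else 0.0
    return pos, neg, ratio, mean


pos, neg, ratio, mean = summarize(scores)
print(pos, neg, ratio)  # 8 2 0.8
print(round(mean, 3))   # 0.536
```

The counts reproduce the 8 positive and 2 negative keywords reported above. The mean is much closer to 0.5 than the 0.8 ratio suggests, which shows that several keywords are only marginally above the threshold.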

That concludes this walkthrough of sentiment analysis in Python with snownlp, using jieba for word segmentation.
