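Scraping a Video Site's Bullet Comments with Python and Drawing a Word Cloud

Updated: December 17, 2021 12:01:12 Author: 魔王不會哭

This article shows how to use Python to scrape the bullet comments (danmaku) of a video on a certain site and turn them into a word cloud. The sample code is explained in detail and should be of some help when learning to write Python crawlers; readers who need it can use it as a reference.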

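Preface

[Topic]:

Use Python to scrape the bullet comments of a video on a certain site, or of Tencent Video, and draw a word cloud from them

[Key points]:

1. The basic workflow of a crawler

2. Regular expressions

3. requests >>> pip install requests

4. jieba >>> pip install jieba

5. imageio >>> pip install imageio

6. wordcloud >>> pip install wordcloud

[Development environment]:

Python 3.8

PyCharm

Press Win + R, type cmd, and run the install command pip install <module name>. If the command fails with red error text, the likely cause is a network timeout; switching to a mirror source in mainland China usually helps.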
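
Before running the scraper, it can help to confirm that the four third-party modules listed above are actually importable. The sketch below is one way to check this with the standard library only; the helper name `missing_modules` is made up for this illustration and is not part of the original article.

```python
import importlib.util

# Third-party modules named in the key points above.
REQUIRED = ["requests", "jieba", "imageio", "wordcloud"]


def missing_modules(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing, install with: pip install " + " ".join(missing))
    else:
        print("All required modules are installed.")
```

If something is reported missing and the default index times out, pip's `-i` (`--index-url`) option is the usual way to point the install at a mirror.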


Scraping the Bullet Comments

Basic crawler workflow

I. Analyzing the data source

1. Decide what data we want

Scrape the site's bullet-comment data and save it to a txt file

2. Capture and inspect the network traffic with the browser's developer tools...

The address of the video's bullet-comment data can be found directly through the interface

II. Steps to implement the crawler

1. Send a request to the bullet-comment address (see comments)

Points to note:

- Determine the request method
- Request header parameters

2. Get the data returned by the server

3. Parse the data and extract what we want: the bullet comments

4. Save the data by writing it to a txt file
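
Steps 3 and 4 can be tried offline before any request is sent. The following is a minimal sketch, assuming the server returns XML in which each comment sits in a `<d p="...">` element (the pattern used in the article's code). `SAMPLE_XML`, its `p` attribute values, and the function names are hypothetical stand-ins, since the real address is not given here.

```python
import re

# Hypothetical sample standing in for the server response.
SAMPLE_XML = (
    '<i>'
    '<d p="1.5,1,25,16777215">first comment</d>'
    '<d p="3.2,1,25,16777215">second comment</d>'
    '</i>'
)

# Same pattern as the article's code: capture the text of each <d> element.
DANMAKU_PATTERN = re.compile(r'<d p=".*?">(.*?)</d>')


def parse_danmaku(xml_text):
    """Step 3: extract the text of every <d p="..."> element."""
    return DANMAKU_PATTERN.findall(xml_text)


def save_danmaku(comments, path):
    """Step 4: append one comment per line, opening the file only once."""
    with open(path, mode='a', encoding='utf-8') as f:
        for comment in comments:
            f.write(comment + '\n')


comments = parse_danmaku(SAMPLE_XML)
print(comments)  # ['first comment', 'second comment']
```

Opening the file once outside the loop avoids reopening it for every comment; the result on disk is the same as opening it inside the loop in append mode.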

Simulating a browser request to the server

Import modules

import requests  # HTTP request module, third-party: pip install requests
import re  # regular expression module, built in, no installation needed

Code

# # 1. Send the request
# url = '(see comments)'
# # headers: request headers that disguise the Python script as a browser when sending the request
# # user-agent: the browser's basic identity string
# # headers is a dict
# headers = {
#     'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.93 Safari/537.36'
# }
# # Send a GET request to url with requests.get, passing the headers, and store the returned data in response
# response = requests.get(url=url, headers=headers)
# response.encoding = response.apparent_encoding
# # <Response [200]> is the response object; status code 200 means the request succeeded
# # To get the same content as the page source, read the response body as text
# # If the server does not return complete JSON (dict-like data), calling response.json() raises an error
# # 2. Get the data: response.text returns the body as a string
# # print(response.text)
# # 3. Parse the data. Options: re [extracts directly from a string], css / xpath [extract by tag attributes / nodes]
# # () captures the data we want; .*? is a non-greedy wildcard that matches any characters (except the newline \n)
# data_list = re.findall('<d p=".*?">(.*?)</d>', response.text)
# for index in data_list:
#     # mode: how the file is opened; encoding: the text encoding
#     # pprint.pprint() pretty-prints JSON/dict data
#     with open('彈幕.txt', mode='a', encoding='utf-8') as f:
#         f.write(index)
#         f.write('\n')
#         print(index)
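
The comments above mention css and xpath as alternatives to `re`. One difference is that the regular expression returns the raw text, so XML entities such as `&amp;` stay escaped, while an XML parser decodes them. The sketch below compares the two on a hypothetical sample using the standard library's `xml.etree.ElementTree`; it is an illustration under that assumption and not the article's own method.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample response containing escaped characters.
SAMPLE_XML = (
    '<i>'
    '<d p="1.5,1,25,16777215">rock &amp; roll</d>'
    '<d p="3.2,1,25,16777215">2 &lt; 3</d>'
    '</i>'
)


def parse_with_regex(xml_text):
    """Extract comment text with the article's pattern; entities stay escaped."""
    return re.findall(r'<d p=".*?">(.*?)</d>', xml_text)


def parse_with_etree(xml_text):
    """Extract comment text with an XML parser; entities are decoded."""
    root = ET.fromstring(xml_text)
    return [d.text or '' for d in root.iter('d')]


print(parse_with_regex(SAMPLE_XML))  # ['rock &amp; roll', '2 &lt; 3']
print(parse_with_etree(SAMPLE_XML))  # ['rock & roll', '2 < 3']
```

If the regular expression is kept, `html.unescape` from the standard library can be applied to each match to get the same decoded text.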

url = 'https://mapi.vip.com/vips-mobile/rest/shopping/pc/search/product/rank?callback=getMerchandiseIds&app_name=shop_pc&app_version=4.0&warehouse=VIP_NH&fdc_area_id=104104101&client=pc&mobile_platform=1&province_id=104104&api_key=70f71280d5d547b2a7bb370a529aeea1&user_id=&mars_cid=1634797375792_17a23bdc351b36f2915c2f7ec16dc88e&wap_consumer=a&standby_id=nature&keyword=%E5%8F%A3%E7%BA%A2&lv3CatIds=&lv2CatIds=&lv1CatIds=&brandStoreSns=&props=&priceMin=&priceMax=&vipService=&sort=0&pageOffset=0&channelId=1&gPlatform=PC&batchSize=120&_=1639640088314'

headers = {
    'referer': 'https://category.vip.com/',
    'user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.93 Safari/537.36'
}
response = requests.get(url=url, headers=headers)
print(response.text)
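
The URL above carries `callback=getMerchandiseIds`, which suggests the body is JSON wrapped in a function call (JSONP). In that case `response.json()` would fail, as the earlier comment about incomplete JSON warns. Below is a minimal sketch of unwrapping such a body, assuming it has the form `getMerchandiseIds({...})`; the helper name `strip_jsonp` and the sample payload are hypothetical.

```python
import json
import re


def strip_jsonp(text, callback='getMerchandiseIds'):
    """Return the JSON payload inside `callback(...)` as a Python object."""
    pattern = r'\s*' + re.escape(callback) + r'\((.*)\)\s*;?\s*'
    match = re.fullmatch(pattern, text, flags=re.DOTALL)
    if match is None:
        raise ValueError('response is not wrapped in the expected callback')
    return json.loads(match.group(1))


# Hypothetical body; the real field names are not shown in the article.
sample = 'getMerchandiseIds({"code": 1, "data": {"products": [{"pid": "123"}]}})'
data = strip_jsonp(sample)
print(data['data']['products'][0]['pid'])  # 123
```

With a real response, the call would be `strip_jsonp(response.text)`.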

Creating the Word Cloud