Python in Practice: Building Animated Charts
Preface
This article uses Python to collect index data and turn it into animated charts. Without further ado,
let's get started.
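Development tools
Python version: 3.6.4
Related modules:
re module;
requests module;
urllib module;
pandas module;
execjs module (installed via pip as PyExecJS);
bar_chart_race module;
plus some modules that ship with Python.
Environment setup
Install Python and add it to your PATH, then use pip to install the required modules.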
Take a look at Bilibili's "data visualization" section for 2019: the top video has more than 2 million views and over 40,000 bullet comments (danmaku).
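Baidu Index
To get Baidu Index data, you first need to log in to your Baidu account.
Take the keyword "王者荣耀" (Honor of Kings) as an example, with a custom date range of 2020-10-01 to 2020-10-10.
Using the browser's developer tools, we can find the data endpoint behind the line chart.
Looking at the response, however, there is no readable data, because the values are encrypted and only decoded by JS in the page.
Once the decryption is handled, the crawl succeeds. The code: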
import time
import json
import datetime
from urllib.parse import urlencode

import execjs
import requests


def get_data(keywords, startDate, endDate, area):
    """Fetch the encrypted index data and the key needed to decrypt it."""
    # Example request URL:
    # http://index.baidu.com/api/SearchApi/index?area=0&word=[[%7B%22name%22:%22%E7%8E%8B%E8%80%85%E8%8D%A3%E8%80%80%22,%22wordType%22:1%7D]]&startDate=2020-10-01&endDate=2020-10-10
    params = {
        'word': json.dumps([[{'name': keyword, 'wordType': 1}] for keyword in keywords]),
        'startDate': startDate,
        'endDate': endDate,
        'area': area
    }
    data_url = 'http://index.baidu.com/api/SearchApi/index?' + urlencode(params)
    headers = {
        # Paste the cookie from your logged-in session here
        "Cookie": 'your cookie',
        "Referer": "http://index.baidu.com/v2/main/index.html",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36"
    }

    # Get the encrypted data and the uniqid
    res = requests.get(url=data_url, headers=headers).json()
    data = res["data"]["userIndexes"][0]["all"]["data"]
    uniqid = res["data"]["uniqid"]

    # Get the key t used by the JS function, e.g. t = "ev-fxk9T8V1lwAL6,51348+.9270-%"
    t_url = "http://index.baidu.com/Interface/ptbk?uniqid={}".format(uniqid)
    rep = requests.get(url=t_url, headers=headers).json()
    t = rep["data"]
    return {"data": data, "t": t}


def get_search_index(word, startDate, endDate, area):
    """Return the decrypted index values as a comma-separated string.

    `word` must be a list of keywords; only the first keyword's data is used.
    """
    # Call get_data to obtain the encrypted data and the key
    res = get_data(word, startDate, endDate, area)
    e = res["data"]
    t = res["t"]

    # Read the JS file that contains the decrypt function
    with open('parsing_data_function.js', encoding='utf-8') as f:
        js = f.read()

    # Compile the source into a JS context
    docjs = execjs.compile(js)

    # Call the JS decrypt function to get the index values
    return docjs.call('decrypt', t, e)


def get_date_list(begin_date, end_date):
    """Return every date from begin_date to end_date (inclusive) as YYYY-MM-DD strings."""
    dates = []
    dt = datetime.datetime.strptime(begin_date, "%Y-%m-%d")
    date = begin_date[:]
    # ISO-formatted date strings sort chronologically, so string comparison is safe here
    while date <= end_date:
        dates.append(date)
        dt += datetime.timedelta(days=1)
        date = dt.strftime("%Y-%m-%d")
    return dates


def get_area():
    """Save the daily index of one keyword for each province to area.csv."""
    areas = {"901": "Shandong", "902": "Guizhou", "903": "Jiangxi", "904": "Chongqing", "905": "Inner Mongolia",
             "906": "Hubei", "907": "Liaoning", "908": "Hunan", "909": "Fujian", "910": "Shanghai",
             "911": "Beijing", "912": "Guangxi", "913": "Guangdong", "914": "Sichuan", "915": "Yunnan",
             "916": "Jiangsu", "917": "Zhejiang", "918": "Qinghai", "919": "Ningxia", "920": "Hebei",
             "921": "Heilongjiang", "922": "Jilin", "923": "Tianjin", "924": "Shaanxi", "925": "Gansu",
             "926": "Xinjiang", "927": "Henan", "928": "Anhui", "929": "Shanxi", "930": "Hainan",
             "931": "Taiwan", "932": "Tibet", "933": "Hong Kong", "934": "Macau"}
    word = ['王者荣耀']  # Honor of Kings
    startDate = '2020-10-01'
    endDate = '2020-10-10'
    for code, name in areas.items():
        try:
            time.sleep(1)
            res = get_search_index(word, startDate, endDate, code)
            result = res.split(',')
            dates = get_date_list(startDate, endDate)
            with open('area.csv', 'a', encoding='utf-8') as f:
                for num, date in zip(result, dates):
                    print(name, num, date)
                    f.write(name + ',' + str(num) + ',' + date + '\n')
        except Exception as exc:
            # Report the failure and move on to the next province
            print('failed for', name, exc)


def get_word():
    """Save the daily nationwide index of several keywords to word.csv."""
    # Character names from iPartment 5, used as search keywords
    words = ['诸葛大力', '张伟', '胡一菲', '吕子乔', '陈美嘉', '赵海棠', '咖喱酱', '曾小贤', '秦羽墨']
    startDate = '2020-10-01'
    endDate = '2020-10-10'
    area = 0  # 0 means nationwide
    for word in words:
        try:
            time.sleep(2)
            # Wrap the keyword in a list: get_data iterates over its argument,
            # so a bare string would be split into single characters
            res = get_search_index([word], startDate, endDate, area)
            result = res.split(',')
            dates = get_date_list(startDate, endDate)
            with open('word.csv', 'a', encoding='utf-8') as f:
                for num, date in zip(result, dates):
                    print(word, num, date)
                    f.write(word + ',' + str(num) + ',' + date + '\n')
        except Exception as exc:
            # Report the failure and move on to the next keyword
            print('failed for', word, exc)


if __name__ == '__main__':
    get_area()
    get_word()
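The script above hands decryption to parsing_data_function.js, which this article does not reproduce. As a rough sketch only, and assuming the commonly described scheme in which the first half of the key t lists the cipher characters and the second half lists the plain characters they stand for, the same step might look like this in pure Python. The function name decrypt mirrors the JS call above; the sample ciphertext is made up for illustration.

```python
def decrypt(t: str, e: str) -> str:
    """Decode e with key t (assumed scheme: t[i] maps to t[half + i])."""
    half = len(t) // 2
    # Pair each cipher character with the plain character at the same offset
    mapping = dict(zip(t[:half], t[half:]))
    return "".join(mapping[ch] for ch in e)


# Key taken from the comment in the script above; the ciphertext is hypothetical.
key = "ev-fxk9T8V1lwAL6,51348+.9270-%"
print(decrypt(key, "fxvfk"))  # prints 13,14
```

If the assumption holds, this would remove the need for a JS runtime, but it should be checked against the real JS function before relying on it.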
The resulting CSV files hold two kinds of data.
One is the daily index for several keywords; the other is the daily index of a single keyword in each province.
With the data in hand, we can build the animation in Python.
import pandas as pd
import bar_chart_race as bcr

# Load the data
# df = pd.read_csv('word.csv', encoding='utf-8', header=None, names=['name', 'number', 'day'])
df = pd.read_csv('area.csv', encoding='utf-8', header=None, names=['name', 'number', 'day'])

# Reshape into a pivot table: one row per day, one column per name
df_result = pd.pivot_table(df, values='number', index=['day'], columns=['name'], fill_value=0)

# Generate the GIF
# bcr.bar_chart_race(df_result, filename='word.gif', title='iPartment 5 cast popularity ranking')
bcr.bar_chart_race(df_result, filename='area.gif', title='Honor of Kings popularity by province')
Five lines of Python code; here is the result.
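The pivot step matters because bar_chart_race expects wide data: one row per time step and one column per category. As a small illustration in plain Python (not how pandas implements it; the sample rows are made up, and unlike pivot_table, which averages duplicate cells by default, this sketch simply keeps the last value), the long-to-wide reshape amounts to:

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows in the same name,number,day layout as area.csv
sample = "A,10,2020-10-01\nB,7,2020-10-01\nA,12,2020-10-02\n"


def pivot(rows, fill_value=0):
    """Turn (name, number, day) rows into {day: {name: number}}."""
    names = sorted({name for name, _, _ in rows})
    table = defaultdict(dict)
    for name, number, day in rows:
        table[day][name] = float(number)
    # Fill the cells that have no observation, like fill_value=0 in pivot_table
    return {day: {n: table[day].get(n, fill_value) for n in names}
            for day in sorted(table)}


rows = list(csv.reader(io.StringIO(sample)))
wide = pivot(rows)
for day, values in wide.items():
    print(day, values)
```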
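Weibo Index
Search Baidu for Sina's Weibo Index and open the site: the desktop web version turns out to be unusable.
Just open the developer tools, switch the browser to mobile device emulation, and refresh the page.
The Weibo Index interface now appears.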
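In a script, the counterpart of the browser's mobile emulation is sending a mobile User-Agent header, assuming the site decides what to serve based on that header. A minimal sketch using only the standard library follows; the User-Agent string is the iPhone one from the request headers used in this article's Weibo script, the URL comes from its referer, and nothing is actually sent here.

```python
from urllib.request import Request

MOBILE_UA = ("Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) "
             "AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 "
             "Mobile/15A372 Safari/604.1")

# Build (but do not send) a request that identifies itself as a mobile browser
req = Request("https://data.weibo.com/index/newindex",
              headers={"User-Agent": MOBILE_UA})

# urllib stores header names capitalized, hence "User-agent"
print(req.get_header("User-agent"))
```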
Add a keyword and inspect the data endpoint that serves the index.
The request uses the POST method and does not require logging in to a Weibo account.
import re
import time
import json
import requests
import datetime

# Request headers, copied from the browser
headers = """accept: application/json
accept-encoding: gzip, deflate, br
accept-language: zh-CN,zh;q=0.9
content-length: 50
content-type: application/x-www-form-urlencoded
cookie: your cookie
origin: https://data.weibo.com
referer: https://data.weibo.com/index/newindex?visit_type=trend&wid=1011224685661
sec-fetch-mode: cors
sec-fetch-site: same-origin
user-agent: Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Mobile/15A372 Safari/604.1
x-requested-with: XMLHttpRequest"""

# Convert the header string into a dict, splitting each line at the first ": "
headers = dict([line.split(": ", 1) for line in headers.split("\n")])
print(headers)

# Data endpoint
url = 'https://data.weibo.com/index/ajax/newindex/getchartdata'


# Build a list of dates (carried over from the Baidu script; the Weibo API
# already returns the dates, so this helper is not used below)
def get_date_list(begin_date, end_date):
    dates = []
    dt = datetime.datetime.strptime(begin_date, "%Y-%m-%d")
    date = begin_date[:]
    while date <= end_date:
        dates.append(date)
        dt += datetime.timedelta(days=1)
        date = dt.strftime("%Y-%m-%d")
    return dates


# Keywords to look up: cast members of the TV series Ming Dynasty
names = ['汤唯', '朱亚文', '邓家佳', '乔振宇', '王学圻', '张艺兴', '俞灏明', '吴越', '梁冠华', '李昕亮', '苏可', '孙骁骁', '赵韩樱子', '孙耀琦', '魏巍']

# Fetch the Weibo Index data
for name in names:
    try:
        # Look up the keyword ID
        url_id = 'https://data.weibo.com/index/ajax/newindex/searchword'
        data_id = {
            'word': name
        }
        html_id = requests.post(url=url_id, data=data_id, headers=headers)
        pattern = re.compile(r'li wid=\\\"(.*?)\\\" word')
        wid = pattern.findall(html_id.text)[0]

        # Endpoint parameters: the keyword ID and a one-month window
        data = {
            'wid': wid,
            'dateGroup': '1month'
        }
        time.sleep(2)

        # Request the data
        html = requests.post(url=url, data=data, headers=headers)
        result = json.loads(html.text)

        # Process the data
        if result['data']:
            values = result['data'][0]['trend']['s']
            dates = result['data'][0]['trend']['x']
            # Save the data
            with open('weibo.csv', 'a', encoding='utf-8') as f:
                for value, date in zip(values, dates):
                    print(name, value, date)
                    f.write(name + ',' + str(value) + ',' + date + '\n')
    except Exception as exc:
        # Report the failure and move on to the next keyword
        print('failed for', name, exc)
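The keyword ID lookup relies on a regular expression that matches quotes escaped as \" in the response text. The sketch below applies the same pattern to a made-up response fragment (the real response layout is an assumption here) and returns None instead of raising when nothing matches. The helper name extract_wid is hypothetical.

```python
import re

# Hypothetical response fragment: HTML embedded in JSON, so quotes appear as \"
sample = r'{"html": "<li wid=\"1011224685661\" word=\"demo\">demo<\/li>"}'

# Same pattern as in the script: \\ matches a literal backslash, \" a quote
pattern = re.compile(r'li wid=\\\"(.*?)\\\" word')


def extract_wid(text):
    """Return the first keyword ID found in text, or None if absent."""
    matches = pattern.findall(text)
    return matches[0] if matches else None


print(extract_wid(sample))          # prints 1011224685661
print(extract_wid('{"html": ""}'))  # prints None
```

Checking for None makes a missing keyword explicit, rather than letting an IndexError be swallowed by the surrounding try block.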
The data retrieved
Let's generate an animated chart from it as well.
import pandas as pd
import bar_chart_race as bcr

# Load the data
df = pd.read_csv('weibo.csv', encoding='utf-8', header=None, names=['name', 'number', 'day'])

# Reshape into a pivot table: one row per day, one column per name
df_result = pd.pivot_table(df, values='number', index=['day'], columns=['name'], fill_value=0)
# print(df_result[:10])

# Generate the GIF from the first 10 rows (days) of the pivot table
bcr.bar_chart_race(df_result[:10], filename='weibo.gif', title='Ming Dynasty cast popularity ranking')
Results
If this looks interesting, try it out yourself!