Scraping a Bilibili Following List with Python: Database Design and Operations
I. Database Design and Operations
1. Analyzing the data
A Bilibili user's following list is served at
https://api.bilibili.com/x/relation/followings?vmid=UID&pn=1&ps=50&order=desc&order_type=attention
where each page holds at most 50 entries.
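The query string can also be assembled with the standard library rather than by hand. The following is a minimal sketch: the parameter names come from the URL above, while the helper name `build_followings_url` and its default values are assumptions made for this example.

```python
from urllib.parse import urlencode

# Endpoint taken from the URL shown above.
FOLLOWINGS_API = "https://api.bilibili.com/x/relation/followings"


def build_followings_url(uid: int, page: int = 1, page_size: int = 50) -> str:
    """Build the following-list URL for one page (hypothetical helper)."""
    params = {
        "vmid": uid,       # UID of the user whose following list is requested
        "pn": page,        # page number, starting at 1
        "ps": page_size,   # entries per page; the article notes a maximum of 50
        "order": "desc",
        "order_type": "attention",
    }
    return FOLLOWINGS_API + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_followings_url(672353429, page=2))
```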
Let's take a rough look at the response:
{ "code": 0, "message": "0", "ttl": 1, "data": { "list": [{……
First, the list itself is stored under data:list.
Second, each item in the list carries the following information:
"mid": 672353429,
"attribute": 2,
"mtime": 1630510107,
"tag": null,
"special": 0,
"contract_info": { "is_contractor": false, "ts": 0, "is_contract": false, "user_attr": 0 },
"uname": "贝拉kira",
"face": "http://i2.hdslb.com/bfs/face/668af440f8a8065743d3fa79cfa8f017905d0065.jpg",
"sign": "元气满满的A-SOUL舞担参上~目标TOP IDOL,一起加油!",
"official_verify": { "type": 0, "desc": "虚拟偶像团体A-SOUL 所属艺人" },
"vip": {
    "vipType": 2,
    "vipDueDate": 1674576000000,
    "dueRemark": "",
    "accessStatus": 0,
    "vipStatus": 1,
    "vipStatusWarn": "",
    "themeType": 0,
    "label": { "path": "", "text": "年度大会员", "label_theme": "annual_vip", "text_color": "#FFFFFF", "bg_style": 1, "bg_color": "#FB7299", "border_color": "" },
    "avatar_subscript": 1,
    "nickname_color": "#FB7299",
    "avatar_subscript_url": "http://i0.hdslb.com/bfs/vip/icon_Certification_big_member_22_3x.png"
}
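Here, mid is the user's unique UID. For vipType, 0 means no membership at all, 1 means premium member (大会员), and 2 means annual premium member (年度大会员). In official_verify, a type of 0 means the account is officially verified and -1 means it is not.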
We also find that if the user has locked their list, the API returns
{"code":-400,"message":"请求错误","ttl":1}
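The two response shapes above can be handled in one place. The sketch below is illustrative only: `extract_following` is a hypothetical helper, and treating every non-zero `code` as a failure is an assumption (the article only shows -400 for locked lists).

```python
import json


def extract_following(text: str) -> list:
    """Return the entries under data -> list, or [] when the API reports an error."""
    payload = json.loads(text)
    # Assumption: any non-zero code (for example -400 for a locked list) means no data.
    if payload.get("code") != 0:
        return []
    data = payload.get("data") or {}
    return data.get("list") or []


if __name__ == "__main__":
    locked = '{"code":-400,"message":"x","ttl":1}'
    ok = '{"code":0,"message":"0","ttl":1,"data":{"list":[{"mid":672353429}]}}'
    print(extract_following(locked))
    print(extract_following(ok))
```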
2. Database design
Based on this, we design the database first. It contains two tables: a table of basic user attributes and a table of following relations.
import sqlite3

def createDB():
    # Open the database file (created on first use)
    link = sqlite3.connect('BiliFollowDB.db')
    print("database open success")
    UserTableDDL = '''
    create table if not exists user(
        UID int PRIMARY KEY NOT NULL,
        NAME varchar NOT NULL,
        SIGN varchar DEFAULT NULL,
        vipType int NOT NULL,
        verifyType int NOT NULL,
        verifyDesc varchar DEFAULT NULL)
    '''
    # follower and following each reference user(UID) separately.
    # SQLite only enforces foreign keys when PRAGMA foreign_keys=ON (off by default).
    RelationTableDDL = '''
    create table if not exists relation(
        follower int NOT NULL,
        following int NOT NULL,
        followTime int NOT NULL,
        PRIMARY KEY (follower, following),
        FOREIGN KEY (follower) REFERENCES user(UID),
        FOREIGN KEY (following) REFERENCES user(UID)
    )
    '''
    # create user table
    link.execute(UserTableDDL)
    # create relation table
    link.execute(RelationTableDDL)
    print("database create success")
    link.commit()
    link.close()
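To illustrate how the two tables work together, here is a small sketch that joins them to list the most followed users. It runs against an in-memory database with made-up rows and a trimmed copy of the schema; the names `most_followed` and `demo` are assumptions for this example, not part of the original project.

```python
import sqlite3


def most_followed(conn: sqlite3.Connection, limit: int = 10) -> list:
    """Return (NAME, follower_count) rows, most followed first."""
    query = """
        SELECT u.NAME, COUNT(*) AS followers
        FROM relation AS r
        JOIN user AS u ON u.UID = r.following
        GROUP BY r.following
        ORDER BY followers DESC, u.NAME
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()


def demo() -> list:
    # In-memory database with a reduced schema and invented sample rows.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user (UID int PRIMARY KEY NOT NULL, NAME varchar NOT NULL)")
    conn.execute(
        "CREATE TABLE relation (follower int NOT NULL, following int NOT NULL, "
        "followTime int NOT NULL, PRIMARY KEY (follower, following))"
    )
    conn.executemany("INSERT INTO user VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
    conn.executemany(
        "INSERT INTO relation VALUES (?, ?, ?)", [(1, 3, 0), (2, 3, 0), (1, 2, 0)]
    )
    rows = most_followed(conn)
    conn.close()
    return rows


if __name__ == "__main__":
    print(demo())
```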
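3. Database operations
Next comes inserting the list of new users. My approach is to crawl one person's following list, hand the whole list to this function, and let it check whether any of the users are new. If so, the new users are returned and serve as the starting points for the next round of crawling.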
def insertUser(infos):
    conn = sqlite3.connect('BiliFollowDB.db')
    link = conn.cursor()
    InsertCmd = "insert into user (UID,NAME,vipType,verifyType,sign,verifyDesc) values (?,?,?,?,?,?);"
    # Parameterized query instead of string formatting
    ExistCmd = "select count(UID) from user where UID=?;"
    newID = []
    for info in infos:
        exist_ID = link.execute(ExistCmd, (info['uid'],)).fetchone()[0]
        if exist_ID == 0:
            # Not stored yet: remember the UID and insert the row
            newID.append(info['uid'])
            link.execute(InsertCmd, (info['uid'], info['name'], info['vipType'],
                                     info['verifyType'], info['sign'], info['verifyDesc']))
    conn.commit()
    conn.close()
    return newID
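The check-then-insert step could also be collapsed into a single statement. The sketch below is an alternative, not the author's method: it relies on SQLite's `INSERT OR IGNORE` and the cursor's `rowcount` to tell new rows from existing ones. The function name `insert_new_users` and the in-memory demo are assumptions for illustration.

```python
import sqlite3


def insert_new_users(conn: sqlite3.Connection, infos: list) -> list:
    """Insert users and return the UIDs that were not in the table before."""
    new_ids = []
    cur = conn.cursor()
    for info in infos:
        cur.execute(
            "INSERT OR IGNORE INTO user (UID, NAME, vipType, verifyType, SIGN, verifyDesc) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            (info["uid"], info["name"], info["vipType"],
             info["verifyType"], info["sign"], info["verifyDesc"]),
        )
        # rowcount is 1 when a row was inserted and 0 when it was ignored.
        if cur.rowcount == 1:
            new_ids.append(info["uid"])
    conn.commit()
    return new_ids


def demo() -> tuple:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user (UID int PRIMARY KEY NOT NULL, NAME varchar NOT NULL, "
        "SIGN varchar, vipType int NOT NULL, verifyType int NOT NULL, verifyDesc varchar)"
    )

    def make(uid):
        # Invented sample record with the same keys the crawler produces.
        return {"uid": uid, "name": "u%d" % uid, "vipType": 0,
                "verifyType": -1, "sign": "", "verifyDesc": None}

    first = insert_new_users(conn, [make(1), make(2)])
    second = insert_new_users(conn, [make(2), make(3)])
    conn.close()
    return first, second


if __name__ == "__main__":
    print(demo())
```

Note that `OR IGNORE` also silently skips rows that violate other constraints such as NOT NULL, so it trades some error visibility for brevity.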
Then comes the function that inserts the relations, which is simpler.
def insertFollowing(uid:int, subscribe):
    conn = sqlite3.connect('BiliFollowDB.db')
    link = conn.cursor()
    InsertCmd = "insert into relation (follower,following,followTime) values (?,?,?);"
    # Each entry of subscribe is a (followed UID, follow time) pair
    for follow in subscribe:
        link.execute(InsertCmd, (uid, follow[0], follow[1]))
    conn.commit()
    conn.close()
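Crawling the same user twice would collide with the (follower, following) primary key of the relation table. One possible way to make the insert repeatable is a batched `INSERT OR IGNORE`, sketched below. The name `insert_following_batch` and the in-memory demo are assumptions for this example.

```python
import sqlite3


def insert_following_batch(conn: sqlite3.Connection, uid: int, subscribe: list) -> None:
    """Insert (uid -> followed) rows in one batch, skipping pairs already stored."""
    rows = [(uid, followed, mtime) for followed, mtime in subscribe]
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany(
            "INSERT OR IGNORE INTO relation (follower, following, followTime) "
            "VALUES (?, ?, ?)",
            rows,
        )


def demo() -> int:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE relation (follower int NOT NULL, following int NOT NULL, "
        "followTime int NOT NULL, PRIMARY KEY (follower, following))"
    )
    insert_following_batch(conn, 1, [(2, 100), (3, 200)])
    # The pair (1, 3) is already stored and is skipped on the second call.
    insert_following_batch(conn, 1, [(3, 200), (4, 300)])
    count = conn.execute("SELECT COUNT(*) FROM relation").fetchone()[0]
    conn.close()
    return count


if __name__ == "__main__":
    print(demo())
```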
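II. The Crawler
From observation, "Uncle Rui" (睿叔叔, that is, Bilibili) limits the visible following list to 5 pages.
Even browsing by hand you can only reach 5 pages, so there is nothing to be done about it: we crawl 5 pages.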
import json
import requests

def getFollowingList(uid:int):
    url = "https://api.bilibili.com/x/relation/followings?vmid=%d&pn=%d&ps=50&order=desc&order_type=attention&jsonp=jsonp"  # % (UID, Page Number)
    infos = []
    subscribe = []
    for i in range(1, 6):
        html = requests.get(url % (uid, i))
        if html.status_code != 200:
            print("GET ERROR!")
            break  # stop here instead of parsing a failed response
        dic = json.loads(html.text)
        if dic['code'] == -400:
            break  # the list is locked
        users = dic['data']['list']  # renamed so the built-in list is not shadowed
        for usr in users:
            info = {}
            info['uid'] = usr['mid']
            info['name'] = usr['uname']
            info['vipType'] = usr['vip']['vipType']
            info['verifyType'] = usr['official_verify']['type']
            info['sign'] = usr['sign']
            if info['verifyType'] == -1:
                info['verifyDesc'] = None  # stored as SQL NULL rather than the string 'NULL'
            else:
                info['verifyDesc'] = usr['official_verify']['desc']
            subscribe.append((usr['mid'], usr['mtime']))
            infos.append(info)
    # Store the users first, then the relations; new UIDs seed the next round
    newID = insertUser(infos)
    insertFollowing(uid, subscribe)
    return newID
III. Complete Code
#by concyclics
# -*- coding:UTF-8 -*-
import sqlite3
import json
import requests

# % (UID, Page Number)
API_URL = "https://api.bilibili.com/x/relation/followings?vmid=%d&pn=%d&ps=50&order=desc&order_type=attention&jsonp=jsonp"

def createDB():
    link = sqlite3.connect('BiliFollowDB.db')
    print("database open success")
    UserTableDDL = '''
    create table if not exists user(
        UID int PRIMARY KEY NOT NULL,
        NAME varchar NOT NULL,
        SIGN varchar DEFAULT NULL,
        vipType int NOT NULL,
        verifyType int NOT NULL,
        verifyDesc varchar DEFAULT NULL)
    '''
    # follower and following each reference user(UID) separately.
    # SQLite only enforces foreign keys when PRAGMA foreign_keys=ON (off by default).
    RelationTableDDL = '''
    create table if not exists relation(
        follower int NOT NULL,
        following int NOT NULL,
        followTime int NOT NULL,
        PRIMARY KEY (follower, following),
        FOREIGN KEY (follower) REFERENCES user(UID),
        FOREIGN KEY (following) REFERENCES user(UID)
    )
    '''
    # create user table
    link.execute(UserTableDDL)
    # create relation table
    link.execute(RelationTableDDL)
    print("database create success")
    link.commit()
    link.close()

def insertUser(infos):
    conn = sqlite3.connect('BiliFollowDB.db')
    link = conn.cursor()
    InsertCmd = "insert into user (UID,NAME,vipType,verifyType,sign,verifyDesc) values (?,?,?,?,?,?);"
    ExistCmd = "select count(UID) from user where UID=?;"
    newID = []
    for info in infos:
        exist_ID = link.execute(ExistCmd, (info['uid'],)).fetchone()[0]
        if exist_ID == 0:
            newID.append(info['uid'])
            link.execute(InsertCmd, (info['uid'], info['name'], info['vipType'],
                                     info['verifyType'], info['sign'], info['verifyDesc']))
    conn.commit()
    conn.close()
    return newID

def insertFollowing(uid:int, subscribe):
    conn = sqlite3.connect('BiliFollowDB.db')
    link = conn.cursor()
    InsertCmd = "insert into relation (follower,following,followTime) values (?,?,?);"
    for follow in subscribe:
        try:
            link.execute(InsertCmd, (uid, follow[0], follow[1]))
        except sqlite3.IntegrityError:
            # The relation is already stored; report it and move on
            print((uid, follow[0], follow[1]))
    conn.commit()
    conn.close()

def getFollowingList(uid:int):
    infos = []
    subscribe = []
    for i in range(1, 6):
        html = requests.get(API_URL % (uid, i))
        if html.status_code != 200:
            print("GET ERROR!")
            break
        dic = json.loads(html.text)
        if dic['code'] == -400:
            break  # the list is locked
        try:
            users = dic['data']['list']
        except (KeyError, TypeError):
            break
        for usr in users:
            info = {}
            info['uid'] = usr['mid']
            info['name'] = usr['uname']
            info['vipType'] = usr['vip']['vipType']
            info['verifyType'] = usr['official_verify']['type']
            info['sign'] = usr['sign']
            if info['verifyType'] == -1:
                info['verifyDesc'] = None  # stored as SQL NULL rather than the string 'NULL'
            else:
                info['verifyDesc'] = usr['official_verify']['desc']
            subscribe.append((usr['mid'], usr['mtime']))
            infos.append(info)
    # Pages fetched before an error are still stored
    newID = insertUser(infos)
    insertFollowing(uid, subscribe)
    return newID

def getFollowingUid(uid:int):
    IDs = []  # collected across all pages
    for i in range(1, 6):
        html = requests.get(API_URL % (uid, i))
        if html.status_code != 200:
            print("GET ERROR!")
            break
        dic = json.loads(html.text)
        if dic['code'] == -400:
            break
        try:
            users = dic['data']['list']
        except (KeyError, TypeError):
            break
        for usr in users:
            IDs.append(usr['mid'])
    return IDs

def work(root):
    # Crawl level by level: the new users found in one round seed the next
    IDlist = root
    while len(IDlist) != 0:
        tmplist = []
        for ID in IDlist:
            print(ID)
            tmplist += getFollowingList(ID)
        IDlist = tmplist

def rework():
    # Re-check every stored user and collect followed UIDs missing from the user table
    conn = sqlite3.connect('BiliFollowDB.db')
    link = conn.cursor()
    SelectCmd = "select uid from user;"
    IDs = [row[0] for row in link.execute(SelectCmd)]
    conn.close()
    newID = []
    print(IDs)
    for ID in IDs:
        for followed in getFollowingUid(ID):
            if followed not in IDs:
                newID.append(followed)
    return newID

if __name__ == "__main__":
    createDB()
    #work([**put root UID here**,])
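The level-by-level traversal in work() can be tried without network access by swapping the fetch step for a plain function. The sketch below is a simplified stand-in, not the project's code: `crawl_levels`, the `fetch_following` callback, and the `max_users` cap are all assumptions for illustration, and an in-memory set replaces the database check for already-seen users.

```python
from typing import Callable, Iterable, List


def crawl_levels(
    roots: Iterable[int],
    fetch_following: Callable[[int], Iterable[int]],
    max_users: int = 1000,
) -> List[int]:
    """Visit users breadth-first and return the UIDs in the order they were crawled."""
    seen = set(roots)          # every UID ever queued, so nobody is crawled twice
    frontier = list(roots)     # users to crawl in the current round
    order: List[int] = []
    while frontier and len(order) < max_users:
        next_frontier: List[int] = []
        for uid in frontier:
            if len(order) >= max_users:
                break
            order.append(uid)
            for followed in fetch_following(uid):
                if followed not in seen:
                    seen.add(followed)
                    next_frontier.append(followed)
        frontier = next_frontier
    return order


if __name__ == "__main__":
    # Invented follow graph standing in for the real API.
    graph = {1: [2, 3], 2: [3, 1], 3: [4], 4: []}
    print(crawl_levels([1], lambda uid: graph.get(uid, [])))
```

The cap on crawled users is there because the follow graph grows quickly from one round to the next.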
IV. Project Repository
https://github.com/Concyclics/BiliBiliFollowSpider