
Python Requests: Basic Usage and How It Differs from urllib

Updated: 2022-11-18 11:13:33  Author: 卡爾特斯

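When writing a Python crawler you need to simulate network requests. The two libraries most often used for this are the third-party requests library and Python's built-in urllib. requests is generally recommended because it offers a simpler, higher-level API (it is built on top of urllib3 rather than being a direct wrapper around urllib). This article walks through basic Requests usage and how it differs from urllib.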

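1. Introduction

requests is a third-party module: it does not ship with Python and has to be installed separately. It is mainly used to send HTTP requests, and its API is more concise than urllib's.

References: the official Requests documentation and its Chinese translation.

Installation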

$ pip install requests

See also: a step-by-step introduction to urllib, which is useful for comparing the two libraries.

2. Basic usage

# Import
import requests

# Request URL
url = "https://www.baidu.com"

# Send the request and get the server's response
response = requests.get(url=url)

# One type and six attributes

# 1. Type of the response object (urllib returns http.client.HTTPResponse)
print(type(response)) # <class 'requests.models.Response'>

# 2. Set the encoding used to decode response.text
response.encoding = 'utf-8'

# 3. Page source as a string
print(response.text)

# 4. URL of the response
print(response.url)

# 5. Raw body as bytes (the undecoded form of text; plays the same role as urllib's response.read())
print(response.content)

# 6. HTTP status code
print(response.status_code)

# 7. Response headers
print(response.headers)
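
The relationship between response.content, response.encoding and response.text can be illustrated without touching the network. The sketch below uses made-up sample bytes rather than a real response, and assumes the page is UTF-8. When a text response carries no charset in its Content-Type header, requests falls back to ISO-8859-1, which is why the examples in this article set response.encoding = 'utf-8' before reading response.text.

```python
# Sample bytes standing in for response.content of a UTF-8 page (not a real response)
content = "北京".encode("utf-8")

# Decoding with the wrong charset yields mojibake: one character per byte
wrong = content.decode("iso-8859-1")

# Decoding with the correct charset recovers the original text
right = content.decode("utf-8")

print(repr(wrong))
print(right)
```

Setting response.encoding only changes how the bytes are decoded; response.content itself is never modified.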

3. GET requests and how they differ from urllib

# Import
import requests

# Request URL
# url = "https://www.baidu.com/s?"
url = "https://www.baidu.com/s"

# Request headers
headers = {
  'Accept': 'text/javascript, application/javascript, application/ecmascript, application/x-ecmascript, */*; q=0.01',
  # 'Accept-Encoding': 'gzip, deflate, br',
  'Accept-Language': 'zh-CN,zh;q=0.9',
  'Connection': 'keep-alive',
  'Cookie': '<your own Cookie; the request returns no data without it>',
  'Host': 'www.baidu.com',
  'Referer': 'https://www.baidu.com/s?wd=%E5%8C%97%E4%BA%AC&rsv_spt=1&rsv_iqid=0xc4323e2d0000fc1a&issp=1&f=8&rsv_bp=1&rsv_idx=2&ie=utf-8&rqlang=cn&tn=baiduhome_pg&rsv_enter=0&rsv_dl=tb&oq=%25E5%258C%2597%25E4%25BA%25AC&rsv_btype=t&rsv_t=009fICC65EzpN%2BM16VRnKfYWv8Pm6F%2BO1r55ft99%2BL0OlRVHYYfi5cpRa1wOl%2Bhe0bQO&rsv_pq=f70437990000a294&prefixsug=%25E5%258C%2597%25E4%25BA%25AC&rsp=0',
  'sec-ch-ua': '"Google Chrome";v="107", "Chromium";v="107", "Not=A?Brand";v="24"',
  'sec-ch-ua-mobile': '?0',
  'sec-ch-ua-platform': '"macOS"',
  'Sec-Fetch-Dest': 'empty',
  'Sec-Fetch-Mode': 'cors',
  'Sec-Fetch-Site': 'same-origin',
  'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36',
  'X-Requested-With': 'XMLHttpRequest'
}

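# Query parameters
data = {
  'wd': '北京'
}

# Send the request: def get(url, params=None, **kwargs)
# url: request URL
# params: query-string parameters
# kwargs: other keyword arguments such as headers, proxies, timeout
response = requests.get(url=url, params=data, headers=headers)

# Set the encoding of the response data
response.encoding = 'utf-8'

# Print the page content
print(response.text)

# Summary:
# 1. Query parameters are passed via params
# 2. There is no need to encode them with urllib.parse.urlencode() as urllib requires; pass the dict directly
# 3. There is no need to build a customized request object (urllib.request.Request)
# 4. The trailing ? in the resource path is optional
  # url = "https://www.baidu.com/s?"
  # url = "https://www.baidu.com/s"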
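
For comparison, here is a minimal sketch of what the urllib side of the same GET request involves, covering summary points 2 and 3. It only builds the request and does not send it. The variable names and the shortened User-Agent header are illustrative placeholders, not part of the original example.

```python
import urllib.parse
import urllib.request

base_url = "https://www.baidu.com/s"
params = {"wd": "北京"}
headers = {"User-Agent": "Mozilla/5.0"}  # shortened placeholder header

# urllib: the query string has to be percent-encoded by hand ...
query = urllib.parse.urlencode(params)
full_url = base_url + "?" + query

# ... and headers are attached through a customized Request object
request = urllib.request.Request(url=full_url, headers=headers)

print(full_url)
print(request.get_method())
# The request would then be sent with urllib.request.urlopen(request)
```

With requests, both steps collapse into the single requests.get(url, params=..., headers=...) call shown above.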

4. POST requests and how they differ from urllib

# Import
import requests

# Request URL
url = "https://fanyi.baidu.com/sug"

# Request headers
headers = {
  'Accept': 'application/json, text/javascript, */*; q=0.01',
  # 'Accept-Encoding': 'gzip, deflate, br',  # left out: 'br' responses can only be decoded if the optional brotli package is installed
  'Accept-Language': 'zh-CN,zh;q=0.9',
  'Connection': 'keep-alive',
  # 'Content-Length': '21',  # requests computes this from the body; a hard-coded value is only right for this exact payload
  'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
  'Cookie': '<your own Cookie; the request returns no data without it>',
  'Host': 'fanyi.baidu.com',
  'Origin': 'https://fanyi.baidu.com',
  'Referer': 'https://fanyi.baidu.com/',
  'sec-ch-ua': '"Google Chrome";v="107", "Chromium";v="107", "Not=A?Brand";v="24"',
  'sec-ch-ua-mobile': '?0',
  'sec-ch-ua-platform': '"macOS"',
  'Sec-Fetch-Dest': 'empty',
  'Sec-Fetch-Mode': 'cors',
  'Sec-Fetch-Site': 'same-origin',
  'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36',
  'X-Requested-With': 'XMLHttpRequest'
}

# Form data
data = {
  'kw': '眼睛'
}

# Send the request: def post(url, data=None, json=None, **kwargs)
# url: request URL
# data: form data sent in the request body
# kwargs: other keyword arguments such as headers, proxies, timeout
response = requests.post(url=url, data=data, headers=headers)

# Set the encoding of the response data
response.encoding = 'utf-8'

# Print the response content
print(response.text)

# Summary:
# 1. A POST request needs no manual encoding (urllib needs urllib.parse.urlencode(params).encode('utf-8'))
# 2. POST parameters go in data; GET parameters go in params
# 3. There is no need to build a customized request object
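
To make summary point 1 concrete, the following sketch shows the manual steps urllib needs for the same form. It builds the request without sending it; the variable names and the shortened User-Agent header are placeholders chosen for illustration.

```python
import urllib.parse
import urllib.request

url = "https://fanyi.baidu.com/sug"
form = {"kw": "眼睛"}

# urllib: percent-encode the form, then convert the str to bytes
body = urllib.parse.urlencode(form).encode("utf-8")

# Passing data= turns the Request into a POST
request = urllib.request.Request(
    url=url,
    data=body,
    headers={"User-Agent": "Mozilla/5.0"},  # shortened placeholder header
)

print(body)
print(len(body))
print(request.get_method())
```

The encoded body is 21 bytes long, which is where the Content-Length value of 21 in the captured browser headers comes from.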

5. IP proxies

# Import
import requests

# Request URL
# url = "https://www.baidu.com/s?"
# url = "https://www.baidu.com/s"
url = "http://www.baidu.com/s"

# Request headers
headers = {
  'Accept': 'text/javascript, application/javascript, application/ecmascript, application/x-ecmascript, */*; q=0.01',
  # 'Accept-Encoding': 'gzip, deflate, br',
  'Accept-Language': 'zh-CN,zh;q=0.9',
  'Connection': 'keep-alive',
  'Cookie': '<your own Cookie; the request returns no data without it>',
  'Host': 'www.baidu.com',
  'Referer': 'https://www.baidu.com/s?wd=%E5%8C%97%E4%BA%AC&rsv_spt=1&rsv_iqid=0xc4323e2d0000fc1a&issp=1&f=8&rsv_bp=1&rsv_idx=2&ie=utf-8&rqlang=cn&tn=baiduhome_pg&rsv_enter=0&rsv_dl=tb&oq=%25E5%258C%2597%25E4%25BA%25AC&rsv_btype=t&rsv_t=009fICC65EzpN%2BM16VRnKfYWv8Pm6F%2BO1r55ft99%2BL0OlRVHYYfi5cpRa1wOl%2Bhe0bQO&rsv_pq=f70437990000a294&prefixsug=%25E5%258C%2597%25E4%25BA%25AC&rsp=0',
  'sec-ch-ua': '"Google Chrome";v="107", "Chromium";v="107", "Not=A?Brand";v="24"',
  'sec-ch-ua-mobile': '?0',
  'sec-ch-ua-platform': '"macOS"',
  'Sec-Fetch-Dest': 'empty',
  'Sec-Fetch-Mode': 'cors',
  'Sec-Fetch-Site': 'same-origin',
  'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36',
  'X-Requested-With': 'XMLHttpRequest'
}

# Query parameters
data = {
  'wd': 'ip'
}

# Proxies
proxys = {
  # The key is the scheme of the request URL: requests only uses the entry whose key matches
  # (an 'http' entry is ignored for an https:// URL), so keep the two consistent
  # 'http': '222.74.73.202:42055'
}

# Send the request: def get(url, params=None, **kwargs)
# url: request URL
# params: query-string parameters
# kwargs: other keyword arguments such as headers, proxies, timeout
response = requests.get(url=url, params=data, headers=headers, proxies=proxys)

# Set the encoding of the response data
response.encoding = 'utf-8'

# Print the page content
# print(response.text)

# Save the page to a file
with open('dali.html', 'w', encoding='utf-8') as f:
  f.write(response.text)
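
One way to keep the proxy key in step with the request URL is to derive it from the URL itself. The helper below is a hypothetical sketch, not part of requests; it also assumes the proxy server speaks plain HTTP, which is why the address is given an explicit http:// prefix.

```python
from urllib.parse import urlsplit


def proxies_for(url, proxy_address):
    """Return a proxies dict keyed by the scheme of url (hypothetical helper)."""
    scheme = urlsplit(url).scheme
    # Assumption: the proxy itself is reached over plain HTTP
    return {scheme: "http://" + proxy_address}


print(proxies_for("http://www.baidu.com/s", "222.74.73.202:42055"))
```

The result can be passed straight to the proxies argument, for example requests.get(url, params=data, headers=headers, proxies=proxies_for(url, '222.74.73.202:42055')).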

6. CAPTCHA example

The test site is gushiwen.cn (古詩文網).

# Parameters required by the login endpoint:
# __VIEWSTATE: ySbbXPOgH0tbN+MZqd0YtuJiFM8uIhBDD9pK/q4dqPvGLwWIbW799+Hr7aDPNHZpg27Nxe259UePM3z1Rc2X89uauZJQEkkrcyVULG09iqo38jAnG6zaq5D6a2/ZhOx7HIPakzBHk5K6JRQ2kGMtIfN0Qjs=
# __VIEWSTATEGENERATOR: C93BE1AE
# from: http://so.gushiwen.cn/user/collect.aspx
# email: xxxx
# pwd: xxxx
# code: TJ83
# denglu: 登錄

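# __VIEWSTATE and __VIEWSTATEGENERATOR are not typed in by the user, so where do they come from?
# Values you cannot see on the page are usually stored in hidden elements in the page source, so search the source for them

# Import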
import requests
from lxml import etree

# Request URL
url = 'https://so.gushiwen.cn/user/login.aspx?from=http%3a%2f%2fso.gushiwen.cn%2fuser%2fcollect.aspx'

# Request headers
headers = {
  'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36'
}

# Create the session first so that the login page, the captcha image and the later login request all share the same cookies
session = requests.Session()

# Fetch the login page
response = session.get(url=url, headers=headers)

# Set the response encoding before reading response.text
response.encoding = 'utf-8'

# To rule out anti-scraping measures, print the source and search it for __VIEWSTATE and __VIEWSTATEGENERATOR
# print(response.text)

# Parse the server's response
tree = etree.HTML(response.text)

# Extract the __VIEWSTATE value
VIEWSTATE = tree.xpath('//input[@id="__VIEWSTATE"]/@value')[0]

# Extract the __VIEWSTATEGENERATOR value
VIEWSTATEGENERATOR = tree.xpath('//input[@id="__VIEWSTATEGENERATOR"]/@value')[0]

# Extract the captcha image path and build its full URL
code = tree.xpath('//img[@id="imgCode"]/@src')[0]
code_url = 'https://so.gushiwen.cn' + code

# Download the captcha image. Do not use urllib.request.urlretrieve() here: it is a separate request outside the session, so the server issues a new captcha and the saved image no longer matches the one tied to the login.
# A requests session persists cookies and other parameters across requests, so every request made through the same session object reuses the cookies set by the earlier ones.
response_code = session.get(code_url)
# Use the binary content, because this is an image
content_code = response_code.content
# Write the bytes to a file (the file may take a moment to appear in the editor; check the folder directly if it does not show up)
with open('code.jpg', 'wb') as f:
  f.write(content_code)

# Many captcha-recognition libraries are available online; use one of them to read the code from code.jpg
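
The hidden-field step can also be sketched with the standard library alone. The example below swaps lxml for html.parser and runs on made-up sample markup, so it shows the idea rather than the real page. The form field names follow the parameter list at the top of this section; the credentials and captcha code are placeholders. The resulting dict would then be posted through the same session object with session.post. The POST target is not stated in this article, so treating the login page URL as the target is an assumption.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect id -> value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "input" and attributes.get("type") == "hidden":
            self.fields[attributes.get("id")] = attributes.get("value", "")


# Made-up sample markup; the real login page is much larger
sample_html = """
<form>
  <input type="hidden" id="__VIEWSTATE" value="abc123=" />
  <input type="hidden" id="__VIEWSTATEGENERATOR" value="C93BE1AE" />
  <input type="text" id="email" />
</form>
"""

parser = HiddenInputParser()
parser.feed(sample_html)

# Placeholders: email, pwd and code must be replaced with real values
login_form = {
    "__VIEWSTATE": parser.fields["__VIEWSTATE"],
    "__VIEWSTATEGENERATOR": parser.fields["__VIEWSTATEGENERATOR"],
    "from": "http://so.gushiwen.cn/user/collect.aspx",
    "email": "xxxx",
    "pwd": "xxxx",
    "code": "TJ83",
    "denglu": "登錄",
}

print(login_form)
```

Both hidden values change between page loads, so they have to be read from the same page load (and the same session) that produced the captcha image.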
