
An Introduction to Python's Network Request Modules urllib and requests

Updated: 2022-10-11 09:20:03  Author: Python熱愛者

The first step in web scraping is to fetch a page's HTML from its URL. In Python 3, both urllib and requests can be used to retrieve web page data. This article introduces how to use these two network request modules.

Introduction to urllib

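urllib.request provides a urlopen function for fetching pages. It supports multiple protocols, basic authentication, cookies, proxies, and similar features.
In Python 2 there were two separate modules, urllib and urllib2.
urllib2 could accept a Request object, whereas urllib accepted only a URL.
urllib provided the urlencode function for encoding the parameters of a GET request; urllib2 had no equivalent.
In Python 3 the two were merged into the urllib package: urllib.request (urlopen, Request), urllib.parse (urlencode), and urllib.error.
urllib.error defines URLError and HTTPError for handling client-side and server-side failures.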
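As a minimal sketch of how the two exception types might be handled, the helper below wraps urlopen. The name `fetch` and its injectable `opener` parameter are assumptions introduced here for illustration, not part of urllib. HTTPError is a subclass of URLError, so it has to be caught first.

```python
import urllib.error
import urllib.request


def fetch(url, opener=urllib.request.urlopen, timeout=10):
    """Return (status, body_bytes) on success, or (status_or_None, reason) on failure.

    `opener` is a hypothetical hook so the function can be tried without a network.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 500.
        return err.code, err.reason
    except urllib.error.URLError as err:
        # No usable response: DNS failure, refused connection, and so on.
        return None, err.reason
```

Calling `fetch('http://www.baidu.com')` would perform a real request; a status of None in the result signals that no HTTP response was received at all.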

Introduction to Requests

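Requests is a simple, easy-to-use HTTP library written in Python. It lets us complete an HTTP request with a few simple arguments, rather than specifying every detail by hand as with urllib. It also decodes response bodies to Unicode automatically and provides rich error handling. Its feature list includes: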

International Domains and URLs
Keep-Alive & Connection Pooling
Sessions with Cookie Persistence
Browser-style SSL Verification
Basic/Digest Authentication
Elegant Key/Value Cookies
Automatic Decompression
Unicode Response Bodies
Multipart File Uploads
Connection Timeouts
.netrc support
Python 2.6—3.4
Thread-safe

Some example code follows. The environment used in this article is Python 3.6.

Requesting a single page directly, without parameters

import urllib.request
import requests
# Fetch the page with urllib
response = urllib.request.urlopen('http://www.baidu.com')
# read() returns the raw bytes sent by the server; decode() converts them to str (UTF-8 by default)
print(response.read().decode())
# Fetch the same page with requests
resp = requests.get('http://www.baidu.com')
print(resp)       # the Response object itself
print(resp.text)  # the response body decoded to str
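Calling decode() with no argument assumes UTF-8, which may not match what the server actually sent. The sketch below reads the charset from the Content-Type header when one is present. The helper name `decode_body` is an assumption made for this example; `get_content_charset()` is the standard method on the header object that urlopen responses expose as `response.headers`.

```python
from email.message import Message


def decode_body(raw, headers, default="utf-8"):
    """Decode raw response bytes using the charset named in Content-Type, if any."""
    charset = headers.get_content_charset() or default
    # errors="replace" keeps this sketch from raising on a mislabelled page.
    return raw.decode(charset, errors="replace")


# Offline demonstration with a hand-built header object (no request is sent).
demo_headers = Message()
demo_headers["Content-Type"] = "text/html; charset=gbk"
print(decode_body("股票".encode("gbk"), demo_headers))
```

With a live response, the call would look like `decode_body(response.read(), response.headers)`.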

HTTP works on a request/response model, and urllib.request provides a Request object to represent the request, so the code above can also be written as:

req = urllib.request.Request('http://www.baidu.com')
with urllib.request.urlopen(req) as response:
    print(response.read())

Header information can be added to a Request object:

req = urllib.request.Request('http://www.baidu.com')
req.add_header('User-Agent', 'Mozilla/6.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/8.0 Mobile/10A5376e Safari/8536.25')
with urllib.request.urlopen(req) as response:
    print(response.read())

Alternatively, pass the headers directly to the Request constructor.
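A minimal sketch of the constructor form is shown here. The shortened User-Agent value is a placeholder chosen for this example, and no request is actually sent.

```python
import urllib.request

# Hypothetical, shortened User-Agent string used only for illustration.
headers = {"User-Agent": "Mozilla/5.0 (compatible; example-client)"}
req = urllib.request.Request("http://www.baidu.com", headers=headers)

# Request normalises header names with str.capitalize(), hence 'User-agent'.
print(req.get_header("User-agent"))
```

The resulting object can be passed to urllib.request.urlopen(req) exactly as in the add_header version.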

GET requests with parameters

A request with parameters is essentially the same as the examples above: build the URL request string first, then send the request.

This example uses Tencent's stock API, which accepts different stock codes and dates and returns the price and trading information for the given stock at the given time.

# Call an endpoint that takes a parameter (the stock code)
tencent_api = "http://qt.gtimg.cn/q=sh601939"
response = urllib.request.urlopen(tencent_api)
# read() returns the raw bytes from the server; decode() is not called here, so a bytes object is printed
print(response.read())
resp = requests.get(tencent_api)
print(resp)
print(resp.text)
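The URL above is written out by hand. For endpoints that take conventional query-string parameters, the string can be assembled with urllib.parse.urlencode, as in the sketch below. The helper name `build_url`, the example.com address, and the parameter names are all placeholders, not part of the Tencent API.

```python
from urllib.parse import urlencode


def build_url(base, params):
    """Append URL-encoded query parameters to a base URL that has no query yet."""
    return base + "?" + urlencode(params)


# Placeholder endpoint and parameters, for illustration only.
print(build_url("http://example.com/search", {"q": "python urllib", "page": 2}))
# prints http://example.com/search?q=python+urllib&page=2
```

With requests, the same dictionary can be passed as `requests.get(base, params=params)`, and the library performs the encoding itself.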

Sending a POST request

urllib has no separate functions for GET and POST requests; the method is determined solely by whether a data argument is passed to the Request object.

import urllib.parse
import urllib.request
url = 'http://www.someserver.com/cgi-bin/register.cgi'
values = {'name' : 'Michael Foord',
          'location' : 'Northampton',
          'language' : 'Python' }
data = urllib.parse.urlencode(values)
data = data.encode('ascii') # data should be bytes
req = urllib.request.Request(url, data)
with urllib.request.urlopen(req) as response:
   the_page = response.read()
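The way urllib chooses the method can be checked without sending anything, using Request.get_method(). The sketch below reuses the URL from the example above; the single form field is an arbitrary sample.

```python
import urllib.parse
import urllib.request

url = "http://www.someserver.com/cgi-bin/register.cgi"
form = urllib.parse.urlencode({"language": "Python"}).encode("ascii")

get_req = urllib.request.Request(url)              # no data, so GET
post_req = urllib.request.Request(url, data=form)  # data present, so POST

print(get_req.get_method())   # GET
print(post_req.get_method())  # POST
```

Request also accepts an explicit method argument (for example method='PUT') for cases where the default choice is not the one wanted.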
