Python requests: Timeouts and Retries Explained

Updated: 2018-12-18 11:45:01  Author: 丹楓無跡


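Network requests inevitably run into timeouts. In requests, if you do not set a timeout, your program may hang indefinitely.

Timeouts come in two kinds: connect timeouts and read timeouts.

Connect timeout

The connect timeout is the number of seconds Requests waits for your client to establish a connection to a port on the remote machine (this corresponds to the connect() call).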

import time
import requests

url = 'http://www.google.com.hk'

print(time.strftime('%Y-%m-%d %H:%M:%S'))
try:
  html = requests.get(url, timeout=5).text
  print('success')
except requests.exceptions.RequestException as e:
  print(e)

print(time.strftime('%Y-%m-%d %H:%M:%S'))

Because Google is blocked from my network, the connection cannot be established, and the error message reports a connect timeout.

2018-12-14 14:38:20
HTTPConnectionPool(host='www.google.com.hk', port=80): Max retries exceeded with url: / (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x00000000047F80F0>, 'Connection to www.google.com.hk timed out. (connect timeout=5)'))
2018-12-14 14:38:25

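Even if you do not set a timeout, the connection attempt still gives up eventually (about 21 seconds in my test). That limit comes from the operating system's TCP stack rather than from requests, so it can differ between platforms.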

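Read timeout

The read timeout is the number of seconds the client waits for the server to send a response. (Specifically, it is the time the client will wait between bytes sent from the server. In 99.9% of cases, this is the time before the server sends the first byte.)

Put simply, the connect timeout is the maximum time between starting a connection attempt and the connection being established, and the read timeout is the maximum time to wait between the connection succeeding and the server returning a response.

The read timeout has no default value. If you do not set it, the program waits indefinitely. This is why our crawlers so often hang without any error message.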
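To make the two phases concrete, here is a minimal sketch that uses only the standard socket module instead of requests. The helper name probe and the local listener are assumptions made for illustration: the listener accepts the TCP handshake but never sends a byte, so the connect phase succeeds and the read phase times out.

```python
import socket

def probe(host, port, connect_timeout, read_timeout):
    """Report which phase timed out: 'connect timeout', 'read timeout', or 'ok'."""
    try:
        # Phase 1: the TCP connect(); this is what a connect timeout bounds.
        sock = socket.create_connection((host, port), timeout=connect_timeout)
    except socket.timeout:
        return 'connect timeout'
    with sock:
        # Phase 2: wait for the first byte from the server; a read timeout bounds this.
        sock.settimeout(read_timeout)
        try:
            sock.recv(1)
        except socket.timeout:
            return 'read timeout'
    return 'ok'

# Hypothetical local listener: the kernel completes the handshake,
# but nothing is ever sent back to the client.
server = socket.socket()
server.bind(('127.0.0.1', 0))
server.listen(1)
port = server.getsockname()[1]

result = probe('127.0.0.1', port, connect_timeout=2, read_timeout=0.5)
print(result)
server.close()
```

requests performs both phases for you; the sketch only shows where each timeout applies.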

If you specify a single value for the timeout, like this:

r = requests.get('https://github.com', timeout=5)

that value is used for both the connect and the read timeout. To set them separately, pass a tuple:

r = requests.get('https://github.com', timeout=(3.05, 27))

Level 4 of the heibanke (黑板課) crawler challenge happens to have an artificial 15-second response delay built into the site, which makes it a perfect illustration.

import time
import requests

url_login = 'http://www.heibanke.com/accounts/login/?next=/lesson/crawler_ex03/'

session = requests.Session()
session.get(url_login)

token = session.cookies['csrftoken']
session.post(url_login, data={'csrfmiddlewaretoken': token, 'username': 'guliang21', 'password': '123qwe'})

print(time.strftime('%Y-%m-%d %H:%M:%S'))

url_pw = 'http://www.heibanke.com/lesson/crawler_ex03/pw_list/'
try:
  html = session.get(url_pw, timeout=(5, 10)).text
  print('success')
except requests.exceptions.RequestException as e:
  print(e)

print(time.strftime('%Y-%m-%d %H:%M:%S'))

This time the error message reports a read timeout.

2018-12-14 15:20:47
HTTPConnectionPool(host='www.heibanke.com', port=80): Read timed out. (read timeout=10)
2018-12-14 15:20:57
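If you need to react differently to the two cases, requests raises distinct exception classes: requests.exceptions.ConnectTimeout and requests.exceptions.ReadTimeout. The sketch below is one possible way to tell them apart. The local SlowHandler server, which waits 2 seconds before answering, is a hypothetical stand-in for a slow site so that the example runs without external network access.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

class SlowHandler(BaseHTTPRequestHandler):
    """Hypothetical test server: waits 2 seconds before answering."""
    def do_GET(self):
        time.sleep(2)
        try:
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b'ok')
        except OSError:
            pass  # the client has already given up

    def log_message(self, *args):
        pass  # keep the demo output quiet

server = HTTPServer(('127.0.0.1', 0), SlowHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = 'http://127.0.0.1:%d/' % server.server_port

session = requests.Session()
session.trust_env = False  # ignore proxy environment variables for this local demo

try:
    # 1 second to connect, 0.5 seconds to wait for the response
    session.get(url, timeout=(1, 0.5))
    outcome = 'success'
except requests.exceptions.ConnectTimeout:
    outcome = 'connect timeout'
except requests.exceptions.ReadTimeout:
    outcome = 'read timeout'

print(outcome)
server.shutdown()
```

Both classes derive from requests.exceptions.Timeout, so catching that single class is enough when the distinction does not matter.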

Retrying after a timeout

Usually we do not give up as soon as a request times out. Instead, we set up a mechanism that tries the request up to three times.

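import requests

def gethtml(url):
  # Try the request up to three times before giving up.
  i = 0
  while i < 3:
    try:
      html = requests.get(url, timeout=5).text
      return html
    except requests.exceptions.RequestException:
      i += 1
  return None  # all three attempts failed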
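The loop above retries immediately and returns None once every attempt has failed, which makes it easy to miss the failure. A possible variation, sketched here under assumptions of my own (the helper name retry, its parameters, and the doubling delay are not from the original), pauses between attempts and re-raises the last error. The flaky function stands in for a real request so that the example runs offline.

```python
import time

def retry(func, attempts=3, base_delay=0.5, exceptions=(Exception,), sleep=time.sleep):
    """Call func() up to `attempts` times, doubling the pause after each failure."""
    for attempt in range(attempts):
        try:
            return func()
        except exceptions:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            sleep(base_delay * 2 ** attempt)

# Stand-in for a flaky request: fails twice, then succeeds.
calls = []
def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise TimeoutError('simulated timeout')
    return '<html>ok</html>'

pauses = []  # record the delays instead of actually sleeping
html = retry(flaky, exceptions=(TimeoutError,), sleep=pauses.append)
print(html, len(calls), pauses)
```

With requests, the call might look like retry(lambda: requests.get(url, timeout=5).text, exceptions=(requests.exceptions.RequestException,)).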

In fact, requests already has this built in. (Although the code seems to have grown longer...)

import time
import requests
from requests.adapters import HTTPAdapter

s = requests.Session()
s.mount('http://', HTTPAdapter(max_retries=3))
s.mount('https://', HTTPAdapter(max_retries=3))

print(time.strftime('%Y-%m-%d %H:%M:%S'))
try:
  r = s.get('http://www.google.com.hk', timeout=5)
  html = r.text
  print('success')
except requests.exceptions.RequestException as e:
  print(e)
print(time.strftime('%Y-%m-%d %H:%M:%S'))

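max_retries is the maximum number of retries. Three retries plus the initial request make four attempts in total, so the code above takes 20 seconds to run rather than 15. Note that an integer max_retries covers only failed DNS lookups, socket connections, and connection timeouts; requests whose data has already reached the server are never retried this way.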

2018-12-14 15:34:03
HTTPConnectionPool(host='www.google.com.hk', port=80): Max retries exceeded with url: / (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x0000000013269630>, 'Connection to www.google.com.hk timed out. (connect timeout=5)'))
2018-12-14 15:34:23
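HTTPAdapter also accepts a urllib3 Retry object in place of an integer, which allows a pause between attempts. The following is a minimal sketch, not a recommended configuration: the values for total, backoff_factor, and status_forcelist are illustrative assumptions.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

retry_policy = Retry(
    total=3,                           # at most 3 retries after the first attempt
    backoff_factor=0.5,                # pause between retries, growing exponentially
    status_forcelist=[502, 503, 504],  # also retry these statuses (idempotent methods by default)
)

s = requests.Session()
adapter = HTTPAdapter(max_retries=retry_policy)
s.mount('http://', adapter)
s.mount('https://', adapter)

# Requests made through s, e.g. s.get(url, timeout=5), now follow retry_policy.
```

A timeout should still be passed on every request, because the retry policy does not set one.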
