
Handling Headers and Network Timeouts in a Python Crawler

Updated: 2020-06-19 09:31:28  Author: 夏日的向日葵

This article covers how to handle request headers and network timeouts in a Python crawler. It walks through example code in detail and should be a useful reference for study or work.

1. Handling request headers

Sometimes a request to a server, whether GET or POST, comes back with a 403 error because the server has refused access. In that case we can send browser-like header information with the request, which usually gets past this kind of anti-scraping check.

import requests
# URL of the page to fetch
url = 'https://www.baidu.com/'
# Header information. The User-Agent in the original was truncated;
# the "Mozilla/5.0 (Windows NT 10.0; W" prefix is an assumed reconstruction.
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:59.0) Gecko/20100101 Firefox/59.0'}
# Send the request
response = requests.get(url, headers=headers)
# Print the page source as bytes
print(response.content)

Result:

b'<!DOCTYPE html><!--STATUS OK-->\n\n\n  \n  \n              <html><head><meta http-equiv="Content-Type" content="text/html;charset=utf-8"><meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1"><meta content="always" name="referrer"><meta name="theme-color" content="#2932e1"><meta name="description" content="\xe5\x85\xa8\xe7\x90\x83\xe6\x9c\x80\xe5\xa4\xa7\xe7\x9a\x84\xe4\xb8\xad\xe6\x96\x87\xe6\x90\x9c\xe7\xb4\xa2\xe5\xbc\x95\xe6\x93\x8e\xe3\x80\x81\xe8\x87\xb4\xe5\x8a\x9b\xe4\xba\x8e\xe8\xae\xa9\xe7\xbd\x91\xe6\xb0\x91\xe6\x9b\xb4\xe4\xbe\xbf\xe6\x8d\xb7\xe5\x9c\xb0\xe8\x8e\xb7\xe5\x8f\x96\xe4\xbf\xa1\xe6\x81\xaf\xef\xbc\x8c\xe6\x89\xbe\xe5\x88\xb0\xe6\x89\x80\xe6\xb1\x82\xe3\x80\x82\xe7\x99\xbe\xe5\xba\xa6\xe8\xb6\x85\xe8\xbf\x87\xe5\x8d\x83\xe4\xba\xbf\xe7\x9a\x84\xe4\xb8\xad\xe6\x96\x87\xe7\xbd\x91\xe9\xa1\xb5\xe6\x95\xb0\xe6\x8d\xae\xe5\xba\x93\xef\xbc\x8c\xe5\x8f\xaf\xe4\xbb\xa5\xe7\x9e\xac\xe9\x97\xb4\xe6\x89\xbe\xe5\x88\xb0\xe7\x9b\xb8\xe5\x85\xb3\xe7\x9a\x84\xe6\x90\x9c\xe7\xb4\xa2\xe7\xbb\x93\xe6\x9e\x9c\xe3\x80\x82"><link rel="shortcut icon" href="/favicon.ico" rel="external nofollow" type="image/x-icon" /><link rel="search" type="application/opensearchdescription+xml" href="/content-search.xml" rel="external nofollow" title="\xe7\x99\xbe\xe5\xba\xa6\xe6\x90\x9c\xe7\xb4\xa2" /><link rel="icon" sizes="any" mask  rel="external nofollow" ><link rel="dns-prefetch"  rel="external nofollow" /><link rel="dns-prefetch"  rel="external nofollow" /><link rel="dns-prefetch"  rel="external nofollow" /><link rel="dns-prefetch"  rel="external nofollow" /><link rel="dns-prefetch"  rel="external nofollow" /><link rel="dns-prefetch"  rel="external nofollow" />
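When several requests need the same headers, one option is to set them once on a `requests.Session`. The following is a minimal sketch, not part of the original article: `make_session` and `USER_AGENT` are hypothetical names, and the User-Agent value is only an illustrative string. The request is prepared but not sent, so the final headers can be inspected without touching the network.

```python
import requests

# Hypothetical User-Agent string, used only for illustration.
USER_AGENT = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:59.0) '
              'Gecko/20100101 Firefox/59.0')


def make_session(user_agent=USER_AGENT):
    """Return a Session whose requests all carry the given User-Agent."""
    session = requests.Session()
    session.headers.update({'User-Agent': user_agent})
    return session


session = make_session()
# Build the request without sending it, to inspect the merged headers.
request = requests.Request('GET', 'https://www.baidu.com/')
prepared = session.prepare_request(request)
print(prepared.headers['User-Agent'])
```

Calling `session.get(url)` afterwards would send the same headers on every request made through that session.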

2. Network timeouts

When a page does not respond for a long time, the system treats the request as timed out and the page cannot be opened. The code below simulates a network timeout.

import requests
# Send the request 50 times in a loop
for a in range(50):
    try:
        # Set the timeout to 0.5 seconds
        response = requests.get('https://www.baidu.com/', timeout=0.5)
        # Print the status code
        print(response.status_code)
    # Catch any exception raised by the request
    except Exception as e:
        # Print the exception message
        print('Exception: ' + str(e))

The code above sends the request 50 times in a loop with a timeout of 0.5 seconds. If the server does not respond within 0.5 seconds, the request is treated as timed out and the program prints the timeout message to the console.
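In practice a timed-out request is often retried rather than just logged. The sketch below is an illustration and not from the original article: `fetch_with_retry`, its parameters, and the backoff policy are all hypothetical. It relies on two documented `requests` features: `timeout` may be a `(connect, read)` tuple, and `requests.exceptions.Timeout` covers both connect and read timeouts. The `get` parameter exists only so the function can be exercised without a real network call.

```python
import time

import requests


def fetch_with_retry(url, retries=3, timeout=(3.0, 5.0), backoff=0.5,
                     get=requests.get):
    """Try a GET up to `retries` times, pausing after each timed-out attempt."""
    for attempt in range(1, retries + 1):
        try:
            # timeout=(connect, read): separate limits for the two phases.
            return get(url, timeout=timeout)
        except requests.exceptions.Timeout:
            if attempt == retries:
                # Out of attempts: let the caller see the timeout.
                raise
            # Wait a little longer after each failed attempt.
            time.sleep(backoff * attempt)
```

A call such as `fetch_with_retry('https://www.baidu.com/', timeout=0.5)` would behave like the loop above, except that it stops at the first successful response.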

On the subject of network exceptions, the requests module also provides exception classes for them. The example below uses three common ones:

import requests
# Import three exception classes from requests.exceptions
from requests.exceptions import ReadTimeout, HTTPError, RequestException
# Send the request 50 times in a loop
for a in range(50):
    try:
        # Set the timeout to 0.5 seconds
        response = requests.get('https://www.baidu.com/', timeout=0.5)
        # Print the status code
        print(response.status_code)
    # Read timeout: the server did not send data in time
    except ReadTimeout:
        print('timeout')
    # HTTP error (only raised if response.raise_for_status() is called)
    except HTTPError:
        print('httperror')
    # Any other requests error, including connect timeouts
    except RequestException:
        print('reqerror')
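The order of the `except` clauses above matters because these classes form a hierarchy: `ReadTimeout` and `ConnectTimeout` both derive from `Timeout`, and `Timeout` and `HTTPError` both derive from `RequestException`. The sketch below illustrates this without touching the network. The `classify` helper is hypothetical, and the 403 response is built by hand purely to show that `HTTPError` comes from `raise_for_status()`.

```python
import requests
from requests.exceptions import (
    ConnectTimeout, HTTPError, ReadTimeout, RequestException, Timeout,
)


def classify(exc):
    """Map an exception to the labels printed in the example above."""
    # Timeout is the parent of both ReadTimeout and ConnectTimeout.
    if isinstance(exc, Timeout):
        return 'timeout'
    if isinstance(exc, HTTPError):
        return 'httperror'
    # RequestException is the base class, so it must be checked last.
    if isinstance(exc, RequestException):
        return 'reqerror'
    return 'other'


# Build a 403 response by hand (no network) to trigger raise_for_status().
fake_response = requests.Response()
fake_response.status_code = 403
label = None
try:
    fake_response.raise_for_status()
except RequestException as exc:
    label = classify(exc)
print(label)
print(classify(ReadTimeout()), classify(ConnectTimeout()))
```

Catching `Timeout` instead of `ReadTimeout` in the loop above would also report connect timeouts as `timeout` rather than `reqerror`.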


That is all for this article. I hope it helps with your studies, and I hope you will continue to support 腳本之家.
