Basic Usage of requests, a Core Library for Python Web Scraping

Updated: 2023-07-07 10:18:32  Author: milk-request

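This article introduces the basic usage of requests, a core library for Python web scraping. The urllib library is cumbersome to use: handling page authentication and cookies, for example, requires writing an Opener and a Handler. The more powerful requests library makes these operations much more convenient. Readers who need it can use this article as a reference.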

requests

Handling page authentication and cookies with urllib means writing an Opener and a Handler, which is inconvenient. Here we learn the more powerful requests library.

get()

Example:

import requests  # import the requests library
html = requests.get('https://www.csdn.net/')  # fetch the page with the get() method
print(html.text)  # the text attribute holds the page source as a string
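In practice a request can fail or return an error status, and Chinese pages are sometimes decoded with the wrong character set. The following is a minimal sketch rather than part of the original example: fetch_text is a hypothetical helper name, and the 10-second timeout is an arbitrary assumption.

```python
import requests


def fetch_text(url, timeout=10.0):
    """Return the page source as a string, or None if the request fails.

    fetch_text is a hypothetical helper; the timeout value is an assumption.
    """
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()  # raise HTTPError for 4xx/5xx responses
    except requests.exceptions.RequestException:
        # Covers connection errors, timeouts, invalid URLs and HTTP errors
        return None
    # Re-detect the encoding from the body; helps when the server omits a charset
    response.encoding = response.apparent_encoding
    return response.text
```

Calling fetch_text('https://www.csdn.net/') would return the page source if the site is reachable, while a malformed URL such as 'not-a-url' returns None because requests rejects it before sending anything.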

To add query parameters, pass a dictionary as params

import requests  # import the requests library
data = {
    'jl': '765',
    'kw': 'python',
    'kt': '3'
}
html = requests.get('https://sou.zhaopin.com/', params=data)  # data is encoded into the URL query string
print(html.text)  # print the page source
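To see what params actually does to the URL, the request can be built without being sent. This is a small sketch using the same dictionary; requests.Request(...).prepare() only constructs the request, so nothing goes over the network.

```python
import requests

data = {
    'jl': '765',
    'kw': 'python',
    'kt': '3'
}
# Build the request without sending it, so no network access is needed
prepared = requests.Request('GET', 'https://sou.zhaopin.com/', params=data).prepare()
print(prepared.url)  # https://sou.zhaopin.com/?jl=765&kw=python&kt=3
```

Each key/value pair of the dictionary becomes one key=value entry in the query string, joined with &.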

To add request headers, pass a dictionary as headers

import requests  # import the requests library
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36'
}
data = {
    'jl': '765',
    'kw': 'python',
    'kt': '3'
}
html = requests.get('https://sou.zhaopin.com/', headers=headers, params=data)  # send custom headers and query parameters
print(html.text)  # print the page source
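A custom User-Agent matters because requests identifies itself as a script by default, and some sites may treat such clients differently. The sketch below compares the default value with the one from the dictionary; it prepares the request through a Session without sending it, which is an illustration choice and not something the original example does.

```python
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36'
}

# The User-Agent that requests sends when none is supplied
default_ua = requests.utils.default_headers()['User-Agent']
print(default_ua)  # starts with "python-requests/"

# Prepare (but do not send) a request through a session to see the final headers
session = requests.Session()
prepared = session.prepare_request(
    requests.Request('GET', 'https://sou.zhaopin.com/', headers=headers)
)
print(prepared.headers['User-Agent'])  # the browser-like value from the dictionary
```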

Advanced usage

Cookie handling, proxy settings, and so on

Cookies

Getting cookies:

import requests  # import the requests library
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36'
}
data = {
    'jl': '765',
    'kw': 'python',
    'kt': '3'
}
html = requests.get('https://blog.csdn.net/qq_40966461/article/details/104974998', headers=headers, params=data)  # send custom headers and query parameters
print(html.cookies)  # the cookies attribute is a RequestsCookieJar with the cookies set by the server
for key, value in html.cookies.items():  # iterate over (name, value) pairs
    print(f'{key}={value}')

It is that simple: just read the cookies attribute of the response.
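Once the cookies are available, a common next step is to turn them into a plain dictionary and send them back with a later request. The sketch below builds a jar by hand to stand in for html.cookies, so it runs without network access; the cookie names, values, and the example.com domain are all made up for illustration.

```python
import requests
from requests.cookies import RequestsCookieJar

# Hand-built jar standing in for html.cookies; names and values are made up
jar = RequestsCookieJar()
jar.set('session_id', 'abc123', domain='example.com', path='/')
jar.set('theme', 'dark', domain='example.com', path='/')

# Convert the jar to a plain dictionary
cookie_dict = requests.utils.dict_from_cookiejar(jar)
print(cookie_dict)  # {'session_id': 'abc123', 'theme': 'dark'}

# A dictionary can be passed back via cookies=; prepare() builds the request without sending it
prepared = requests.Request('GET', 'https://example.com/', cookies=cookie_dict).prepare()
print(prepared.headers['Cookie'])  # the Cookie header built from the dictionary
```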

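Maintaining a session with Session()

In requests, calling get() or post() directly does simulate a page request, but each call is effectively a separate session, as if two different browsers had each opened one page. To keep state such as cookies across requests, use a Session object.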

import requests  # import the requests library
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36'
}
data = {
    'jl': '765',
    'kw': 'python',
    'kt': '3'
}
session = requests.Session()  # create one Session and reuse it for every request
html = session.get('https://blog.csdn.net/qq_40966461/article/details/104974998', headers=headers, params=data)
print(html.cookies)  # cookies set by this response
for key, value in html.cookies.items():
    print(f'{key}={value}')
print(session.cookies)  # cookies the session keeps and sends on later requests

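Create a Session object first, then call get() on that object instead of on the requests module. Reuse the same object for later requests so that its cookies carry over.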
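The difference between a session and a standalone request can be seen without contacting any server. In this sketch a cookie is placed in the session by hand, as if an earlier response had set it; the cookie name, value, and the example.com URL are hypothetical.

```python
import requests

session = requests.Session()
# Pretend an earlier response set this cookie; name, value and domain are made up
session.cookies.set('session_id', 'abc123', domain='example.com', path='/')

request = requests.Request('GET', 'https://example.com/profile')

# Prepared through the session: the stored cookie is attached
with_session = session.prepare_request(request)
# Prepared on its own: no session state is available
without_session = request.prepare()

print(with_session.headers.get('Cookie'))     # session_id=abc123
print(without_session.headers.get('Cookie'))  # None
```

This is why a login performed with one Session object is still in effect for the next call on the same object, while separate requests.get() calls start from scratch each time.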

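SSL certificate verification

import requests  # import the requests library
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36'
}
# verify=False skips certificate verification; it only has an effect on HTTPS connections,
# for example when this http:// URL redirects to https://
response = requests.get('http://www.12306.cn', headers=headers, verify=False)
print(response.status_code)  # HTTP status code of the final response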

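Passing verify=False is enough to skip certificate verification. An InsecureRequestWarning is then emitted for each unverified HTTPS request, so use this only for testing or for sites you trust.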
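When many requests need the same setting, verify can be set once on a Session, and the warning can be silenced through urllib3, which requests depends on. This is a hedged sketch of one possible setup, not the original author's approach; the variable names are illustrative, and disabling verification remains unsafe for untrusted sites.

```python
import requests
import urllib3

# Silence the InsecureRequestWarning that unverified HTTPS requests trigger
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

default_session = requests.Session()
print(default_session.verify)  # True: certificates are checked by default

insecure_session = requests.Session()
insecure_session.verify = False  # default for requests made through this session
print(insecure_session.verify)  # False
```

verify also accepts a path to a CA bundle file, which is usually a better option than turning verification off when a site uses a private certificate authority.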

Proxy settings

import requests  # import the requests library
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36'
}
# Keys are the scheme of the target URL; values are the proxy to use for that scheme
proxies = {
    "http": "http://183.166.132.176",
    "https": "https://183.166.132.176"
}
response = requests.get('http://www.12306.cn', headers=headers, proxies=proxies, verify=False)  # route the request through the proxy
print(response.status_code)  # HTTP status code of the response

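Just pass the proxies argument. Proxy addresses can be found by searching for 快代理 (Kuaidaili). In practice a proxy address usually includes a port, in the form http://host:port.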
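The keys of the proxies dictionary refer to the scheme of the URL being requested, not to the proxy itself. The sketch below uses requests.utils.select_proxy to show which entry would be chosen for a given URL, without sending anything; the 127.0.0.1 addresses and ports are placeholders, not working proxies.

```python
import requests

# Placeholder proxy addresses; replace them with proxies that actually work
proxies = {
    'http': 'http://127.0.0.1:8080',
    'https': 'http://127.0.0.1:8081',
}

# select_proxy reports which entry requests would use for a URL
http_proxy = requests.utils.select_proxy('http://www.12306.cn', proxies)
https_proxy = requests.utils.select_proxy('https://www.12306.cn', proxies)
no_proxy = requests.utils.select_proxy('ftp://www.12306.cn', proxies)

print(http_proxy)   # http://127.0.0.1:8080
print(https_proxy)  # http://127.0.0.1:8081
print(no_proxy)     # None: no entry matches the ftp scheme
```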