
Using Proxy IPs with requests and aiohttp in Python Async Crawlers

Updated: 2022-03-02 09:32:56  Author: Dream丶Killer

This article explains how to use proxy IPs with requests and aiohttp in Python asynchronous crawlers, with worked example code. It may be a useful reference for anyone interested.

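A crawler that wants to crawl well can't do without IP proxies. Most sites now have some anti-crawling measures: request pages a little too fast and you'll find your IP banned, or else you're asked to pass a verification check. Below I go through how to use proxy IPs with two commonly used modules. Without further ado, let's start.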

Using a proxy IP in requests:

Using a proxy IP in requests only requires adding a proxies parameter. Its value is a dict: the key is the protocol (http/https) and the value is the proxy address, i.e. the IP and port. The format is as follows.

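import requests

# Placeholder request headers; `headers` was not defined in the original snippet
headers = {'User-Agent': 'Mozilla/5.0'}

try:
    response = requests.get('https://httpbin.org/ip', headers=headers,
                            proxies={'https': 'https://221.122.91.74:9401'}, timeout=6)
    print('success')
    # Check whether the proxy IP was actually used.
    # Method 1: print the IP address the request was sent to; requires stream=True in get()
    # print(response.raw._connection.sock.getpeername()[0])
    # Method 2: print the response body returned by the test site
    print(response.text)
except Exception as e:
    print('error', e)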
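
The second check prints the raw response body and leaves the comparison to the reader. As a rough sketch of how that body could be inspected automatically, the helper below parses the JSON that httpbin.org/ip returns (an object with an `origin` field) and looks for the proxy's IP in it. The name `proxy_was_used` and the sample bodies are assumptions made for illustration; they are not part of the original article.

```python
import json


def proxy_was_used(response_text, proxy_ip):
    """Return True if proxy_ip appears in the 'origin' field of an httpbin /ip body."""
    try:
        origin = json.loads(response_text).get('origin', '')
        # 'origin' may hold several comma-separated addresses
        addresses = [part.strip() for part in origin.split(',')]
    except (ValueError, AttributeError):
        # Not JSON, or not shaped like an httpbin /ip response
        return False
    return proxy_ip in addresses


if __name__ == '__main__':
    sample_body = '{"origin": "221.122.91.74"}'  # hypothetical response body
    print(proxy_was_used(sample_body, '221.122.91.74'))
```

In the requests snippet this would be called as `proxy_was_used(response.text, '221.122.91.74')`.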


Note: the key in proxies (http/https) must match the scheme of the URL; otherwise the request is sent directly from the local IP.
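
To make the note concrete, here is a simplified sketch of the scheme-based lookup. It only models the "key must match the URL scheme" rule; requests itself does more (for example, it also accepts per-host keys). The helper names `build_proxies` and `proxy_for` are hypothetical, written for this illustration.

```python
from urllib.parse import urlsplit


def build_proxies(host, port, proxy_scheme='http'):
    """Build a proxies dict that covers both http and https target URLs."""
    proxy_url = f'{proxy_scheme}://{host}:{port}'
    return {'http': proxy_url, 'https': proxy_url}


def proxy_for(url, proxies):
    """Simplified lookup: the proxy whose key equals the URL's scheme, else None."""
    return proxies.get(urlsplit(url).scheme)


if __name__ == '__main__':
    only_http = {'http': 'http://221.122.91.74:9401'}
    # No 'https' key, so an https URL finds no proxy and would go out directly
    print(proxy_for('https://httpbin.org/ip', only_http))
    # With both keys present, either scheme finds the proxy
    print(proxy_for('https://httpbin.org/ip', build_proxies('221.122.91.74', 9401)))
```

Registering the same proxy under both keys is one way to avoid silently falling back to the local IP when the target URL's scheme changes.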

Using a proxy IP in aiohttp:

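Since the requests module doesn't support async, I had no choice but to use aiohttp, and fell into quite a few pits along the way.
Usage is similar to requests: you also add a parameter to get(), but here the parameter is named proxy and its value is a string. In my tests the protocol in that string could only be http; writing https raised an error.
Here is a record of my debugging journey.
Following the usage I found online, I first tried the code below.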

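import asyncio

import aiohttp

# Placeholder request headers; `headers` was not defined in the original snippet
headers = {'User-Agent': 'Mozilla/5.0'}

async def func():
    async with aiohttp.ClientSession() as session:
        try:
            # proxy is a single string here, not a dict as in requests
            async with session.get("https://httpbin.org/ip", headers=headers,
                                   proxy='http://183.220.145.3:80',
                                   timeout=aiohttp.ClientTimeout(total=6)) as response:
                page_text = await response.text()
                print('success')
                print(page_text)
        except Exception as e:
            print(e)
            print('error')

if __name__ == '__main__':
    asyncio.run(func())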


After a modification, try again:

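import asyncio

import aiohttp

# Placeholder request headers; `headers` was not defined in the original snippet
headers = {'User-Agent': 'Mozilla/5.0'}

async def func():
    # Disable SSL certificate verification on the connector
    con = aiohttp.TCPConnector(verify_ssl=False)
    async with aiohttp.ClientSession(connector=con) as session:
        try:
            async with session.get("https://httpbin.org/ip", headers=headers,
                                   proxy='http://183.220.145.3:80',
                                   timeout=aiohttp.ClientTimeout(total=6)) as response:
                # Leftover requests-style check; aiohttp responses have no .raw attribute
                # print(response.raw._connection.sock.getpeername()[0])
                page_text = await response.text()
                print(page_text)
                print('success')
        except Exception as e:
            print(e)
            print('error')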


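Not only did that not fix the problem, it added a warning on top. Luckily a small change takes care of it. Eh, too lazy to paste everything, so here is the final version.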

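import asyncio
import sys

import aiohttp

# Placeholder request headers; `headers` was not defined in the original snippet
headers = {'User-Agent': 'Mozilla/5.0'}

# Change the event loop policy. This cannot go inside the coroutine; it has to run first.
# WindowsSelectorEventLoopPolicy only exists on Windows, hence the platform guard.
if sys.platform == 'win32':
    asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

async def func():
    # Added trust_env=True; ssl=False replaces verify_ssl=False
    async with aiohttp.ClientSession(connector=aiohttp.TCPConnector(ssl=False), trust_env=True) as session:
        try:
            async with session.get("https://httpbin.org/ip", headers=headers,
                                   proxy='http://183.220.145.3:80',
                                   timeout=aiohttp.ClientTimeout(total=10)) as response:
                page_text = await response.text()
                print(page_text)
                print('success')
        except Exception as e:
            print(e)
            print('error')

if __name__ == '__main__':
    asyncio.run(func())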
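
Since the proxy argument in aiohttp is a single string, and an https scheme in that string raised an error in the tests described earlier, proxy addresses taken from a list may need normalizing before use. The sketch below is one possible way to do that with the standard library only. The function name `to_aiohttp_proxy` is hypothetical, and forcing the scheme to http is an assumption based on the behaviour reported in this article, not a documented rule. Credentials embedded in the address are not handled.

```python
from urllib.parse import urlsplit


def to_aiohttp_proxy(address):
    """Return 'http://host:port' for use as the proxy= string.

    Accepts 'host:port' or a full URL; any scheme is replaced by http
    (an assumption taken from the article's observation).
    """
    if '://' not in address:
        address = 'http://' + address
    parts = urlsplit(address)
    if parts.hostname is None or parts.port is None:
        raise ValueError(f'expected host:port, got {address!r}')
    return f'http://{parts.hostname}:{parts.port}'


if __name__ == '__main__':
    print(to_aiohttp_proxy('183.220.145.3:80'))
    print(to_aiohttp_proxy('https://183.220.145.3:80'))
```

The result can be passed straight to `session.get(..., proxy=to_aiohttp_proxy(address))`.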
