Checking Website Status with Asyncio in Python

Updated: 2023-03-30 14:23:42 Author: 冷凍工廠

This article explains in detail how to use Asyncio in Python to check the status of websites, with worked example code at each step.

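1. How to Check HTTP Status with Asyncio

The asyncio module provides support for opening socket connections and for reading and writing data via streams. We can use this capability to check the status of a web page.

This involves four steps:

Open a connection
Write a request
Read the response
Close the connection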

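2. Opening an HTTP Connection

A connection can be opened in asyncio with the asyncio.open_connection() function. Among its many arguments, the function takes a string host name and an integer port number.

It is a coroutine that must be awaited, and it returns a StreamReader and a StreamWriter for reading from and writing to the socket.

This can be used to open an HTTP connection on port 80.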

...
# open a socket connection
reader, writer = await asyncio.open_connection('www.google.com', 80)

We can also open an SSL connection by passing the ssl=True argument. This can be used to open an HTTPS connection on port 443.

...
# open a socket connection
reader, writer = await asyncio.open_connection('www.google.com', 443, ssl=True)

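3. Writing the HTTP Request

Once the connection is open, we can write a query to the StreamWriter to make an HTTP request. An HTTP version 1.1 request is plain text. For example, a request for the file path "/" may look as follows: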

GET / HTTP/1.1
Host: www.google.com

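Importantly, each line must end with a carriage return and a line feed (\r\n), and the request must end with an empty line.

As Python strings, this may look as follows: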

'GET / HTTP/1.1\r\n'
'Host: www.google.com\r\n'
'\r\n'

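The string must be encoded as bytes before it is written to the StreamWriter. This can be done by calling the encode() method on the string itself. The default "utf-8" encoding is likely sufficient.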

...
# encode string as bytes
byte_data = string.encode()
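
As a minimal sketch of the two steps above, the request text can be assembled and encoded in one small helper. The name build_get_request and its parameters are illustrative assumptions, not part of asyncio.

```python
def build_get_request(host, path='/'):
    """Build a minimal HTTP/1.1 GET request and return it as bytes."""
    lines = [
        f'GET {path} HTTP/1.1',  # request line
        f'Host: {host}',         # Host header, required by HTTP/1.1
        '',                      # joining adds the \r\n that ends the last header
        '',                      # and the empty line that ends the request
    ]
    # every line is terminated by a carriage return and a line feed
    return '\r\n'.join(lines).encode()

print(build_get_request('www.google.com'))
# b'GET / HTTP/1.1\r\nHost: www.google.com\r\n\r\n'
```

Joining a list keeps the \r\n terminators in one place, which makes it harder to forget the final empty line.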

The bytes can then be written to the socket via the write() method of the StreamWriter.

...
# write query to socket
writer.write(byte_data)

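After writing the request, it is a good idea to wait for the byte data to be sent and for the socket to be ready. This can be done with the drain() method, which is a coroutine that must be awaited.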

...
# wait for the socket to be ready.
await writer.drain()

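4. Reading the HTTP Response

Once the HTTP request has been made, we can read the response. This is done via the socket's StreamReader. The response can be read with the read() method, which reads a chunk of bytes, or with the readline() method, which reads one line of bytes.

We may prefer the readline() method because HTTP is a text-based protocol whose status line and headers arrive one line at a time. The readline() method is a coroutine and must be awaited.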

...
# read one line of response
line_bytes = await reader.readline()

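An HTTP 1.1 response consists of two parts: a header terminated by an empty line, followed by the body. The header holds information about whether the request succeeded and what type of file will be sent; the body holds the content of the file, such as an HTML web page.

The first line of the HTTP header contains the HTTP status of the requested page on the server. Each line must be decoded from bytes into a string.

This can be done by calling the decode() method on the byte data. Again, the default encoding is "utf-8".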

...
# decode bytes into a string
line_data = line_bytes.decode()
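
Once decoded, the status line can be split into its protocol version, numeric code, and reason phrase. The sketch below uses a hypothetical helper named parse_status_line and assumes the server sent a well-formed status line; it is not taken from the original example.

```python
def parse_status_line(line_bytes):
    """Split a raw HTTP status line into (version, code, reason)."""
    # decode the bytes and drop the trailing \r\n
    line = line_bytes.decode().strip()
    # split at most twice so a multi-word reason phrase stays intact
    version, code, *reason = line.split(' ', 2)
    return version, int(code), reason[0] if reason else ''

print(parse_status_line(b'HTTP/1.1 301 Moved Permanently\r\n'))
# ('HTTP/1.1', 301, 'Moved Permanently')
```

Having the code as an integer makes later checks such as 200 <= code < 300 straightforward.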

5. Closing the HTTP Connection

We can close the socket connection by closing the StreamWriter. This is done by calling the close() method.

...
# close the connection
writer.close()

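This call does not block and may not close the socket immediately. Now that we know how to make HTTP requests and read responses using asyncio, let's look at some examples of checking the status of web pages.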
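
Before moving on, the four steps can be tried end to end without touching the network. The sketch below is an assumption-laden illustration: it starts a throwaway stub server on 127.0.0.1 (the handle() and demo() names are made up for this example), and it also awaits writer.wait_closed(), which waits until the connection has actually been closed after close() was called.

```python
import asyncio

async def handle(reader, writer):
    # stub server: consume the request headers up to the empty line
    while (await reader.readline()) not in (b'\r\n', b''):
        pass
    # reply with a fixed status line and an empty body
    writer.write(b'HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n')
    await writer.drain()
    writer.close()

async def demo():
    # port 0 asks the operating system for any free port
    server = await asyncio.start_server(handle, '127.0.0.1', 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        # step 1: open a connection
        reader, writer = await asyncio.open_connection('127.0.0.1', port)
        # step 2: write a request
        writer.write(b'GET / HTTP/1.1\r\nHost: localhost\r\n\r\n')
        await writer.drain()
        # step 3: read the first line of the response
        status = (await reader.readline()).decode().strip()
        # step 4: close the connection and wait until it is closed
        writer.close()
        await writer.wait_closed()
    return status

print(asyncio.run(demo()))
```

Swapping the stub server for a real host and port gives the same client-side sequence used in the examples that follow.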

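6. Example of Checking HTTP Status Sequentially

We can develop an example that checks the HTTP status of multiple websites using asyncio.

In this example, we first develop a coroutine that checks the status of a given URL. We then call this coroutine once for each of the top 10 websites.

First, we can define a coroutine that takes a URL string and returns the HTTP status.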

# get the HTTP/S status of a webpage
async def get_status(url):
    # ...

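The URL must be parsed into its component parts. We need the host name and the file path when making the HTTP request. We also need to know the URL scheme (HTTP or HTTPS) to determine whether SSL is required.

This can be done with the urllib.parse.urlsplit() function, which takes a URL string and returns a named tuple of all the URL elements.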

...
# split the url into components
url_parsed = urlsplit(url)
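
To make the returned fields concrete, here is a short illustration of what urlsplit() yields. The URLs are arbitrary samples chosen for this sketch. Note that the path is an empty string when the URL has no trailing slash, so code that builds a request line may want to fall back to "/".

```python
from urllib.parse import urlsplit

# a URL with a path and a query string
parts = urlsplit('https://www.google.com/search?q=asyncio')
print(parts.scheme)    # https
print(parts.hostname)  # www.google.com
print(parts.path)      # /search
print(parts.query)     # q=asyncio
print(parts.port)      # None, because no explicit port was given

# a URL without a trailing slash has an empty path
bare = urlsplit('https://yahoo.com')
print(repr(bare.path))   # ''
print(bare.path or '/')  # /
```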

We can then open the HTTP connection based on the URL scheme, using the URL host name.

...
# open the connection
if url_parsed.scheme == 'https':
    reader, writer = await asyncio.open_connection(url_parsed.hostname, 443, ssl=True)
else:
    reader, writer = await asyncio.open_connection(url_parsed.hostname, 80)

Next, we can create the HTTP GET request from the host name and file path and write the encoded bytes to the socket using the StreamWriter.

...
# send GET request
query = f'GET {url_parsed.path or "/"} HTTP/1.1\r\nHost: {url_parsed.hostname}\r\n\r\n'
# write query to socket
writer.write(query.encode())
# wait for the bytes to be written to the socket
await writer.drain()

Next, we can read the HTTP response. We only need the first line of the response, which contains the HTTP status.

...
# read the single line response
response = await reader.readline()

The connection can then be closed.

...
# close the connection
writer.close()

Finally, we can decode the bytes read from the server, strip the trailing whitespace, and return the HTTP status.

...
# decode and strip white space
status = response.decode().strip()
# return the response
return status

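Tying this together, the complete get_status() coroutine is listed below. It has no error handling, for example for hosts that cannot be reached or that are slow to respond. Adding these would make a good extension for the reader.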

# get the HTTP/S status of a webpage
async def get_status(url):
    # split the url into components
    url_parsed = urlsplit(url)
    # open the connection
    if url_parsed.scheme == 'https':
        reader, writer = await asyncio.open_connection(url_parsed.hostname, 443, ssl=True)
    else:
        reader, writer = await asyncio.open_connection(url_parsed.hostname, 80)
    # send GET request
    query = f'GET {url_parsed.path or "/"} HTTP/1.1\r\nHost: {url_parsed.hostname}\r\n\r\n'
    # write query to socket
    writer.write(query.encode())
    # wait for the bytes to be written to the socket
    await writer.drain()
    # read the single line response
    response = await reader.readline()
    # close the connection
    writer.close()
    # decode and strip white space
    status = response.decode().strip()
    # return the response
    return status
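
One possible way to approach the error-handling extension is to wrap the coroutine with asyncio.wait_for() and catch connection errors. The sketch below is only an illustration under stated assumptions: get_status_safe, slow_fetch, and refused_fetch are hypothetical names, and the two stand-in coroutines simulate a slow host and an unreachable host so the code runs without network access.

```python
import asyncio

async def get_status_safe(fetch, url, timeout=3.0):
    """Await fetch(url) with a time limit and report failures as text."""
    try:
        # cancel the request if it takes longer than the timeout
        return await asyncio.wait_for(fetch(url), timeout)
    except asyncio.TimeoutError:
        return 'ERROR: timed out'
    except OSError as exc:
        # covers DNS failures, refused connections, and SSL errors
        return f'ERROR: {type(exc).__name__}'

# stand-in coroutines for the demo (assumptions, no network access)
async def slow_fetch(url):
    await asyncio.sleep(1.0)
    return 'HTTP/1.1 200 OK'

async def refused_fetch(url):
    raise ConnectionRefusedError('connection refused')

async def demo():
    print(await get_status_safe(slow_fetch, 'http://slow.example/', timeout=0.05))
    print(await get_status_safe(refused_fetch, 'http://down.example/'))

asyncio.run(demo())
```

With the coroutine defined above, the call would take the form await get_status_safe(get_status, url).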

Next, we can call the get_status() coroutine for each of the web pages or websites we want to check. In this case, we define a list of the top 10 web pages in the world.

...
# list of top 10 websites to check
sites = ['https://www.google.com/',
    'https://www.youtube.com/',
    'https://www.facebook.com/',
    'https://twitter.com/',
    'https://www.instagram.com/',
    'https://www.baidu.com/',
    'https://www.wikipedia.org/',
    'https://yandex.ru/',
    'https://yahoo.com/',
    'https://www.whatsapp.com/'
    ]

We can then query each in turn using our get_status() coroutine. In this case, we do so sequentially in a loop and report each status in turn.

...
# check the status of all websites
for url in sites:
    # get the status for the url
    status = await get_status(url)
    # report the url and its status
    print(f'{url:30}:\t{status}')

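We can do better than sequential execution when using asyncio, but this provides a good starting point that we can improve on later. Tying this together, the main() coroutine queries the status of the top 10 websites.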

# main coroutine
async def main():
    # list of top 10 websites to check
    sites = ['https://www.google.com/',
        'https://www.youtube.com/',
        'https://www.facebook.com/',
        'https://twitter.com/',
        'https://www.instagram.com/',
        'https://www.baidu.com/',
        'https://www.wikipedia.org/',
        'https://yandex.ru/',
        'https://yahoo.com/',
        'https://www.whatsapp.com/'
        ]
    # check the status of all websites
    for url in sites:
        # get the status for the url
        status = await get_status(url)
        # report the url and its status
        print(f'{url:30}:\t{status}')

Finally, we can create the main() coroutine and use it as the entry point to the asyncio program.

...
# run the asyncio program
asyncio.run(main())

Tying this together, the complete example is listed below.

# SuperFastPython.com
# check the status of many webpages
import asyncio
from urllib.parse import urlsplit
 
# get the HTTP/S status of a webpage
async def get_status(url):
    # split the url into components
    url_parsed = urlsplit(url)
    # open the connection
    if url_parsed.scheme == 'https':
        reader, writer = await asyncio.open_connection(url_parsed.hostname, 443, ssl=True)
    else:
        reader, writer = await asyncio.open_connection(url_parsed.hostname, 80)
    # send GET request
    query = f'GET {url_parsed.path or "/"} HTTP/1.1\r\nHost: {url_parsed.hostname}\r\n\r\n'
    # write query to socket
    writer.write(query.encode())
    # wait for the bytes to be written to the socket
    await writer.drain()
    # read the single line response
    response = await reader.readline()
    # close the connection
    writer.close()
    # decode and strip white space
    status = response.decode().strip()
    # return the response
    return status
 
# main coroutine
async def main():
    # list of top 10 websites to check
    sites = ['https://www.google.com/',
        'https://www.youtube.com/',
        'https://www.facebook.com/',
        'https://twitter.com/',
        'https://www.instagram.com/',
        'https://www.baidu.com/',
        'https://www.wikipedia.org/',
        'https://yandex.ru/',
        'https://yahoo.com/',
        'https://www.whatsapp.com/'
        ]
    # check the status of all websites
    for url in sites:
        # get the status for the url
        status = await get_status(url)
        # report the url and its status
        print(f'{url:30}:\t{status}')
 
# run the asyncio program
asyncio.run(main())

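Running the example first creates the main() coroutine and uses it as the entry point to the program. The main() coroutine runs and defines the list of the top 10 websites. The list is then traversed sequentially: main() suspends and calls the get_status() coroutine to query the status of one website.

The get_status() coroutine runs, parses the URL, and opens a connection. It constructs an HTTP GET query and writes it to the host. The response is read, decoded, and returned. The main() coroutine then resumes and reports the HTTP status of the URL.

This is repeated for each URL in the list. The program takes about 5.6 seconds to complete, or about half a second per URL on average. This highlights how we can use asyncio to query the HTTP status of web pages.

Nevertheless, it does not take full advantage of asyncio to execute tasks concurrently.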

https://www.google.com/ :   HTTP/1.1 200 OK
https://www.youtube.com/ :   HTTP/1.1 200 OK
https://www.facebook.com/ :   HTTP/1.1 302 Found
https://twitter.com/ :   HTTP/1.1 200 OK
https://www.instagram.com/ :   HTTP/1.1 200 OK
https://www.baidu.com/ :   HTTP/1.1 200 OK
https://www.wikipedia.org/ :   HTTP/1.1 200 OK
https://yandex.ru/ :   HTTP/1.1 302 Moved temporarily
https://yahoo.com/ :   HTTP/1.1 301 Moved Permanently
https://www.whatsapp.com/ :   HTTP/1.1 302 Found

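7. Example of Checking Website Status Concurrently

A benefit of asyncio is that we can execute many coroutines concurrently. We can query the status of websites concurrently in asyncio using the asyncio.gather() function.

This function takes one or more coroutines, suspends the caller while the provided coroutines execute, and returns the result of each coroutine as a list, in the order the coroutines were passed. We can then traverse the list of URLs and the list of return values together and report the results.

This may be a simpler approach than the one above. First, we can create a list of coroutines.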

...
# create all coroutine requests
coros = [get_status(url) for url in sites]

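Next, we can execute the coroutines and get the results with asyncio.gather().

Note that we cannot pass the list of coroutines directly; instead, the list must be unpacked into separate expressions that are passed to the function as positional arguments.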

...
# execute all coroutines and wait
results = await asyncio.gather(*coros)

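This executes all of the coroutines concurrently and retrieves their results. We can then traverse the list of URLs and the returned statuses and report each in turn.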

...
# process all results
for url, status in zip(sites, results):
    # report status
    print(f'{url:30}:\t{status}')
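
One caveat is that, by default, asyncio.gather() propagates the first exception raised by any coroutine to the caller. Passing return_exceptions=True returns exceptions in the result list instead, so one unreachable site does not hide the other results. The sketch below illustrates this with a hypothetical fake_status() coroutine that stands in for get_status() and needs no network access.

```python
import asyncio

async def fake_status(url):
    # stand-in for get_status(): fail for any URL that contains 'bad'
    await asyncio.sleep(0.01)
    if 'bad' in url:
        raise ConnectionError(f'cannot reach {url}')
    return 'HTTP/1.1 200 OK'

async def check_all(urls):
    coros = [fake_status(url) for url in urls]
    # exceptions are returned as results instead of being raised
    results = await asyncio.gather(*coros, return_exceptions=True)
    # turn each exception into a readable status string
    return [f'ERROR: {r}' if isinstance(r, Exception) else r for r in results]

urls = ['http://ok.example/', 'http://bad.example/']
for url, status in zip(urls, asyncio.run(check_all(urls))):
    print(f'{url:30}:\t{status}')
```

Because the results keep the order of the arguments, zip() still pairs each URL with its own status or error.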

Tying this together, the complete example is listed below.

# SuperFastPython.com
# check the status of many webpages
import asyncio
from urllib.parse import urlsplit
 
# get the HTTP/S status of a webpage
async def get_status(url):
    # split the url into components
    url_parsed = urlsplit(url)
    # open the connection
    if url_parsed.scheme == 'https':
        reader, writer = await asyncio.open_connection(url_parsed.hostname, 443, ssl=True)
    else:
        reader, writer = await asyncio.open_connection(url_parsed.hostname, 80)
    # send GET request
    query = f'GET {url_parsed.path or "/"} HTTP/1.1\r\nHost: {url_parsed.hostname}\r\n\r\n'
    # write query to socket
    writer.write(query.encode())
    # wait for the bytes to be written to the socket
    await writer.drain()
    # read the single line response
    response = await reader.readline()
    # close the connection
    writer.close()
    # decode and strip white space
    status = response.decode().strip()
    # return the response
    return status
 
# main coroutine
async def main():
    # list of top 10 websites to check
    sites = ['https://www.google.com/',
        'https://www.youtube.com/',
        'https://www.facebook.com/',
        'https://twitter.com/',
        'https://www.instagram.com/',
        'https://www.baidu.com/',
        'https://www.wikipedia.org/',
        'https://yandex.ru/',
        'https://yahoo.com/',
        'https://www.whatsapp.com/'
        ]
    # create all coroutine requests
    coros = [get_status(url) for url in sites]
    # execute all coroutines and wait
    results = await asyncio.gather(*coros)
    # process all results
    for url, status in zip(sites, results):
        # report status
        print(f'{url:30}:\t{status}')
 
# run the asyncio program
asyncio.run(main())

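Running the example executes the main() coroutine as before. In this case, the list of coroutines is created in a list comprehension.

The asyncio.gather() function is then called with the coroutines, suspending the main() coroutine until they are all complete. The coroutines execute, querying each website concurrently and returning their status.

The main() coroutine resumes and receives the list of status values. This list is then traversed together with the list of URLs using the zip() built-in function, and the statuses are reported.

This highlights a simpler approach to executing coroutines concurrently and reporting the results after all tasks are complete. It is also faster than the sequential version above, completing in about 1.4 seconds on my system.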

https://www.google.com/ :   HTTP/1.1 200 OK
https://www.youtube.com/ :   HTTP/1.1 200 OK
https://www.facebook.com/ :   HTTP/1.1 302 Found
https://twitter.com/ :   HTTP/1.1 200 OK
https://www.instagram.com/ :   HTTP/1.1 200 OK
https://www.baidu.com/ :   HTTP/1.1 200 OK
https://www.wikipedia.org/ :   HTTP/1.1 200 OK
https://yandex.ru/ :   HTTP/1.1 302 Moved temporarily
https://yahoo.com/ :   HTTP/1.1 301 Moved Permanently
https://www.whatsapp.com/ :   HTTP/1.1 302 Found
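
The difference between the sequential and concurrent timings comes from overlapping the waits. This can be reproduced without a network by simulating each request with asyncio.sleep(). The sketch below rests on that assumption: fake_check(), run_sequential(), and run_concurrent() are illustrative names, and the delays are arbitrary.

```python
import asyncio
import time

async def fake_check(delay):
    # simulate the network wait of one status check
    await asyncio.sleep(delay)
    return delay

async def run_sequential(delays):
    start = time.perf_counter()
    for delay in delays:
        # each wait finishes before the next one starts
        await fake_check(delay)
    return time.perf_counter() - start

async def run_concurrent(delays):
    start = time.perf_counter()
    # all waits overlap, so the total is close to the longest delay
    await asyncio.gather(*(fake_check(delay) for delay in delays))
    return time.perf_counter() - start

delays = [0.05] * 10
print(f'sequential: {asyncio.run(run_sequential(delays)):.2f} seconds')
print(f'concurrent: {asyncio.run(run_concurrent(delays)):.2f} seconds')
```

The sequential total grows with the sum of the delays, while the concurrent total stays near the slowest single delay.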
