Ways to Download Files in Python: A Roundup with Use Cases

Updated: 2025-05-06 08:21:25  Author: Python_trys


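1. Using urllib.request (Python standard library)
2. Using the requests library (most common)
3. Using the wget library
4. Using http.client (low-level HTTP client)
5. Using aiohttp (asynchronous downloads)
6. Using pycurl (libcurl bindings)
7. Using urllib3 (the library underneath requests)
8. Raw download with socket (advanced users only)
9. Multi-process downloads with multiprocessing
10. Using scrapy (web-crawler downloads)
Advanced technique: resumable downloads
Comparison and selection guide
Security considerations
Summary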

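Downloading files is a common task in Python development. This article walks through 10 ways to download files in Python, covering the standard library, third-party libraries, and advanced techniques. Each method comes with a complete code example and a note on when to use it.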

1. Using urllib.request (Python standard library)

Suitable for: simple downloads with no extra libraries to install

import shutil
import urllib.request

url = "https://example.com/file.zip"
filename = "downloaded_file.zip"

urllib.request.urlretrieve(url, filename)
print(f"File saved as: {filename}")

# Advanced: add request headers (some servers reject the default Python User-Agent)
headers = {"User-Agent": "Mozilla/5.0"}
req = urllib.request.Request(url, headers=headers)
with urllib.request.urlopen(req, timeout=30) as response:
    with open(filename, 'wb') as f:
        # Copy in chunks; response.read() would load the whole file into memory
        shutil.copyfileobj(response, f)
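urlretrieve also accepts a reporthook callback, which it calls with the number of blocks transferred so far, the block size, and the total size (-1 when the server sends no Content-Length). A minimal sketch of a progress display built on that callback; progress_percent and print_progress are illustrative names, not part of urllib.

```python
def progress_percent(block_num, block_size, total_size):
    """Turn urlretrieve's reporthook arguments into a 0-100 percentage.

    Returns None when the size is unknown (urlretrieve passes -1 if the
    server sent no Content-Length).
    """
    if total_size <= 0:
        return None
    # The last block may be partial, so cap the byte count at the total size
    downloaded = min(block_num * block_size, total_size)
    return downloaded * 100 // total_size


def print_progress(block_num, block_size, total_size):
    """reporthook that rewrites a single console line with the percentage."""
    percent = progress_percent(block_num, block_size, total_size)
    if percent is not None:
        print(f"\rDownloaded {percent}%", end="", flush=True)
```

Usage: `urllib.request.urlretrieve(url, filename, reporthook=print_progress)`. Keeping the arithmetic in its own function makes it easy to test without a network connection.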

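2. Using the requests library (most common)

Suitable for: when you want a friendlier API and features such as sessions, automatic redirects, and streaming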

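import requests

url = "https://example.com/large_file.iso"
filename = "large_file.iso"

# Simple download: response.content holds the whole body in memory, so use it for small files
response = requests.get(url, timeout=30)
response.raise_for_status()
with open(filename, 'wb') as f:
    f.write(response.content)

# Streaming download for large files
with requests.get(url, stream=True, timeout=30) as r:
    r.raise_for_status()  # raise on 4xx/5xx instead of saving an error page
    with open(filename, 'wb') as f:
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)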
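The streaming pattern above still leaves a truncated large_file.iso behind if the connection drops halfway. One common remedy is to stream into a temporary `.part` file and rename it only after the transfer finishes. The sketch below uses urllib.request (swapped in for requests so it has no third-party dependency); the `.part` suffix and the name download_atomic are illustrative choices.

```python
import os
import shutil
import urllib.request


def download_atomic(url, filename, timeout=30, chunk_size=64 * 1024):
    """Stream url into '<filename>.part', then rename it to filename.

    If the transfer fails midway, only the .part file is left behind,
    so a file named `filename` is always complete.
    """
    tmp = filename + ".part"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        with open(tmp, "wb") as f:
            # copyfileobj moves the data chunk by chunk, so memory use stays flat
            shutil.copyfileobj(response, f, chunk_size)
    # On POSIX, this rename is atomic when both paths are on the same filesystem
    os.replace(tmp, filename)
    return filename
```

A leftover `.part` file can simply be overwritten on the next attempt, or handed to a resume routine like the one in the resumable-download section.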

3. Using the wget library

Suitable for: mimicking the Linux wget command (automatic filename and a progress bar)

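import wget  # third-party package: pip install wget

url = "https://example.com/image.jpg"
filename = wget.download(url)  # prints a progress bar and returns the saved filename
print(f"\nDownload complete: {filename}")

# Specify the save path
wget.download(url, out="/path/to/save/image.jpg")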
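When `out` is omitted, wget.download picks the filename itself; the other libraries in this article need one spelled out. A hypothetical helper that derives it from the URL path might look like this. It is an illustrative sketch, not the wget package's own logic, and the `default` name is an arbitrary placeholder.

```python
import os
from urllib.parse import unquote, urlsplit


def filename_from_url(url, default="download.bin"):
    """Guess a local filename from the last path segment of a URL.

    The query string and fragment are ignored and percent-escapes are decoded.
    `default` is used when no usable name remains.
    """
    # Decoding before basename() also splits on any "/" smuggled in as %2F
    name = os.path.basename(unquote(urlsplit(url).path))
    # "." and ".." are not file names; fall back instead of returning them
    if name in ("", ".", ".."):
        return default
    return name
```

For example, `filename_from_url("https://example.com/files/report%202025.pdf?token=abc")` gives `"report 2025.pdf"`.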

4. Using http.client (low-level HTTP client)

Suitable for: low-level control over the request, or learning the HTTP protocol

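import http.client

conn = http.client.HTTPSConnection("example.com", timeout=30)
try:
    conn.request("GET", "/file.pdf")
    response = conn.getresponse()
    # http.client neither follows redirects nor raises on error statuses
    if response.status != 200:
        raise RuntimeError(f"HTTP {response.status} {response.reason}")
    with open("document.pdf", 'wb') as f:
        # Read in 8 KB chunks instead of loading the whole body at once
        while chunk := response.read(8192):
            f.write(chunk)
finally:
    conn.close()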
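The example above hard-codes the host and path. A slightly more general sketch, assuming a plain GET with no redirect handling: download_http splits a full URL and chooses HTTP or HTTPS from its scheme, and save_response checks the status and writes the body in chunks. Both function names are illustrative.

```python
import http.client
from urllib.parse import urlsplit


def save_response(response, filename, chunk_size=8192):
    """Write an http.client.HTTPResponse body to filename; return bytes written."""
    # http.client does not follow redirects or raise on 4xx/5xx by itself
    if response.status != 200:
        raise RuntimeError(f"HTTP {response.status} {response.reason}")
    written = 0
    with open(filename, "wb") as f:
        while chunk := response.read(chunk_size):
            f.write(chunk)
            written += len(chunk)
    return written


def download_http(url, filename, timeout=30):
    """Download url with http.client, picking HTTP or HTTPS from the scheme."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
    try:
        # The request target is the path plus the query string
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("GET", target)
        return save_response(conn.getresponse(), filename)
    finally:
        conn.close()
```

Usage: `download_http("https://example.com/file.pdf", "document.pdf")`.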

5. Using aiohttp (asynchronous downloads)

Suitable for: high-throughput asynchronous downloads of many files; I/O-bound workloads

import asyncio
import aiohttp

async def download_file(session, url, filename):
    async with session.get(url) as response:
        response.raise_for_status()
        with open(filename, 'wb') as f:
            # Read the body in 8 KB chunks instead of loading it all into memory
            async for chunk in response.content.iter_chunked(8192):
                f.write(chunk)
    print(f"Async download complete: {filename}")

urls = [
    ("https://example.com/file1.zip", "file1.zip"),
    ("https://example.com/file2.zip", "file2.zip")
]

async def main():
    # One shared ClientSession reuses its connection pool across all downloads
    async with aiohttp.ClientSession() as session:
        tasks = [download_file(session, url, name) for url, name in urls]
        await asyncio.gather(*tasks)

asyncio.run(main())
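asyncio.gather starts every download at once, which can overwhelm a server when the URL list is long. A common fix is an asyncio.Semaphore that caps how many downloads run concurrently. A minimal sketch, assuming each job is a zero-argument callable that returns a coroutine; fake_download is a hypothetical stand-in so the demo runs without network access.

```python
import asyncio


async def run_limited(jobs, limit):
    """Await every job, but never more than `limit` at the same time.

    jobs: zero-argument callables that each return a coroutine.
    Results come back in the same order as `jobs`.
    """
    semaphore = asyncio.Semaphore(limit)

    async def guarded(job):
        async with semaphore:  # waits here while `limit` jobs are running
            return await job()

    return await asyncio.gather(*(guarded(job) for job in jobs))


# Hypothetical stand-in for a real download coroutine
async def fake_download(name, stats):
    stats["running"] += 1
    stats["peak"] = max(stats["peak"], stats["running"])
    await asyncio.sleep(0.01)  # pretend to wait on the network
    stats["running"] -= 1
    return name


def demo(n_jobs=10, limit=3):
    stats = {"running": 0, "peak": 0}
    # i=i binds the current value; a bare lambda would only see the last i
    jobs = [lambda i=i: fake_download(f"file{i}.zip", stats) for i in range(n_jobs)]
    results = asyncio.run(run_limited(jobs, limit))
    return results, stats["peak"]
```

In a real program, each job would be a lambda or functools.partial that calls the download coroutine for one (url, filename) pair.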

6. Using pycurl (libcurl bindings)

Suitable for: C-level performance or libcurl's extensive transfer options

import pycurl

with open("data.json", 'wb') as f:
    c = pycurl.Curl()
    c.setopt(c.URL, "https://example.com/data.json")
    c.setopt(c.WRITEDATA, f)          # write the body straight into the file, no in-memory buffer
    c.setopt(c.FOLLOWLOCATION, True)  # follow HTTP redirects
    c.perform()
    status = c.getinfo(c.RESPONSE_CODE)
    c.close()

if status != 200:
    raise RuntimeError(f"HTTP {status}")

7. Using urllib3 (the library underneath requests)

Suitable for: lower-level control than requests provides, such as managing the connection pool directly

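import urllib3

http = urllib3.PoolManager()
url = "https://example.com/video.mp4"
# preload_content=False leaves the body unread so it can be streamed in chunks
response = http.request("GET", url, preload_content=False, timeout=30.0)

try:
    if response.status != 200:
        raise RuntimeError(f"HTTP {response.status}")
    with open("video.mp4", 'wb') as f:
        for chunk in response.stream(1024):
            f.write(chunk)
finally:
    response.release_conn()  # return the connection to the pool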

8. Raw download with socket (advanced users only)

Suitable for: learning how networking works, or special protocol requirements

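import socket
from urllib.parse import urlsplit

def download_via_socket(url, filename="output.bin"):
    # Parse the URL with urllib.parse (plain http:// only: no TLS, no redirects)
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port or 80
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query

    # With HTTP/1.0 the server closes the connection after the body, so the
    # recv() loop terminates, and it does not use chunked transfer encoding
    request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
    with socket.create_connection((host, port), timeout=30) as s:
        s.sendall(request.encode())
        chunks = []
        while True:
            data = s.recv(4096)
            if not data:
                break
            chunks.append(data)

    # The response starts with the status line and headers; save only the body
    raw = b"".join(chunks)
    header_end = raw.find(b"\r\n\r\n")
    body = raw[header_end + 4:] if header_end != -1 else raw
    with open(filename, 'wb') as f:
        f.write(body)

download_via_socket("http://example.com/file")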
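A raw socket returns the status line and headers together with the body, so checking the status before saving is up to the caller. A small parser for a complete HTTP/1.x response held in memory is sketched below; it assumes the body is not chunked-encoded (servers do not use chunked encoding when answering an HTTP/1.0 request), and parse_http_response is an illustrative name.

```python
def parse_http_response(raw):
    """Split a complete raw HTTP/1.x response into (status, headers, body).

    Header names are lower-cased. Chunked transfer encoding is not decoded.
    """
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("incomplete response: no blank line after headers")
    # ISO-8859-1 maps every byte to a character, so decoding never fails
    lines = head.decode("iso-8859-1").split("\r\n")
    # Status line, e.g. "HTTP/1.0 200 OK"
    _version, status, *_reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    # A body shorter than Content-Length means the connection was cut off
    expected = headers.get("content-length")
    if expected is not None and len(body) < int(expected):
        raise ValueError(f"truncated body: {len(body)} of {expected} bytes")
    return int(status), headers, body
```

A caller would run this on the joined `recv()` data and write the body to disk only when the status is 200.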

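9. Multi-process downloads with multiprocessing

Suitable for: downloads followed by CPU-heavy processing (such as decompression or encryption); the network transfer itself is I/O-bound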

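import requests
from multiprocessing import Pool

def download(args):
    url, filename = args
    with requests.get(url, stream=True, timeout=30) as response:
        response.raise_for_status()
        with open(filename, 'wb') as f:
            for chunk in response.iter_content(8192):
                f.write(chunk)
    return filename

urls = [
    ("https://example.com/file1.zip", "file1.zip"),
    ("https://example.com/file2.zip", "file2.zip")
]

# The __main__ guard is required on Windows and macOS, where worker processes
# are started with "spawn" and re-import this module
if __name__ == "__main__":
    with Pool(4) as p:  # 4 worker processes
        results = p.map(download, urls)
        print(f"Downloads complete: {results}")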
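Because downloads spend most of their time waiting on the network, a thread pool is often enough and avoids the cost of starting processes; multiprocessing pays off mainly when each file also needs CPU-heavy work. A sketch using concurrent.futures.ThreadPoolExecutor, with urllib.request swapped in for requests to keep it dependency-free; fetch and download_all are illustrative names.

```python
import shutil
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch(job, timeout=30):
    """Download one (url, filename) pair and return the filename."""
    url, filename = job
    with urllib.request.urlopen(url, timeout=timeout) as response:
        with open(filename, "wb") as f:
            shutil.copyfileobj(response, f)
    return filename


def download_all(jobs, max_workers=4):
    """Download (url, filename) pairs concurrently with a thread pool.

    Threads suit this I/O-bound work: a thread blocked on the network
    releases the GIL, so the others keep running. Results keep the
    order of `jobs`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, jobs))
```

No `__main__` guard is needed here, since threads do not re-import the module.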

10. Using scrapy (web-crawler downloads)

Suitable for: bulk-downloading resources linked from web pages

import os
from urllib.parse import urlsplit

import scrapy
from scrapy.crawler import CrawlerProcess

class FileDownloadSpider(scrapy.Spider):
    name = "filedownload"
    start_urls = ["https://example.com/downloads"]

    def parse(self, response):
        # Follow every link with the CSS class "download-link"
        for href in response.css('a.download-link::attr(href)').getall():
            yield scrapy.Request(
                response.urljoin(href),
                callback=self.save_file
            )

    def save_file(self, response):
        # Last path segment without the query string; "index.bin" is a hypothetical fallback
        path = os.path.basename(urlsplit(response.url).path) or "index.bin"
        with open(path, 'wb') as f:
            f.write(response.body)
        self.log(f"Saved file: {path}")

process = CrawlerProcess()
process.crawl(FileDownloadSpider)
process.start()  # blocks until the crawl finishes
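Instead of writing files by hand in a callback, Scrapy also ships a built-in FilesPipeline: it downloads every URL listed in an item's `file_urls` field and stores the files under the `FILES_STORE` directory, by default named after a SHA1 hash of the URL, which sidesteps filename clashes and unsafe names. A minimal sketch of the settings, assuming a target directory called `downloads`:

```python
# Settings that switch on Scrapy's built-in FilesPipeline. Put them in
# settings.py, or assign this dict to a spider's `custom_settings` attribute.
FILES_PIPELINE_SETTINGS = {
    "ITEM_PIPELINES": {"scrapy.pipelines.files.FilesPipeline": 1},
    "FILES_STORE": "downloads",  # hypothetical target directory
}

# In the spider, yield items instead of saving files by hand, e.g.:
#     def parse(self, response):
#         links = response.css('a.download-link::attr(href)').getall()
#         yield {"file_urls": [response.urljoin(h) for h in links]}
# The pipeline then fills the item's "files" field with each file's URL,
# stored path and checksum.
```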

Advanced technique: resumable downloads

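import os
import requests

def download_with_resume(url, filename):
    downloaded = os.path.getsize(filename) if os.path.exists(filename) else 0
    # Ask only for the bytes we do not have yet
    headers = {'Range': f'bytes={downloaded}-'} if downloaded else {}

    with requests.get(url, headers=headers, stream=True, timeout=30) as r:
        if r.status_code == 416:
            # Range Not Satisfiable: usually the local file is already complete
            return
        r.raise_for_status()
        # 206 Partial Content means the server honored Range, so append;
        # a plain 200 means it sent the whole file, so start over
        mode = 'ab' if r.status_code == 206 else 'wb'
        with open(filename, mode) as f:
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)

download_with_resume("https://example.com/large_file.iso", "large_file.iso")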
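With a 206 response, the server also sends a Content-Range header stating which bytes it is returning, in the form `bytes first-last/complete-length` (RFC 9110, where the length may be `*` if unknown). Checking that the range starts exactly at the local file size guards against appending the wrong bytes. A sketch; parse_content_range and resume_is_consistent are illustrative names.

```python
import re

# Satisfied-range form: "bytes <first>-<last>/<complete-length or *>"
_CONTENT_RANGE = re.compile(r"bytes (\d+)-(\d+)/(\d+|\*)")


def parse_content_range(value):
    """Return (first, last, total) from a Content-Range header value.

    total is None when the server sends "*" (length unknown).
    """
    match = _CONTENT_RANGE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"unsupported Content-Range: {value!r}")
    first, last, total = match.groups()
    return int(first), int(last), None if total == "*" else int(total)


def resume_is_consistent(content_range, local_size):
    """True if a 206 response continues exactly where the local file ends."""
    first, _last, _total = parse_content_range(content_range)
    return first == local_size
```

Inside download_with_resume, this check would go right before opening the file in append mode, comparing the header against the size read with os.path.getsize.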

Comparison and selection guide

Security considerations

Verify HTTPS certificates:

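# requests verifies certificates by default; do not pass verify=False in production
requests.get("https://example.com", verify=True)
# For a private CA, point verify at its bundle instead of disabling verification
requests.get("https://example.com", verify="/path/to/ca-bundle.pem")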
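The standard library also verifies certificates by default (since Python 3.4.3, per PEP 476). Passing an explicit SSL context to urlopen makes this visible in code and is the place to plug in a private CA bundle rather than turning verification off. A sketch; make_verified_context and open_verified are illustrative names.

```python
import ssl
import urllib.request


def make_verified_context(cafile=None):
    """Return an SSL context that checks certificates and hostnames.

    cafile: optional path to a PEM bundle for a private CA; None uses
    the system's default trust store.
    """
    # create_default_context() enables CERT_REQUIRED and hostname checking
    return ssl.create_default_context(cafile=cafile)


def open_verified(url, timeout=30, cafile=None):
    """urlopen() with an explicit, verifying SSL context."""
    return urllib.request.urlopen(url, timeout=timeout,
                                  context=make_verified_context(cafile))
```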

Cap the download size to guard against denial of service (disk or memory exhaustion):

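max_size = 100 * 1024 * 1024  # 100 MB
with requests.get(url, stream=True, timeout=30) as response:
    response.raise_for_status()
    # Reject early if the server announces a size above the limit
    declared = response.headers.get("Content-Length")
    if declared is not None and int(declared) > max_size:
        raise ValueError("File exceeds the maximum allowed size")
    downloaded = 0
    with open(filename, 'wb') as f:
        # Keep counting anyway: Content-Length may be missing or wrong
        for chunk in response.iter_content(8192):
            downloaded += len(chunk)
            if downloaded > max_size:
                raise ValueError("File exceeds the maximum allowed size")
            f.write(chunk)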

Sanitize filenames to prevent path traversal:

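import re
def sanitize_filename(filename):
    # Drop path separators, Windows-reserved characters and control characters
    name = re.sub(r'[\\/*?:"<>|\x00-\x1f]', "", filename).strip()
    # "." and ".." would still refer to a directory; "download.bin" is an arbitrary fallback
    return name if name not in ("", ".", "..") else "download.bin"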
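Character filtering handles one filename at a time. When saving into a download directory, a stricter check is to resolve the final path and confirm it is still inside that directory, which also catches absolute paths and symlinks pointing elsewhere. A sketch assuming Python 3.9+ (for Path.is_relative_to); safe_join is a hypothetical helper name.

```python
from pathlib import Path


def safe_join(base_dir, filename):
    """Return base_dir/filename as an absolute Path, refusing to leave base_dir."""
    base = Path(base_dir).resolve()
    # resolve() collapses ".." and follows existing symlinks
    target = (base / filename).resolve()
    # target == base means the name was "." or empty
    if target == base or not target.is_relative_to(base):
        raise ValueError(f"refusing to write outside {base}: {filename!r}")
    return target
```

Usage: `open(safe_join("downloads", name), "wb")`, where `name` came from a URL or a Content-Disposition header.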

Summary

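This article covered 10 ways to download files in Python, from the standard library to third-party libraries and from synchronous to asynchronous code. Which one to choose depends on your needs:

Simple needs: urllib.request or requests

High performance: aiohttp or pycurl

Special scenarios: multiprocessing or scrapy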
