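Downloading Files in Python: A Summary of Methods and Their Use Cases
Downloading files is a common requirement in Python development. This article walks through 10 ways to download a file in Python, covering the standard library, third-party libraries, and some advanced techniques. Each method comes with a code example and a note on when it fits.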
1. Using urllib.request (Python standard library)
Use case: simple downloads with no extra packages to install
import urllib.request

url = "https://example.com/file.zip"
filename = "downloaded_file.zip"
urllib.request.urlretrieve(url, filename)
print(f"File saved as: {filename}")

# Advanced: send custom request headers
headers = {"User-Agent": "Mozilla/5.0"}
req = urllib.request.Request(url, headers=headers)
with urllib.request.urlopen(req) as response:
    with open(filename, 'wb') as f:
        # read() loads the whole body into memory; fine for small files
        f.write(response.read())
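The header example above reads the whole response into memory with `response.read()`. For larger files a streaming variant may be preferable. The following is a minimal sketch rather than part of the original examples: the helper name `download_streaming` and the 64 KiB chunk size are assumptions chosen for illustration.

```python
import shutil
import urllib.request


def download_streaming(url, filename, chunk_size=64 * 1024):
    """Copy the response body to disk in chunks instead of one big read().

    Hypothetical helper: the name and default chunk size are illustrative.
    """
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req) as response, open(filename, "wb") as f:
        # copyfileobj reads chunk_size bytes at a time until the body is exhausted
        shutil.copyfileobj(response, f, chunk_size)
    return filename
```

It could be called as `download_streaming("https://example.com/file.zip", "downloaded_file.zip")`; memory use then stays near the chunk size regardless of file size.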
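2. Using the requests library (the most common choice)
Use case: when you want a friendlier API and more advanced features
import requests

url = "https://example.com/large_file.iso"
filename = "large_file.iso"

# Simple download: the whole body is held in memory
response = requests.get(url, timeout=30)
response.raise_for_status()  # fail on 4xx/5xx instead of saving an error page
with open(filename, 'wb') as f:
    f.write(response.content)

# Streaming download for large files
with requests.get(url, stream=True, timeout=30) as r:
    r.raise_for_status()
    with open(filename, 'wb') as f:
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)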
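If a streamed download is interrupted, the loop above leaves a truncated file under the final name. One common way around that is to write to a temporary name and rename only on success. A minimal sketch: `save_chunks_atomically` and the `.part` suffix are hypothetical, and the function accepts any iterable of byte chunks, such as `r.iter_content(chunk_size=8192)`.

```python
import os


def save_chunks_atomically(chunks, filename):
    """Write byte chunks to '<filename>.part', then rename to filename.

    Hypothetical helper: `chunks` is any iterable of bytes objects.
    """
    tmp_name = filename + ".part"
    try:
        with open(tmp_name, "wb") as f:
            for chunk in chunks:
                f.write(chunk)
        # os.replace overwrites an existing target; the rename is atomic on POSIX
        os.replace(tmp_name, filename)
    except BaseException:
        # Remove the partial file so a failed download leaves nothing behind
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
    return filename
```

With this design, a file that exists under its final name is always a complete download.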
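3. Using the wget library
Use case: mimicking the behavior of the Linux wget command
import wget

url = "https://example.com/image.jpg"
filename = wget.download(url)  # returns the name of the saved file
print(f"\nDownload complete: {filename}")

# Save to a specific path
wget.download(url, out="/path/to/save/image.jpg")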
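4. Using http.client (low-level HTTP client)
Use case: when you need low-level control or want to learn the HTTP protocol
import http.client

conn = http.client.HTTPSConnection("example.com")
conn.request("GET", "/file.pdf")
response = conn.getresponse()
# http.client neither follows redirects nor raises on errors, so check the status
if response.status == 200:
    with open("document.pdf", 'wb') as f:
        f.write(response.read())
else:
    print(f"Download failed: {response.status} {response.reason}")
conn.close()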
5. Using aiohttp (asynchronous download)
Use case: high-performance asynchronous downloads, I/O-bound tasks
import aiohttp
import asyncio

async def download_file(url, filename):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            response.raise_for_status()
            with open(filename, 'wb') as f:
                while True:
                    chunk = await response.content.read(8192)
                    if not chunk:
                        break
                    f.write(chunk)
    print(f"Async download complete: {filename}")

urls = [
    ("https://example.com/file1.zip", "file1.zip"),
    ("https://example.com/file2.zip", "file2.zip")
]

async def main():
    tasks = [download_file(url, name) for url, name in urls]
    await asyncio.gather(*tasks)

asyncio.run(main())
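`asyncio.gather` starts every download at once, which can overwhelm a server when the URL list is long. A semaphore is one way to cap how many run concurrently. The sketch below is an illustration under assumptions: `gather_limited` is a hypothetical helper, and it takes zero-argument callables that return coroutines, so no download starts before a slot is free.

```python
import asyncio


async def gather_limited(factories, limit=4):
    """Run coroutines with at most `limit` in flight at any time.

    Hypothetical helper: `factories` is a list of zero-argument callables,
    each returning a coroutine when called.
    """
    semaphore = asyncio.Semaphore(limit)

    async def run_one(factory):
        async with semaphore:  # wait here until a slot is free
            return await factory()

    # Results come back in the same order as `factories`
    return await asyncio.gather(*(run_one(f) for f in factories))
```

Inside `main()`, the factories could be built with `functools.partial(download_file, url, name)` for each entry in `urls` and passed as `await gather_limited(factories, limit=2)`.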
6. Using pycurl (libcurl bindings)
Use case: when you need C-level performance or complex transfer options
import pycurl
from io import BytesIO

buffer = BytesIO()
c = pycurl.Curl()
c.setopt(c.URL, "https://example.com/data.json")
c.setopt(c.WRITEDATA, buffer)  # the response body is written into the buffer
c.perform()
c.close()

body = buffer.getvalue()
with open("data.json", 'wb') as f:
    f.write(body)
7. Using urllib3 (the library underneath requests)
Use case: when you need lower-level control than requests offers
import urllib3

http = urllib3.PoolManager()
url = "https://example.com/video.mp4"
# preload_content=False streams the body instead of reading it all at once
response = http.request("GET", url, preload_content=False)
with open("video.mp4", 'wb') as f:
    for chunk in response.stream(1024):
        f.write(chunk)
response.release_conn()
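8. Raw download over a socket (advanced users only)
Use case: learning how networking works, or special protocol requirements
import socket

def download_via_socket(url, port=80, filename="output.bin"):
    # Parse the URL (simplified; real code should use urllib.parse)
    host = url.split('/')[2]
    path = '/' + '/'.join(url.split('/')[3:])
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect((host, port))
    # HTTP/1.0: the server closes the connection after the response and does
    # not use chunked encoding, so the recv loop below terminates cleanly
    request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
    s.sendall(request.encode())
    raw = b""
    while True:
        data = s.recv(1024)
        if not data:
            break
        raw += data
    s.close()
    # Drop the status line and headers; only the body belongs in the file
    _, _, body = raw.partition(b"\r\n\r\n")
    with open(filename, 'wb') as f:
        f.write(body)

download_via_socket("http://example.com/file")
9. Multi-process download with multiprocessing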
Use case: download tasks that include CPU-bound work (for example decompression or encryption)
import requests
from multiprocessing import Pool

def download(args):
    url, filename = args
    with requests.get(url, stream=True, timeout=30) as response:
        response.raise_for_status()
        with open(filename, 'wb') as f:
            for chunk in response.iter_content(8192):
                f.write(chunk)
    return filename

urls = [
    ("https://example.com/file1.zip", "file1.zip"),
    ("https://example.com/file2.zip", "file2.zip")
]

# The __main__ guard is required where worker processes are spawned (Windows, macOS)
if __name__ == "__main__":
    with Pool(4) as p:  # 4 worker processes
        results = p.map(download, urls)
    print(f"Download complete: {results}")
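The CPU-bound part of such a task is usually the post-processing rather than the transfer itself. As an illustration only, the sketch below decompresses a gzip file in chunks. `gunzip_file` is a hypothetical helper, and it assumes the downloads are `.gz` files; the `.zip` files used in the example above would need the `zipfile` module instead.

```python
import gzip
import shutil


def gunzip_file(src, dst):
    """Decompress the gzip file `src` into `dst` and return `dst`.

    Hypothetical post-processing step; defined at module level so that
    multiprocessing can pickle it for worker processes.
    """
    with gzip.open(src, "rb") as f_in, open(dst, "wb") as f_out:
        # Stream in chunks so large archives do not have to fit in memory
        shutil.copyfileobj(f_in, f_out)
    return dst
```

Inside the `Pool` block it could be applied with `p.starmap(gunzip_file, pairs)`, where `pairs` is a list of `(source, destination)` tuples.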
10. Using scrapy (downloading via a web crawler)
Use case: batch-downloading resources linked from web pages
import scrapy
from scrapy.crawler import CrawlerProcess

class FileDownloadSpider(scrapy.Spider):
    name = "filedownload"
    start_urls = ["https://example.com/downloads"]

    def parse(self, response):
        # Follow every link marked with the download-link class
        for href in response.css('a.download-link::attr(href)').getall():
            yield scrapy.Request(
                response.urljoin(href),
                callback=self.save_file
            )

    def save_file(self, response):
        path = response.url.split('/')[-1]
        with open(path, 'wb') as f:
            f.write(response.body)
        self.log(f"Saved file: {path}")

process = CrawlerProcess()
process.crawl(FileDownloadSpider)
process.start()
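Advanced technique: resumable downloads
import requests
import os

def download_with_resume(url, filename):
    headers = {}
    if os.path.exists(filename):
        downloaded = os.path.getsize(filename)
        # Ask the server for only the bytes we do not have yet
        headers = {'Range': f'bytes={downloaded}-'}
    with requests.get(url, headers=headers, stream=True, timeout=30) as r:
        if r.status_code == 416:
            return  # range not satisfiable: usually the file is already complete
        r.raise_for_status()
        # Append only if the server honored the Range header (206 Partial Content);
        # a plain 200 means it sent the whole file, so start over
        mode = 'ab' if r.status_code == 206 else 'wb'
        with open(filename, mode) as f:
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)

download_with_resume("https://example.com/large_file.iso", "large_file.iso")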
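When a server honors a `Range` request it replies with status 206 and a `Content-Range` header such as `bytes 1000-1999/5000`. Checking that the returned range starts at the requested offset guards against appending misaligned data. A minimal sketch, assuming only the `bytes` unit matters here; `parse_content_range` is a hypothetical helper, not part of requests.

```python
import re

# Matches "bytes <start>-<end>/<total>" where total may be "*" (unknown)
_CONTENT_RANGE = re.compile(r"^bytes (\d+)-(\d+)/(\d+|\*)$")


def parse_content_range(value):
    """Return (start, end, total) from a Content-Range header value.

    `total` is None when the server reports it as "*".
    Returns None if the value does not describe a satisfied byte range.
    """
    match = _CONTENT_RANGE.match(value.strip())
    if match is None:
        return None
    start, end, total = match.groups()
    return int(start), int(end), None if total == "*" else int(total)
```

In `download_with_resume`, the result for `r.headers.get("Content-Range", "")` could be compared against the size of the local file before opening it in append mode.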
Method comparison and selection guide

Security considerations
Verify HTTPS certificates:
import requests
# requests example (certificates are verified by default)
requests.get("https://example.com", verify=True)
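Limit the download size to guard against DoS attacks:
import requests

# url and filename are defined as in the earlier examples
max_size = 1024 * 1024 * 100  # 100 MB
response = requests.get(url, stream=True)
downloaded = 0
with open(filename, 'wb') as f:
    for chunk in response.iter_content(8192):
        downloaded += len(chunk)
        if downloaded > max_size:
            raise ValueError("File exceeds the maximum allowed size")
        f.write(chunk)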
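Counting bytes while streaming is the reliable check, since a server can omit or misreport `Content-Length`. When the header is present, though, it allows rejecting an oversized file before any data is written. A sketch under that assumption; `declared_size_exceeds` is a hypothetical helper that takes any header mapping, for example `response.headers`.

```python
def declared_size_exceeds(headers, max_size):
    """Return True if the Content-Length header declares more than max_size bytes.

    Hypothetical helper. Returns False when the header is missing or not a
    valid integer, so the streaming byte count must still be enforced.
    """
    raw = headers.get("Content-Length")
    if raw is None:
        return False
    try:
        return int(raw) > max_size
    except ValueError:
        return False
```

Header lookup in `response.headers` from requests is case-insensitive; a plain `dict` would need the exact key `Content-Length`.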
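Sanitize file names to prevent path traversal:
import re

def sanitize_filename(filename):
    # Strip path separators and other characters that are unsafe in file names
    return re.sub(r'[\\/*?:"<>|]', "", filename)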
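Removing separator characters alone still lets names such as `..` or an empty string through. When the name comes from a URL, taking only the last path component first is a stricter starting point. A minimal sketch; `filename_from_url` and the `"download"` fallback are assumptions for illustration.

```python
import posixpath
import re
from urllib.parse import unquote, urlsplit


def filename_from_url(url, default="download"):
    """Derive a safe local file name from the last path segment of a URL.

    Hypothetical helper: the fallback name is an arbitrary choice.
    """
    # Decode %2F-style escapes first so encoded separators cannot sneak through
    path = unquote(urlsplit(url).path).replace("\\", "/")
    name = posixpath.basename(path)
    # Same character filter as sanitize_filename
    name = re.sub(r'[\\/*?:"<>|]', "", name)
    # "", "." and ".." are not usable file names
    if name in ("", ".", ".."):
        return default
    return name
```

For example, `filename_from_url("https://example.com/files/report.pdf?x=1")` yields `report.pdf`, with the query string ignored.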
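Summary
This article covered 10 ways to download files in Python, from the standard library to third-party packages and from synchronous to asynchronous code, across a range of scenarios. Which one to pick depends on what you need:
Simple needs: urllib.request or requests
High-performance needs: aiohttp or pycurl
Special scenarios: multiprocessing or scrapy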