
Python multithreaded HTTP download example

Updated: 2013-12-30 09:17:14

An example implementation of a multithreaded HTTP download in Python, offered for reference.

Test platform: Ubuntu 13.04 x86_64, Python 2.7.4

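This took me nearly two hours. The main problem was that at first it did not occur to me to pass a single file object into the threads, so the MD5 of the downloaded file did not match the source file, and I wasted a lot of time on that.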
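The fix described above can be tried out without any network access. In this minimal sketch, several threads copy their own byte ranges into one shared file object, holding a lock around each seek-and-write pair, and the MD5 of the result is then compared with the MD5 of the source. The names `write_range`, `assemble`, `md5_of_file`, and the in-memory `source` are illustrative assumptions, not part of the original script. The sketch is written so that it should run on Python 3 as well as Python 2.7.

```python
import hashlib
import os
import tempfile
import threading

lock = threading.Lock()

def write_range(fobj, data, start, end, chunk=1024):
    """Copy data[start:end + 1] into fobj at the same offsets (end is inclusive)."""
    offset = start
    while offset <= end:
        block = data[offset:min(offset + chunk, end + 1)]
        # seek() and write() must happen as one step: if another thread moved
        # the shared file position in between, the block would land elsewhere.
        with lock:
            fobj.seek(offset)
            fobj.write(block)
        offset += len(block)

def assemble(source, path, ranges):
    """Write every (start, end) range of source into path, one thread per range."""
    with open(path, 'wb') as fobj:
        threads = [threading.Thread(target=write_range, args=(fobj, source, s, e))
                   for s, e in ranges]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

def md5_of_file(path):
    """Return the hex MD5 digest of the file at path."""
    with open(path, 'rb') as f:
        return hashlib.md5(f.read()).hexdigest()

# Hypothetical stand-in for the remote file: 10240 bytes held in memory.
source = bytes(bytearray(range(256))) * 40
path = os.path.join(tempfile.mkdtemp(), 'out.bin')
assemble(source, path, [(0, 3412), (3413, 6825), (6826, 10239)])
print(md5_of_file(path) == hashlib.md5(source).hexdigest())
```

Comparing digests this way is the same check that exposed the original problem: if the two MD5 values differ, the assembled file does not match the source.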

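If you are interested, feel free to take the code, add command-line arguments, and improve it. Support for resuming interrupted downloads could be added as well.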
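As one possible starting point for those two improvements, here is a hedged sketch using `argparse` from the standard library (available since Python 2.7). The option names, the default output name `download.bin`, and the helper `remaining_range` are all assumptions made for illustration; none of them appear in the original script. The resume helper only computes the `Range` header value for the part of an inclusive range that is still missing. How the number of bytes already saved is recorded between runs is left open.

```python
from __future__ import print_function

import argparse

def parse_args(argv=None):
    """Parse hypothetical command-line options for the downloader."""
    parser = argparse.ArgumentParser(description='Multithreaded HTTP download')
    parser.add_argument('url', help='URL of the file to download')
    parser.add_argument('-t', '--threads', type=int, default=3,
                        help='number of download threads')
    parser.add_argument('-o', '--output', default='download.bin',
                        help='path of the file to write')
    parser.add_argument('-b', '--buffer', type=int, default=1024,
                        help='bytes to read per block')
    return parser.parse_args(argv)

def remaining_range(start, end, done):
    """Return the Range header value for what is left of an inclusive
    byte range after `done` bytes were saved, or None if it is complete."""
    if done >= end - start + 1:
        return None
    return 'bytes=%d-%d' % (start + done, end)

# Example invocation with a placeholder URL.
args = parse_args(['http://example.com/file.iso', '-t', '4', '-o', 'file.iso'])
print(args.threads, args.output, args.buffer)
print(remaining_range(0, 999, 400))
print(remaining_range(0, 999, 1000))
```

The parsed values map directly onto the `thread`, `save_file`, and `buffer` parameters of `main()` in the script that follows.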

# -*- coding: utf-8 -*-
# Author: ToughGuy
# Email: wj0630@gmail.com
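# I wrote this to get a first look at Python's threading mechanism.
# I don't usually write comments; I took the time to comment this code in the hope
# that readers will point out any mistakes, since I may not fully understand it myself.
# Test platform: Ubuntu 13.04 x86_64, Python 2.7.4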

import threading
import urllib2
import sys

max_thread = 10
# Initialize the lock
lock = threading.RLock()

class Downloader(threading.Thread):
    def __init__(self, url, start_size, end_size, fobj, buffer):
        self.url = url
        self.buffer = buffer
        self.start_size = start_size
        self.end_size = end_size
        self.fobj = fobj
        threading.Thread.__init__(self)

    def run(self):
        """
            Thin wrapper; the real work happens in _download().
        """
        with lock:
            print 'starting: %s' % self.getName()
        self._download()

    def _download(self):
        """
            This is where the actual work gets done.
        """
        req = urllib2.Request(self.url)
        # Add the HTTP Range header to set the byte range to download
        req.headers['Range'] = 'bytes=%s-%s' % (self.start_size, self.end_size)
        f = urllib2.urlopen(req)
        # Initialize this thread's offset into the file object
        offset = self.start_size
        while 1:
            block = f.read(self.buffer)
            # Exit once this thread has fetched all of its data
            if not block:
                with lock:
                    print '%s done.' % self.getName()
                break
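            # Hold the lock while writing: every thread shares one file object
            # `with lock` replaces the traditional lock.acquire() ... lock.release()
            # Requires Python >= 2.5 (2.5 itself needs `from __future__ import with_statement`)
            with lock:
                sys.stdout.write('%s saving block...' % self.getName())
                # Move the shared file object to this thread's current offset
                self.fobj.seek(offset)
                # Write the fetched data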
                self.fobj.write(block)
                offset = offset + len(block)
                sys.stdout.write('done.\n')

def main(url, thread=3, save_file='', buffer=1024):
    # The thread count must not exceed max_thread
    thread = min(thread, max_thread)
    # Get the file size
    req = urllib2.urlopen(url)
    size = int(req.info().getheaders('Content-Length')[0])
    # Open the output file; this one object is shared by every thread
    fobj = open(save_file, 'wb')
    # Compute the HTTP Range each thread is responsible for from the thread count
    avg_size, pad_size = divmod(size, thread)
    plist = []
    for i in xrange(thread):
        start_size = i*avg_size
        end_size = start_size + avg_size - 1
        if i == thread - 1:
            # The last thread also takes the remaining pad_size bytes.
            # Range ends are inclusive, so the final byte is size - 1.
            end_size = end_size + pad_size
        t = Downloader(url, start_size, end_size, fobj, buffer)
        plist.append(t)