
Implementing Batch Downloads in Python by Calling IDM

Updated: 2025-04-22 11:43:11  Author: GIS炒茄子

This article shows how to call IDM (Internet Download Manager) from Python to run batch downloads. It includes detailed example code that may be useful as a reference for study or work.

01 Problems this solves

1.1 A txt file of download links

An obvious objection: IDM can already import a txt file for batch downloading, so why is anything else needed?

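First: IDM imports every link in the txt file at once. With a very large list (in practice, beyond about 5,000 links the mouse and IDM became so sluggish they could barely be used), even a high-performance laptop cannot cope;
Second: IDM does not handle interrupted batches well, especially large ones, whereas the custom DownloadManager class can pick up where it left off and continue calling IDM from there.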
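Both points come down to the same bookkeeping, which can be modeled without IDM. Only a bounded number of links are handed to the downloader at once, more are handed over as output files appear, and a link whose output file already exists counts as completed, which is what makes resuming possible. The sketch below is a simplified, hypothetical model of that logic. The name `refill` and the `existing` set (standing in for files on disk) are illustrative assumptions, not the author's code.

```python
def refill(pending, downloading, completed, existing, limit):
    """Hypothetical model of the sliding window used by DownloadManager.

    pending, downloading and completed are lists of output file names; `existing`
    stands in for the files already on disk. Returns the names newly handed over.
    """
    # Retire in-flight downloads whose output file has appeared.
    for name in list(downloading):
        if name in existing:
            downloading.remove(name)
            completed.append(name)
    started = []
    # Iterate over a snapshot so items can be removed from `pending` safely.
    for name in list(pending):
        if name in existing:              # finished in an earlier run: skip it
            pending.remove(name)
            completed.append(name)
        elif len(downloading) < limit:    # free slot: hand the link over
            pending.remove(name)
            downloading.append(name)
            started.append(name)
        else:                             # window full: stop scanning
            break
    return started


pending = ['a.nc', 'b.nc', 'c.nc', 'd.nc']
downloading, completed = [], []
print(refill(pending, downloading, completed, existing={'a.nc'}, limit=2))          # ['b.nc', 'c.nc']
print(refill(pending, downloading, completed, existing={'a.nc', 'b.nc'}, limit=2))  # ['d.nc']
```

With `limit=2`, no more than two files are ever queued at once, however long the list is.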

1.2 Adding download links in a loop

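If each download link must first be requested (for example, generated on demand by a data service) and then downloaded with IDM, doing this by hand is tedious; DownloadManager offers a few methods that help with this.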
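The request-then-download pattern works best if a link is requested only once. The ERA5 example later in this article does this by checking the output file name with `should_add_link(filename=...)` before each request. Here is a minimal sketch of that idea with a stubbed request. `request_link`, `enqueue` and the `example.org` URL are hypothetical placeholders for a real data-service call.

```python
def request_link(filename):
    """Hypothetical stand-in for a slow or rate-limited request that returns a URL."""
    return 'https://example.org/data/' + filename


def enqueue(filenames, links):
    """Request a link only for file names not yet in `links` (list of {'url', 'filename'} dicts).

    Returns the number of requests made.
    """
    known = {item['filename'] for item in links}
    requested = 0
    for name in filenames:
        if name in known:                 # requested in an earlier run: don't spend a request
            continue
        links.append({'url': request_link(name), 'filename': name})
        known.add(name)
        requested += 1
    return requested


links = [{'url': 'https://example.org/data/a.nc', 'filename': 'a.nc'}]
print(enqueue(['a.nc', 'b.nc', 'b.nc'], links))  # 1
```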

02 Code

2.1 IDM command-line invocation

According to IDM's help, IDM can be driven from the CMD command line. DownloadManager simply calls IDM repeatedly in a loop to download the linked files:

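IDM command-line options:
cmd: idman /s
/s: start the queue, i.e. begin downloading all files added to IDM's download queue

cmd: idman /d URL [/p local_path] [/f local_file_name] [/q] [/h] [/n] [/a]
/d URL: download the file at URL
/p local_path: local directory (folder) in which to save the downloaded file
/f local_file_name: file name under which to save the downloaded file
/q: IDM exits after a successful download. This parameter works only for the first copy of IDM
/h: IDM hangs up your connection after a successful download
/n: silent mode; IDM asks no questions and shows no dialogs
/a: add the file specified with /d to the download queue without starting the download (queued downloads can then be started with /s)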
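DownloadManager passes these options to IDM as an argument list through `subprocess.call`. Passing a list instead of a single string lets `subprocess` handle quoting of paths with spaces such as `Internet Download Manager`. The hypothetical helpers below only build those lists, so the commands can be inspected on a machine without IDM. The executable path is just the example path used in this article.

```python
def idm_add_command(idm_path, url, out_dir, filename):
    """Arguments for: idman /d URL /p out_dir /f filename /a /n (queue silently, don't start)."""
    return [idm_path, '/d', url, '/p', out_dir, '/f', filename, '/a', '/n']


def idm_start_command(idm_path):
    """Arguments for: idman /s (start the queue)."""
    return [idm_path, '/s']


exe = r'D:\Softwares\IDM\Internet Download Manager\IDMan.exe'
add_cmd = idm_add_command(exe, 'https://bandisoft.okconnect.cn/bandizip/BANDIZIP-SETUP-STD-X64.EXE',
                          r'E:\MyTEMP', 'xxx.exe')
print(add_cmd)
print(idm_start_command(exe))
# With IDM installed on Windows, each list could be run via subprocess.call(add_cmd).
```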

2.2 The DownloadManager class

# @Author  : ChaoQiezi
# @Time    : 2025/3/31 5:12 PM
# @Email   : chaoqiezi.one@qq.com
# @FileName: dead_code

"""
This script is used to manage batch downloads through IDM
"""

import os
import time
from pathlib import Path
import json
from urllib.parse import urlparse
from tqdm import tqdm
from subprocess import call


class DownloadManager:
    def __init__(self, out_dir, idm_path, links_path=None, status_path=None, concurrent_downloads=16,
                 monitor_interval=1):
        """
        初始化類
        :param out_dir: 下載文件的輸出目錄
        :param idm_path: idman.exe的絕對路徑, eg: "D:\Softwares\IDM\Internet Download Manager\IDMan.exe"
        :param links_path: 存儲下載鏈接的txt文件(一行一個下載鏈接)
        :param status_path: 存儲結(jié)構(gòu)化下載鏈接的json文件(用于存儲下載鏈接和狀態(tài)的json文件)
        :param concurrent_downloads: 同時下載文件數(shù)量
        :param monitor_interval: 監(jiān)測下載事件的時間間隔,對于大文件:監(jiān)測時間可適當(dāng)延長
        """

        # json file that stores the download status
        if status_path is None:
            status_path = os.path.join(Path(__file__).parent, 'links_status.json')
        self.status_path = status_path
        # output directory for downloaded files
        os.makedirs(out_dir, exist_ok=True)
        self.out_dir = out_dir
        # download status
        self.downloading_links = list()
        self.pending_links = list()
        self.completed_links = list()
        self.links = list()
        self.pbar = None  # progress bar, created when self.download() runs
        # download parameters
        self.idm_path = idm_path  # absolute path to the IDM executable
        self.concurrent_downloads = concurrent_downloads  # number of simultaneous downloads (concurrency)
        self.monitor_interval = monitor_interval  # polling interval, in seconds
        self.downloaded_count = len(self.completed_links)  # number of completed downloads
        self.remaining_downloads = len(self.links) - self.downloaded_count  # number of remaining downloads
        self.link_count = len(self.links)
        self.bar_format = "{desc}: {percentage:.0f}%|{bar}| [{n_fmt}/{total_fmt}] [elapsed: {elapsed}, remaining: {remaining}, {postfix}]"

        # initialize the download status
        if links_path is not None:  # convert the txt file of links into the structured json file
            self._init_save(links_path)
        elif os.path.exists(self.status_path):  # resume from a previous status file
            with open(self.status_path, 'r', encoding='utf-8') as f:
                links_status = json.load(f)
            self.downloading_links = links_status['downloading_links']
            self.pending_links = links_status['pending_links']
            self.completed_links = links_status['completed_links']
            self.links = links_status['links']
            self._update()
        else:
            self._update()

    def _init_save(self, links_path):
        """
        從存儲下載鏈接的txt文件中初始化下載鏈接及其下載狀態(tài)等參數(shù)
        :param links_path: 存儲下載鏈接的txt文件
        :return: None
        """
        with open(links_path, 'r') as f:
            urls = []
            for line in f:
                if not line.startswith('http'):
                    continue
                urls.append({
                    'url': line.rstrip('\n'),
                    'filename': self._get_filename(line.rstrip('\n'))
                })

        self.links = urls.copy()
        self.pending_links = urls.copy()
        """
        # 必須使用copy(), 否則后續(xù)對self.pending_links中元素操作, 會影響self.links的元素, 因?yàn)槎弑举|(zhì)上都是指向(id相同)同一個列表urls
        self.links = urls
        self.pending_links = urls
        """

        self._update()

    def _update(self, downloading_links=None, pending_links=None, completed_links=None, links=None):
        """更新下載鏈接的狀態(tài)位置并保存"""
        if downloading_links is None:
            downloading_links = self.downloading_links
        if pending_links is None:
            pending_links = self.pending_links
        if completed_links is None:
            completed_links = self.completed_links
        if links is None:
            links = self.links

        self.downloaded_count = len(self.completed_links)
        self.remaining_downloads = len(self.links) - self.downloaded_count
        self.link_count = len(self.links)

        with open(self.status_path, 'w', encoding='utf-8') as f:
            json.dump({
                'downloading_links': downloading_links,
                'pending_links': pending_links,
                'completed_links': completed_links,
                'links': links
            }, f, indent=4)  # indent=4 pretty-prints the json

    def add_link(self, link: str, filename=None):
        """
        添加新鏈接
        :param link: 需要添加的一個鏈接
        :param filename: 該鏈接對應(yīng)下載文件的輸出文件名
        :return: None
        """

        # 結(jié)構(gòu)化下載鏈接
        new_item = self._generate_item(link, filename)

        # 添加下載鏈接到links
        if new_item not in self.links:
            self.links.append(new_item)
            self.pending_links.append(new_item)

        self._update()

    def _get_filename(self, url):
        """獲取下載鏈接url對應(yīng)的默認(rèn)文件名稱"""
        return os.path.basename(urlparse(url).path)

    def _generate_item(self, link: str, filename=None):
        """基于下載鏈接生成item"""
        item = {
            'url': link,
        }
        if filename is not None:
            item['filename'] = filename
        else:
            item['filename'] = self._get_filename(link)

        return item

    def _init_download(self):
        """
        初始化下載鏈接的狀態(tài)并啟動下載
        :return:
        """
        # self.links復(fù)制一份到pending_links中
        self.pending_links = self.links.copy()

        self._pending2downloading()  # 將<等待下載隊(duì)列>中的鏈接添加到<正在下載隊(duì)列>去


    def download(self):
        """
        對此前加入的所有url進(jìn)行下載
        :return:
        """
        try:
            self.pbar = tqdm(total=self.link_count, desc='下載', bar_format=self.bar_format, colour='blue')
            self._init_download()
            self._monitor()
        except KeyboardInterrupt:
            print('您已中斷下載程序; 下次下載將繼續(xù)從({}/{})處下載...'.format(self.downloaded_count, self.link_count))
        except Exception as e:
            print('下載異常錯誤: {};\n下次下載將繼續(xù)從({}/{})處下載...'.format(e, self.downloaded_count, self.link_count))
        finally:
            self._update()  # 無論是否發(fā)生異常, 最后都必須保存當(dāng)前下載狀態(tài), 以備下次下載繼續(xù)從斷開處進(jìn)行
            exit(1)  # 錯誤退出

    def download_single(self, url, filename=None, wait_time=None):
        """
        對輸入的單個url進(jìn)行下載, 最好不要與download()方法連用
        :param url: 所需下載的文件鏈接
        :param filename: 輸出的文件名稱
        :return:
        """

        if filename is None:
            filename = self._get_filename(url)

        # 判斷當(dāng)前url文件是否已經(jīng)下載
        out_path = os.path.join(self.out_dir, filename)
        if os.path.exists(out_path):
            if wait_time is not None:
                return wait_time

        call([self.idm_path, '/d', url, '/p', self.out_dir, '/f', filename, '/a', '/n'])
        call([self.idm_path, '/s'])

        if wait_time is not None:
            return wait_time + 0
        """
        IDM命令行說明:
        cmd: idman /s
        /s: 開始(start)下載添加IDM中下載隊(duì)列中的所有文件
        cmd: idman /d URL [/p 本地_路徑] [/f 本地_文件_名] [/q] [/h] [/n] [/a]
        /d URL: 從下載鏈接url中下載文件
        /p 本地_路徑: 下載好的文件保存在哪個本地路徑(文件夾路徑/目錄)
        /f 本地_文件_名: 下載好的文件輸出/保存的文件名稱
        /q: IDM 將在成功下載之后退出。這個參數(shù)只為第一個副本工作
        /h: IDM 將在正常下載之后掛起您的連接(下載窗口最小化/隱藏到系統(tǒng)托盤)
        /n: IDM不要詢問任何問題不要彈窗,安靜地/后臺地下載
        /a: 添加一個指定的文件, 用/d到下載隊(duì)列, 但是不要開始下載.(即添加一個下載鏈接到IDM的下載隊(duì)列中, 可通過/s啟動隊(duì)列的所有下載鏈接文件的下載)
        """

    def _monitor(self):
        while True:
            for item in self.downloading_links.copy():  # iterate over a copy because items are removed from downloading_links inside the loop
                self._check_update_download(item)
            self._update()  # update and save the download status
            self.pbar.refresh()  # refresh the progress bar
            call([self.idm_path, '/s'])  # restart the queue in case IDM stopped downloading unexpectedly

            # downloading is finished once both the pending and the downloading lists are empty
            if not self.pending_links and not self.downloading_links:
                self.pbar.close()  # close the progress bar
                print('All links have been downloaded.')
                break

            time.sleep(self.monitor_interval)

    def _check_update_download(self, downloading_item):
        """
        檢查當(dāng)前項(xiàng)是否已經(jīng)下載, 成功下載則更新該項(xiàng)的狀態(tài)并返回True, 否則不操作并返回False
        :param downloading_item: <正在下載鏈接>中的當(dāng)前項(xiàng)
        :return: Bool
        """

        out_path = os.path.join(self.out_dir, downloading_item['filename'])

        # 檢查當(dāng)前文件是否存在(是否下載)
        if os.path.exists(out_path):  # 存在(即已經(jīng)下載過了)
            # 更新當(dāng)前文件的下載狀態(tài)
            self.completed_links.append(downloading_item)
            self.downloading_links.remove(downloading_item)
            self._update_pbar(downloading_item['filename'])  # 更新下載進(jìn)度條
            # print('文件: {} - 下載完成({}/{})'.format(downloading_item['filename'], len(self.completed_links), len(self.links)))
            # 從<阻塞/等待下載鏈接>中取鏈接到<正在下載鏈接>中(如果pending_links中還有鏈接)
            if self.pending_links:
                self._pending2downloading()  # 取<阻塞/等待下載鏈接>中的鏈接添加到<正在下載鏈接>中

            return True
        return False

    def _download(self, item):
        self.download_single(item['url'], item['filename'])

    def _pending2downloading(self):
        """
        從阻塞的<等待下載鏈接>中取鏈接<正在下載鏈接>中,若所取鏈接已經(jīng)下載則跳過
        :return:
        """

        for item in self.pending_links.copy():
            out_path = os.path.join(self.out_dir, item['filename'])
            # 判斷當(dāng)前下載鏈接是否已經(jīng)被下載
            if os.path.exists(out_path):  # 若當(dāng)前鏈接已經(jīng)下載, 跳過下載并更新其狀態(tài)
                self.pending_links.remove(item)
                self.completed_links.append(item)
                self._update_pbar(item['filename'])
                continue
            elif self.downloading_links.__len__() < self.concurrent_downloads:  # 若當(dāng)前鏈接未被下載且當(dāng)前下載數(shù)量小于并發(fā)量
                self.pending_links.remove(item)
                self.downloading_links.append(item)
                self._download(item)
            else:
                # 若elif中不能執(zhí)行, 說明當(dāng)前項(xiàng)未下載, 且當(dāng)前同時下載的文件數(shù)量已達(dá)到最大, 因此不需要迭代下去了
                break


    def should_add_link(self, item=None, url=None, filename=None):
        """
        依據(jù)item/url/filename判斷該鏈接此前已經(jīng)被添加過, 如果添加過那么返回False, 如果沒有被添加過則返回True
        :param item: 形如dict{'url': str, 'filename': str}的item
        :param url: 包含單個下載鏈接的字符串
        :param filename: 包含輸出文件名稱的字符串
        :return: Bool
        """

        if not self.links:
            return True, {}

        # 依據(jù)item判斷
        if item is not None:
            for cur_item in self.links:
                if cur_item == item:
                    return False, item
            return True, {}

        # 依據(jù)鏈接判斷
        if url is not None:
            for item in self.links:
                if item['url'] == url:
                    return False, item
            return True, {}

        # 依據(jù)輸出文件名稱判斷
        if filename is not None:
            for item in self.links:
                if item['filename'] == filename:
                    return False, item
            return True, {}

    def _update_pbar(self, filename):
        """
        更新下載進(jìn)度條
        :return:
        """

        self.pbar.n = len(self.completed_links)  # 更新已完成地數(shù)目
        self.pbar.set_postfix_str('當(dāng)前下載文件: {}'.format(filename))
        # self.pbar.refresh()  # 立即刷新顯示
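The comment in `_init_save()` about `copy()` can be checked in isolation. The snippet below (sample URLs are made up) shows that assigning the same list to two attributes makes them aliases, while `copy()` gives independent lists. The copies are shallow, which is fine here because the class moves items between lists but never modifies an item.

```python
urls = [{'url': 'https://example.org/a.nc', 'filename': 'a.nc'},
        {'url': 'https://example.org/b.nc', 'filename': 'b.nc'}]

# Without copy(): both names refer to the same list object.
links = urls
pending = urls
pending.remove(pending[0])
print(len(links))                 # 1, "links" lost the item too

# With copy(): two independent lists (shallow copies of the original).
urls = [{'url': 'https://example.org/a.nc', 'filename': 'a.nc'},
        {'url': 'https://example.org/b.nc', 'filename': 'b.nc'}]
links = urls.copy()
pending = urls.copy()
pending.remove(pending[0])
print(len(links), len(pending))   # 2 1
print(links[1] is pending[0])     # True, the dicts themselves are still shared
```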

2.3 Basic usage

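If you are not familiar with classes and importing them, follow these steps before use:

Copy the code above into an empty Python file and name it DownloadManager.py;
In the same directory (folder) as DownloadManager.py, create another .py file (say, links_download.py) for the downloads;
Run links_download.py. If no path is given for the download-status file, links_status.json is created in the directory of DownloadManager.py; do not delete it until all downloads have finished.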
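When `links_path` is given, `_init_save()` keeps only lines that start with `http` and takes each output file name from the URL path. The stand-alone sketch below mirrors that rule, so a links file can be checked before a long run. `read_links` and `duplicate_filenames` are hypothetical names, and the second sample URL is made up. It also shows two consequences of the rule. Query strings are not part of the file name. Two URLs whose paths end in the same name map to the same output file, so the second would be marked completed as soon as the first file exists.

```python
import os
from collections import Counter
from urllib.parse import urlparse


def read_links(lines):
    """Keep lines starting with 'http' and pair each URL with the file name from its path."""
    items = []
    for line in lines:
        url = line.strip()
        if not url.startswith('http'):    # skip blank lines, comments, etc.
            continue
        items.append({'url': url, 'filename': os.path.basename(urlparse(url).path)})
    return items


def duplicate_filenames(items):
    """File names that more than one URL would be saved as."""
    counts = Counter(item['filename'] for item in items)
    return sorted(name for name, n in counts.items() if n > 1)


sample = [
    '# links for testing\n',
    'https://bandisoft.okconnect.cn/bandizip/BANDIZIP-SETUP-STD-X64.EXE\n',
    '\n',
    'https://example.org/files/data.nc?token=abc\r\n',
]
for item in read_links(sample):
    print(item['filename'])               # BANDIZIP-SETUP-STD-X64.EXE, then data.nc
print(duplicate_filenames(read_links(sample)))  # []
```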

(Figure: the batch-download code)

2.3.1 Batch download from a txt file of links

from DownloadManager import DownloadManager

out_dir = r'E:\MyTEMP'
idm_path = r"D:\Softwares\IDM\Internet Download Manager\IDMan.exe"
links_path = r'F:\PyProJect\GPP\Resources\MyTEMP\links.txt'
downloader = DownloadManager(out_dir, idm_path=idm_path, links_path=links_path)
downloader.download()

The download screen is shown below. IDM downloads silently in the background; to see the individual downloads, open IDM manually:

(Figure: the script while running)

(Figure: the IDM download window)
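Because the status file is plain json, progress can also be checked from another Python session without opening IDM. The helper below is hypothetical and assumes the four-list layout that `_update()` writes. The demo data is made up; in real use, point it at links_status.json next to DownloadManager.py, or at the `status_path` you passed.

```python
import json
import os
import tempfile


def summarize_status(status_path):
    """Return (completed, downloading, pending, total) counts from a DownloadManager status file."""
    with open(status_path, 'r', encoding='utf-8') as f:
        status = json.load(f)
    return (len(status['completed_links']), len(status['downloading_links']),
            len(status['pending_links']), len(status['links']))


# Demo with a throwaway status file (illustrative data, not real downloads).
demo_path = os.path.join(tempfile.mkdtemp(), 'links_status.json')
with open(demo_path, 'w', encoding='utf-8') as f:
    json.dump({'completed_links': [{'url': 'u1', 'filename': 'a'}],
               'downloading_links': [{'url': 'u2', 'filename': 'b'}],
               'pending_links': [],
               'links': [{'url': 'u1', 'filename': 'a'}, {'url': 'u2', 'filename': 'b'}]}, f)
done, active, waiting, total = summarize_status(demo_path)
print('{}/{} done, {} downloading, {} pending'.format(done, total, active, waiting))  # 1/2 done, 1 downloading, 0 pending
```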

2.3.2 Single-file download

from DownloadManager import DownloadManager

out_dir = r'E:\MyTEMP\go'
idm_path = r"D:\Softwares\IDM\Internet Download Manager\IDMan.exe"
url = 'https://bandisoft.okconnect.cn/bandizip/BANDIZIP-SETUP-STD-X64.EXE'
downloader = DownloadManager(out_dir, idm_path=idm_path)
downloader.add_link(url, 'xxx.exe')  # if no output file name is given, the default name from the URL is used
downloader.download()

Note: both the output directory out_dir and the absolute path to IDM, idm_path, must be passed when DownloadManager(out_dir, idm_path=idm_path) is created; otherwise an error is raised.
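A mistyped `idm_path` is not detected when the object is created. It only shows up once `download()` tries to start IDM, where the resulting exception is caught and printed. A small pre-flight check can fail earlier. This helper is a hypothetical addition, not part of the original class.

```python
import os


def check_paths(out_dir, idm_path):
    """Hypothetical pre-flight check: fail early if IDM is missing; create out_dir if needed."""
    if not os.path.isfile(idm_path):
        raise FileNotFoundError('IDM executable not found: {}'.format(idm_path))
    os.makedirs(out_dir, exist_ok=True)
    return os.path.abspath(out_dir)
```

Usage would be, for example, `check_paths(out_dir, idm_path)` right before `DownloadManager(out_dir, idm_path=idm_path)`.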

2.4 Examples

2.4.1 Batch downloading ERA5 files (adding links in a loop)

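Note: in the two example scripts below, the lines `import Config`, `from Config import my_key, my_url` and `from Src.utils import generate_request, DownloadManager` import the author's own modules. Follow the main body of the scripts, and ignore or replace whatever is imported from these modules with your own equivalents.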
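`generate_request` is one of the author's unpublished helpers. For readers who want to run the ERA5 example, the sketch below is a hypothetical stand-in that requests one variable for every day and hour of one month. The key names are an assumption based on the CDS API request format for reanalysis-era5-land. Compare them with the API request shown on the dataset's download form before relying on them.

```python
import calendar
from datetime import datetime


def generate_request(var_name, date):
    """Hypothetical stand-in: one variable, all days and hours of date's month.

    Key names are assumed from the CDS API request format; verify them against
    the request generated by the dataset's download form.
    """
    n_days = calendar.monthrange(date.year, date.month)[1]  # number of days in this month
    return {
        'variable': [var_name],
        'year': str(date.year),
        'month': '{:02d}'.format(date.month),
        'day': ['{:02d}'.format(d) for d in range(1, n_days + 1)],
        'time': ['{:02d}:00'.format(h) for h in range(24)],
        'data_format': 'netcdf',
        'download_format': 'unarchived',
    }


req = generate_request('2m_temperature', datetime(2000, 2, 1))
print(len(req['day']), req['day'][-1])  # 29 29 (2000 is a leap year)
```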

# @Author  : ChaoQiezi
# @Time    : 2025/3/27 10:56 AM
# @Email   : chaoqiezi.one@qq.com
# @FileName: era5_download_idm

"""
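This script is used to download the ERA5 dataset with IDM's multi-threaded downloading

ERA5 data is normally downloaded with the cdsapi module,
but cdsapi downloads are throttled, especially from within China; downloading through IDM's multiple threads raised the speed from about 200 KB/s to 5-10 MB/s,
a large improvement.

Plan
- Limit the number of files being downloaded (loading too many files into IDM at once can freeze it, and if the downloads take too long, links requested near the end may expire <ERA5 links are valid for only one day>)
- Periodically check the downloaded files (network problems and the like can leave a download incomplete, so verify that each file finished)
- Store the download links and their completion status in a json file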
"""

import os
import time
import cdsapi
from datetime import datetime
from dateutil.relativedelta import relativedelta

import Config
from Config import my_key, my_url
from Src.utils import generate_request, DownloadManager

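# setup
out_dir = r'G:\ERA5'  # output directory for the nc files (change as needed)
idm_path = r"D:\Softwares\IDM\Internet Download Manager\IDMan.exe"  # absolute path to IDMan.exe (change as needed)
dataset_name = "reanalysis-era5-land"  # name of the ERA5-Land reanalysis dataset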
start_date = datetime(2000, 1, 1)
end_date = datetime(2010, 12, 31)
var_names = ["2m_temperature", "2m_dewpoint_temperature", "surface_solar_radiation_downwards"]

# connect the cdsapi client
c = cdsapi.Client(url=my_url, key=my_key)
out_link_dir = os.path.join(Config.root_dir, 'Resources', 'era5_links_download')
os.makedirs(out_link_dir, exist_ok=True)
# request the download links
rd = relativedelta(end_date, start_date)
months = rd.years * 12 + rd.months + 1  # total number of months
for var_index, var_name in enumerate(var_names):
    # initialize the download status for the current variable
    cur_out_dir = os.path.join(out_dir, var_name)
    cur_links_filename = 'era5_links_{}_{}_{}.json'.format(var_name,
                                                           start_date.strftime('%Y%m%d'), end_date.strftime('%Y%m%d'))
    storage_path = os.path.join(out_link_dir, cur_links_filename)
    downloader = DownloadManager(cur_out_dir, idm_path=idm_path, status_path=storage_path)

    wait_time = 0
    # request the download link for each month in turn
    for month in range(months):
        cur_date = start_date + relativedelta(months=month)  # months= (plural) shifts by a number of months; month= (singular) would set the month itself
        out_filename = '{}_{}_{:02}.nc'.format(var_name, cur_date.year, cur_date.month)
        try:
            # skip links that were already requested (saves time and request quota)
            add_bool, item = downloader.should_add_link(filename=out_filename)
            if add_bool:
                # submit the download request
                request = generate_request(var_name, cur_date)
                cur_url = c.retrieve(dataset_name, request).location

                # add the download link
                downloader.add_link(cur_url, out_filename)
                item = downloader._generate_item(cur_url, out_filename)
            print('Link added ({}/{}): {}-{}-{:02}'.format(month + 1 + months * var_index, months * len(var_names),
                                                          var_name, cur_date.year, cur_date.month))

            wait_time = downloader.download_single(item['url'], item['filename'], wait_time=wait_time)
            print('Downloading: {}'.format(item['filename']))
            # every 12 months: links are requested with cdsapi and downloading starts in batches, so that a long
            # run of requests does not hit the request limit or let the requested links expire
            if (month + 1) % 12 == 0:
                print('Waiting ({}s)...'.format(wait_time))
                time.sleep(wait_time)
                wait_time = 0
        except Exception as e:
            # report the error and carry on with the next month
            print('Error while downloading {}: {}'.format(out_filename, e))
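The script counts months with `relativedelta`. For 2000-01-01 to 2010-12-31 that gives 10 * 12 + 11 + 1 = 132, which is why the progress counter advances by `months` for each variable. If dateutil is not available, the same monthly iteration can be sketched with the standard library alone. `month_starts` is a hypothetical name. It counts every calendar month from the start month through the end month, which for the dates used here gives the same 132 months.

```python
from datetime import date


def month_starts(start, end):
    """Yield the first day of every month from start's month through end's month (inclusive)."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield date(year, month, 1)
        month += 1
        if month == 13:        # roll over into the next year
            year, month = year + 1, 1


months = list(month_starts(date(2000, 1, 1), date(2010, 12, 31)))
print(len(months), months[0], months[-1])  # 132 2000-01-01 2010-12-01
```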

2.4.2 Batch downloading from a txt file of links (NASA data)

# @Author  : ChaoQiezi
# @Time    : 2025/3/31 11:00 AM
# @Email   : chaoqiezi.one@qq.com
# @FileName: nasa_download_idm

"""
This script is used to test downloading NASA data
"""


from Src.utils import DownloadManager

idm_path = r"D:\Softwares\IDM\Internet Download Manager\IDMan.exe"  # absolute path to IDMan.exe (change as needed)
links_path = r'F:\PyProJect\GPP\Resources\MyTEMP\nasa_links.txt'
downloader = DownloadManager(out_dir=r'F:\PyProJect\GPP\Resources\MyTEMP\nasa', idm_path=idm_path,
                             links_path=links_path, monitor_interval=10)
downloader.download()
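DownloadManager counts a link as completed as soon as its output file exists. The ERA5 script's plan, however, also calls for checking that downloaded files are complete. The sketch below performs one cheap check for `.nc` files. It flags files that are empty or that do not start with a netCDF header, such as an HTML error page saved under a `.nc` name. It assumes the files are classic netCDF, which starts with `CDF` plus a version byte, or netCDF-4, which is HDF5 with its signature at offset 0 in typical files. It cannot detect truncation, and other formats, such as some NASA products, would need a different check. The function names are hypothetical.

```python
import os

# Classic netCDF starts with b'CDF' + version byte (1, 2 or 5); netCDF-4 files are HDF5
# files, whose 8-byte signature sits at offset 0 in typical files.
NETCDF_CLASSIC = (b'CDF\x01', b'CDF\x02', b'CDF\x05')
HDF5_SIGNATURE = b'\x89HDF\r\n\x1a\n'


def looks_like_netcdf(path):
    """Cheap sanity check on a downloaded .nc file: non-empty and with a netCDF/HDF5 header."""
    if os.path.getsize(path) == 0:
        return False
    with open(path, 'rb') as f:
        head = f.read(8)
    return head[:4] in NETCDF_CLASSIC or head == HDF5_SIGNATURE


def suspicious_files(out_dir, suffix='.nc'):
    """Sorted names of files in out_dir with the given suffix that fail the header check."""
    return sorted(name for name in os.listdir(out_dir)
                  if name.endswith(suffix) and not looks_like_netcdf(os.path.join(out_dir, name)))
```

Files reported by `suspicious_files(out_dir)` could be deleted so that the next run downloads them again, since DownloadManager re-queues any link whose output file is missing.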

2.5 Notes

I have not had time to document the class in more detail yet; feel free to explore and improve the code yourself.
