Low GPU utilization when training PyTorch models
Preface
My GPU setup is 2 × RTX 2080Ti. While monitoring GPU usage during a recent training run, I noticed that although both cards were in use, the utilization of each card was very unstable and the two seemed to be used alternately. Training is very slow in this situation.
Below is the process I went through to solve this problem.
1. CPU and memory usage
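As a quick point of reference, CPU and memory load can also be sampled from Python. The sketch below is a minimal example assuming the third-party psutil package (pip install psutil), which is not part of the original post.

import time
import psutil  # assumed third-party dependency: pip install psutil

# Print system-wide CPU and memory usage once per second while training runs.
for _ in range(10):
    cpu = psutil.cpu_percent(interval=1)   # CPU usage (%) over the last second
    mem = psutil.virtual_memory()          # system-wide memory statistics
    print(f"CPU {cpu:5.1f}%   RAM {mem.percent:5.1f}% "
          f"({mem.used / 1024**3:.1f} GiB / {mem.total / 1024**3:.1f} GiB)")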
2. Checking GPU usage with a Linux command
watch -n 1 nvidia-smi
During the prediction stage the model runs on GPU 0, but even then utilization is only 51%.
During training both cards are used at the same time, yet utilization is still low; the highest value I captured was only about 60%.
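Besides watching nvidia-smi in a terminal, per-GPU utilization can also be sampled from Python. The sketch below is an assumed alternative (not something the original workflow used), relying on the NVIDIA Management Library bindings pynvml from the nvidia-ml-py package.

import time
import pynvml  # assumed dependency: pip install nvidia-ml-py

pynvml.nvmlInit()
handles = [pynvml.nvmlDeviceGetHandleByIndex(i)
           for i in range(pynvml.nvmlDeviceGetCount())]

# Sample GPU core utilization once per second, one entry per card.
for _ in range(10):
    rates = [pynvml.nvmlDeviceGetUtilizationRates(h) for h in handles]
    print("   ".join(f"GPU{i}: {r.gpu:3d}%" for i, r in enumerate(rates)))
    time.sleep(1)

pynvml.nvmlShutdown()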
3. A fix found in the PyTorch documentation
data.DataLoader(dataset: Dataset[T_co], batch_size: Optional[int] = 1,
                shuffle: bool = False, sampler: Optional[Sampler[int]] = None,
                batch_sampler: Optional[Sampler[Sequence[int]]] = None,
                num_workers: int = 0, collate_fn: _collate_fn_t = None,
                pin_memory: bool = False, drop_last: bool = False,
                timeout: float = 0, worker_init_fn: _worker_init_fn_t = None,
                multiprocessing_context=None, generator=None, *,
                prefetch_factor: int = 2, persistent_workers: bool = False)
These are the class's constructor parameters; the two settings relevant to this article are called out below.
Below is the class's docstring:
class DataLoader(Generic[T_co]):
    r"""Data loader. Combines a dataset and a sampler, and provides an iterable over
    the given dataset.

    The :class:`~torch.utils.data.DataLoader` supports both map-style and
    iterable-style datasets with single- or multi-process loading, customizing
    loading order and optional automatic batching (collation) and memory pinning.

    See :py:mod:`torch.utils.data` documentation page for more details.

    Args:
        dataset (Dataset): dataset from which to load the data.
        batch_size (int, optional): how many samples per batch to load
            (default: ``1``).
        shuffle (bool, optional): set to ``True`` to have the data reshuffled
            at every epoch (default: ``False``).
        sampler (Sampler or Iterable, optional): defines the strategy to draw
            samples from the dataset. Can be any ``Iterable`` with ``__len__``
            implemented. If specified, :attr:`shuffle` must not be specified.
        batch_sampler (Sampler or Iterable, optional): like :attr:`sampler`, but
            returns a batch of indices at a time. Mutually exclusive with
            :attr:`batch_size`, :attr:`shuffle`, :attr:`sampler`,
            and :attr:`drop_last`.
        num_workers (int, optional): how many subprocesses to use for data
            loading. ``0`` means that the data will be loaded in the main process.
            (default: ``0``)
        collate_fn (callable, optional): merges a list of samples to form a
            mini-batch of Tensor(s). Used when using batched loading from a
            map-style dataset.
        pin_memory (bool, optional): If ``True``, the data loader will copy Tensors
            into CUDA pinned memory before returning them. If your data elements
            are a custom type, or your :attr:`collate_fn` returns a batch that is
            a custom type, see the example below.
        drop_last (bool, optional): set to ``True`` to drop the last incomplete batch,
            if the dataset size is not divisible by the batch size. If ``False`` and
            the size of dataset is not divisible by the batch size, then the last batch
            will be smaller. (default: ``False``)
        timeout (numeric, optional): if positive, the timeout value for collecting a batch
            from workers. Should always be non-negative. (default: ``0``)
        worker_init_fn (callable, optional): If not ``None``, this will be called on each
            worker subprocess with the worker id (an int in ``[0, num_workers - 1]``) as
            input, after seeding and before data loading. (default: ``None``)
        prefetch_factor (int, optional, keyword-only arg): Number of samples loaded
            in advance by each worker. ``2`` means there will be a total of
            2 * num_workers samples prefetched across all workers. (default: ``2``)
        persistent_workers (bool, optional): If ``True``, the data loader will not shutdown
            the worker processes after a dataset has been consumed once. This allows to
            maintain the workers `Dataset` instances alive. (default: ``False``)

    .. warning:: If the ``spawn`` start method is used, :attr:`worker_init_fn`
                 cannot be an unpicklable object, e.g., a lambda function. See
                 :ref:`multiprocessing-best-practices` on more details related
                 to multiprocessing in PyTorch.

    .. warning:: ``len(dataloader)`` heuristic is based on the length of the sampler used.
                 When :attr:`dataset` is an :class:`~torch.utils.data.IterableDataset`,
                 it instead returns an estimate based on ``len(dataset) / batch_size``, with proper
                 rounding depending on :attr:`drop_last`, regardless of multi-process loading
                 configurations. This represents the best guess PyTorch can make because PyTorch
                 trusts user :attr:`dataset` code in correctly handling multi-process
                 loading to avoid duplicate data.

                 However, if sharding results in multiple workers having incomplete last batches,
                 this estimate can still be inaccurate, because (1) an otherwise complete batch can
                 be broken into multiple ones and (2) more than one batch worth of samples can be
                 dropped when :attr:`drop_last` is set. Unfortunately, PyTorch can not detect such
                 cases in general.

                 See `Dataset Types`_ for more details on these two types of datasets and how
                 :class:`~torch.utils.data.IterableDataset` interacts with
                 `Multi-process data loading`_.

    .. warning:: See :ref:`reproducibility`, and :ref:`dataloader-workers-random-seed`, and
                 :ref:`data-loading-randomness` notes for random seed related questions.
    """
Two of these parameters turn out to be key:
num_workers (int, optional): how many subprocesses to use for data loading. ``0`` means that the data will be loaded in the main process. (default: ``0``)
pin_memory (bool, optional): If ``True``, the data loader will copy Tensors into CUDA pinned memory before returning them. If your data elements are a custom type, or your :attr:`collate_fn` returns a batch that is a custom type, see the example below.
Setting num_workers = 4 and pin_memory = True brought the utilization right up!
With only num_workers enabled:
With both num_workers and pin_memory enabled:
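For illustration, here is a minimal sketch of the change; the dataset and batch size are placeholders rather than the original code. With pin_memory=True, it also makes sense to pass non_blocking=True when moving batches to the GPU so the copy can overlap with computation.

import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset standing in for the real training data.
dataset = TensorDataset(torch.randn(1000, 3, 64, 64),
                        torch.randint(0, 10, (1000,)))

loader = DataLoader(
    dataset,
    batch_size=64,
    shuffle=True,
    num_workers=4,     # load batches in 4 worker subprocesses instead of the main process
    pin_memory=True,   # return batches in page-locked memory for faster host-to-GPU copies
)

device = torch.device("cuda:0")
for images, labels in loader:
    # non_blocking=True lets the transfer overlap with GPU work because the
    # source tensors already live in pinned memory.
    images = images.to(device, non_blocking=True)
    labels = labels.to(device, non_blocking=True)
    # ... forward pass, loss, backward pass, optimizer step ...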
Summary
The above is my personal experience; I hope it can serve as a reference, and I also hope everyone will continue to support 腳本之家.