
Handling errors encountered while using PyTorch: the out-of-memory problem

Updated: 2023-09-08 09:04:59 Author: great-wind

This article describes an out-of-memory error encountered while using PyTorch and how it was resolved. I hope it is a useful reference; if anything is wrong or incomplete, corrections are welcome.

Out of memory

When running inference with a model trained in PyTorch, the following error occurred:

RuntimeError: CUDA out of memory. Tried to allocate 416.00 MiB (GPU 0; 2.00 GiB total capacity; 1.32 GiB already allocated; 0 bytes free; 1.34 GiB reserved in total by PyTorch)

The error message shows that GPU 0 has a total capacity of 2 GiB, of which 1.32 GiB is already allocated and 0 bytes are free; PyTorch has reserved 1.34 GiB in total.
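To compare these figures numerically, the message can be parsed into byte counts. The following is a minimal sketch: `parse_oom_message` is a hypothetical helper (not part of PyTorch), and the regular expressions assume the exact wording of the message shown above, which may differ in other PyTorch versions.

```python
import re

# Error text copied from the traceback above.
MESSAGE = (
    "RuntimeError: CUDA out of memory. Tried to allocate 416.00 MiB "
    "(GPU 0; 2.00 GiB total capacity; 1.32 GiB already allocated; "
    "0 bytes free; 1.34 GiB reserved in total by PyTorch)"
)

# Binary unit multipliers that appear in the message.
UNITS = {"bytes": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}

SIZE = r"(\d+(?:\.\d+)?) (bytes|KiB|MiB|GiB)"


def parse_oom_message(message):
    """Hypothetical helper: return the sizes in the message, in bytes."""
    sizes = {}
    # Sizes followed by a label, e.g. "2.00 GiB total capacity".
    labelled = re.compile(
        SIZE + r" (total capacity|already allocated|free|reserved)"
    )
    for value, unit, label in labelled.findall(message):
        sizes[label] = int(float(value) * UNITS[unit])
    # The size of the allocation that failed.
    requested = re.search(r"Tried to allocate " + SIZE, message)
    if requested:
        value, unit = requested.groups()
        sizes["requested"] = int(float(value) * UNITS[unit])
    return sizes


sizes = parse_oom_message(MESSAGE)
for label, size in sizes.items():
    print(f"{label}: {size / 1024**2:.0f} MiB")
```

Converting everything to one unit makes it easier to see how the requested 416 MiB relates to the 2 GiB card.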

Check GPU usage with the following command:

> nvidia-smi
Wed Jul 13 15:20:18 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 512.95       Driver Version: 512.95       CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ... WDDM  | 00000000:01:00.0 Off |                  N/A |
| N/A   39C    P0    N/A /  N/A |      0MiB /  2048MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

No process is using the GPU.
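The same check can be scripted. The sketch below is an illustration rather than part of the original workflow: it calls `nvidia-smi` with its `--query-gpu` and `--format` options and parses the CSV output. The helper names `parse_gpu_memory` and `query_gpu_memory` are hypothetical, and the parser assumes every field is a plain integer (values are reported in MiB when `nounits` is used).

```python
import shutil
import subprocess

# Ask nvidia-smi for machine-readable output instead of the table view.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Hypothetical helper: turn 'index, used, total' lines into dicts."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, used, total = [field.strip() for field in line.split(",")]
        rows.append(
            {"index": int(index), "used_mib": int(used), "total_mib": int(total)}
        )
    return rows


def query_gpu_memory():
    """Run nvidia-smi if it is installed; return an empty list otherwise."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)
```

For the GPU shown above, `query_gpu_memory()` would be expected to report one entry with 0 MiB used out of 2048 MiB.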

I then used a function from the torch package to check memory usage, with the following result:

> print(torch.cuda.memory.memory_summary())
|===========================================================================|
|                  PyTorch CUDA memory summary, device ID 0                 |
|---------------------------------------------------------------------------|
|            CUDA OOMs: 0            |        cudaMalloc retries: 0         |
|===========================================================================|
|        Metric         | Cur Usage  | Peak Usage | Tot Alloc  | Tot Freed  |
|---------------------------------------------------------------------------|
| Allocated memory      |       0 B  |       0 B  |       0 B  |       0 B  |
|       from large pool |       0 B  |       0 B  |       0 B  |       0 B  |
|       from small pool |       0 B  |       0 B  |       0 B  |       0 B  |
|---------------------------------------------------------------------------|
| Active memory         |       0 B  |       0 B  |       0 B  |       0 B  |
|       from large pool |       0 B  |       0 B  |       0 B  |       0 B  |
|       from small pool |       0 B  |       0 B  |       0 B  |       0 B  |
|---------------------------------------------------------------------------|
| GPU reserved memory   |       0 B  |       0 B  |       0 B  |       0 B  |
|       from large pool |       0 B  |       0 B  |       0 B  |       0 B  |
|       from small pool |       0 B  |       0 B  |       0 B  |       0 B  |
|---------------------------------------------------------------------------|
| Non-releasable memory |       0 B  |       0 B  |       0 B  |       0 B  |
|       from large pool |       0 B  |       0 B  |       0 B  |       0 B  |
|       from small pool |       0 B  |       0 B  |       0 B  |       0 B  |
|---------------------------------------------------------------------------|
| Allocations           |       0    |       0    |       0    |       0    |
|       from large pool |       0    |       0    |       0    |       0    |
|       from small pool |       0    |       0    |       0    |       0    |
|---------------------------------------------------------------------------|
| Active allocs         |       0    |       0    |       0    |       0    |
|       from large pool |       0    |       0    |       0    |       0    |
|       from small pool |       0    |       0    |       0    |       0    |
|---------------------------------------------------------------------------|
| GPU reserved segments |       0    |       0    |       0    |       0    |
|       from large pool |       0    |       0    |       0    |       0    |
|       from small pool |       0    |       0    |       0    |       0    |
|---------------------------------------------------------------------------|
| Non-releasable allocs |       0    |       0    |       0    |       0    |
|       from large pool |       0    |       0    |       0    |       0    |
|       from small pool |       0    |       0    |       0    |       0    |
|===========================================================================|

The summary also shows no memory in use. (Note that it only reports memory managed by the current Python process.)
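For a more compact view than the full summary, the individual counters can be read directly. This is a minimal sketch: `cuda_memory_snapshot` is a hypothetical helper built on `torch.cuda.memory_allocated`, `torch.cuda.memory_reserved`, and `torch.cuda.mem_get_info`, and it returns `None` when no CUDA device is available.

```python
import torch


def cuda_memory_snapshot(device=0):
    """Hypothetical helper: report CUDA memory counters in bytes."""
    if not torch.cuda.is_available():
        return None  # nothing to report on a CPU-only machine
    # Free and total memory on the device, as seen by the CUDA driver.
    free_bytes, total_bytes = torch.cuda.mem_get_info(device)
    return {
        # Memory currently occupied by tensors in this process.
        "allocated": torch.cuda.memory_allocated(device),
        # Memory held by PyTorch's caching allocator in this process.
        "reserved": torch.cuda.memory_reserved(device),
        "free": free_bytes,
        "total": total_bytes,
    }


snapshot = cuda_memory_snapshot()
print(snapshot)
```

Calling the helper right before and right after the forward pass shows how much memory that step actually consumes.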

Running the code again still produced the error. Could the code itself simply need too much memory?

However, our code only performs inference and should not need that much memory. After some searching, I found that the main inference code should be wrapped in torch.no_grad() so that gradients are not tracked during inference.

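Tracking gradients consumes a lot of memory, because autograd keeps the intermediate results of the forward pass so that gradients can be computed later.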
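The effect of gradient tracking can be seen on a small CPU example. This is an illustrative sketch, not the author's model: `weights` and `inputs` are made-up tensors standing in for model parameters and a batch of samples.

```python
import torch

# Made-up stand-ins for model parameters and an input batch.
weights = torch.randn(4, 4, requires_grad=True)
inputs = torch.randn(2, 4)

# Default mode: the result is attached to an autograd graph, which
# keeps references to the intermediate tensors needed for backward().
tracked = (inputs @ weights).relu()

# Under no_grad the same computation records no graph.
with torch.no_grad():
    untracked = (inputs @ weights).relu()

print(tracked.requires_grad, tracked.grad_fn is not None)    # True True
print(untracked.requires_grad, untracked.grad_fn is None)    # False True
```

In a real network, the graph attached to `tracked` keeps every layer's activations alive, which is where the extra memory goes.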

Solution

Wrap the main inference code as follows:

with torch.no_grad():
    outputs = model(samples)  # main inference code
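`torch.no_grad()` can also be applied as a decorator, which keeps the whole inference function free of gradient tracking. The sketch below runs on the CPU; the `nn.Linear` model, the `predict` function, and the input shape are placeholders, not the model from this article. The call to `model.eval()` is an added assumption about typical inference code: it switches layers such as dropout and batch normalization to inference behavior and does not by itself reduce memory use.

```python
import torch
from torch import nn

# Placeholder model; substitute the trained model here.
model = nn.Linear(8, 2)
model.eval()  # inference behavior for dropout/batch-norm layers


@torch.no_grad()  # every call to predict() runs without gradient tracking
def predict(samples):
    return model(samples)


samples = torch.randn(4, 8)  # placeholder input batch
outputs = predict(samples)
print(outputs.shape, outputs.requires_grad)
```

The decorator form is convenient when inference is called from several places, since no caller can forget the `with` block.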

Summary

The above is my personal experience; I hope it serves as a useful reference.
