
A Brief Look at TensorFlow 2's GPU Memory Allocation Strategy

 Updated: 2021-08-12 16:07:50   Author: 無風(fēng)聽海
This article looks at how TensorFlow 2 allocates GPU memory. The sample code is explained in detail and should be a useful reference for anyone interested in the topic.

1. Origin of the Problem

The exception stack below shows that the BLAS handle failed to initialize and that the exception occurred while executing MatMul. It is fairly safe to conclude that the dataset was too large and the GPU ran out of memory.

2021-08-10 16:38:04.917501: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2021-08-10 16:38:04.960048: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2021-08-10 16:38:04.986898: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2021-08-10 16:38:04.992366: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2021-08-10 16:38:04.992389: W tensorflow/stream_executor/stream.cc:1455] attempting to perform BLAS operation using StreamExecutor without BLAS support
Traceback (most recent call last):
  File "/home/mango/PycharmProjects/DeepLearing/minist_conv.py", line 32, in <module>
    model.fit(train_images, train_labels, epochs=5, batch_size=64)
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/keras/engine/training.py", line 1183, in fit
    tmp_logs = self.train_function(iterator)
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/eager/def_function.py", line 889, in __call__
    result = self._call(*args, **kwds)
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/eager/def_function.py", line 950, in _call
    return self._stateless_fn(*args, **kwds)
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/eager/function.py", line 3023, in __call__
    return graph_function._call_flat(
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/eager/function.py", line 1960, in _call_flat
    return self._build_call_outputs(self._inference_function.call(
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/eager/function.py", line 591, in call
    outputs = execute.execute(
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InternalError:  Blas xGEMM launch failed : a.shape=[1,64,576], b.shape=[1,576,64], m=64, n=64, k=576
  [[node sequential/dense/MatMul (defined at home/mango/PycharmProjects/DeepLearing/minist_conv.py:32) ]] [Op:__inference_train_function_993]

Function call stack:
train_function
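
To give some context, the traceback points at line 32 of minist_conv.py, a model.fit call on a small convolutional MNIST model. The original script is not reproduced in this article; the sketch below is only a guess at what it roughly looks like, with the layer sizes inferred from the shapes a.shape=[1,64,576] and b.shape=[1,576,64] in the error message, so treat every detail as an assumption rather than the actual script.

# Hypothetical reconstruction, NOT the original minist_conv.py.
# 576 = 3*3*64, i.e. the flattened conv output feeding Dense(64),
# which matches the Blas xGEMM shapes in the traceback.
import tensorflow as tf
from tensorflow.keras import layers, models

(train_images, train_labels), _ = tf.keras.datasets.mnist.load_data()
train_images = train_images.reshape((60000, 28, 28, 1)).astype("float32") / 255.0

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="rmsprop",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# This is the call that fails with CUBLAS_STATUS_NOT_INITIALIZED above.
model.fit(train_images, train_labels, epochs=5, batch_size=64)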

2. Development Environment

mango@mango-ubuntu:~$ /usr/local/cuda/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Wed_Jul_14_19:41:19_PDT_2021
Cuda compilation tools, release 11.4, V11.4.100
Build cuda_11.4.r11.4/compiler.30188945_0

mango@mango-ubuntu:~$ tail -n 10 /usr/include/cudnn_version.h
#ifndef CUDNN_VERSION_H_
#define CUDNN_VERSION_H_

#define CUDNN_MAJOR 8
#define CUDNN_MINOR 2
#define CUDNN_PATCHLEVEL 2

#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)

#endif /* CUDNN_VERSION_H */

mango@mango-ubuntu:~$ python3 --version
Python 3.9.5

mango@mango-ubuntu:~$ nvidia-smi
Tue Aug 10 19:57:58 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   54C    P0    N/A /  N/A |    329MiB /  2002MiB |      9%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1818      G   /usr/lib/xorg/Xorg                186MiB |
|    0   N/A  N/A      2002      G   /usr/bin/gnome-shell               45MiB |
|    0   N/A  N/A      3435      G   ...AAAAAAAAA= --shared-files       75MiB |
|    0   N/A  N/A      6016      G   python3                            13MiB |
+-----------------------------------------------------------------------------+

 

mango@mango-ubuntu:~$ python3
Python 3.9.5 (default, May 11 2021, 08:20:37) 
[GCC 10.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2021-08-10 18:33:05.917520: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
>>> tf.__version__
'2.5.0'
>>> 

3. TensorFlow's GPU Memory Allocation Strategy

By default, TensorFlow maps nearly all of the GPU memory of all GPUs (subject to CUDA_VISIBLE_DEVICES) visible to the process. This is done to more efficiently use the relatively precious GPU memory resources on the devices by reducing memory fragmentation.


In some cases it is desirable for the process to only allocate a subset of the available memory, or to only grow the memory usage as is needed by the process. TensorFlow provides two methods to control this.


The first option is to turn on memory growth by calling tf.config.experimental.set_memory_growth, which attempts to allocate only as much GPU memory as needed for the runtime allocations: it starts out allocating very little memory, and as the program gets run and more GPU memory is needed, the GPU memory region is extended for the TensorFlow process. Memory is not released since it can lead to memory fragmentation. To turn on memory growth for a specific GPU, use the following code prior to allocating any tensors or executing any ops.


import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
if gpus:
  try:
    # Currently, memory growth needs to be the same across GPUs
    for gpu in gpus:
      tf.config.experimental.set_memory_growth(gpu, True)
    logical_gpus = tf.config.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Memory growth must be set before GPUs have been initialized
    print(e)

Another way to enable this option is to set the environmental variable TF_FORCE_GPU_ALLOW_GROWTH to true. This configuration is platform specific.

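For instance, instead of exporting the variable in the shell before launching the training script, it can also be set from Python, as long as this happens before TensorFlow initializes the GPU (a small sketch; the rest of the training code is assumed to follow):

import os
# Must take effect before the GPU is initialized, so set it before importing TensorFlow.
os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true'

import tensorflow as tf
# ... build and train the model as usual ...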

The second method is to configure a virtual GPU device with tf.config.experimental.set_virtual_device_configuration and set a hard limit on the total memory to allocate on the GPU.

This is useful if you want to truly bound the amount of GPU memory available to the TensorFlow process. This is common practice for local development when the GPU is shared with other applications such as a workstation GUI.


gpus = tf.config.list_physical_devices('GPU')
if gpus:
  # Restrict TensorFlow to only allocate 1GB of memory on the first GPU
  try:
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=1024)])
    logical_gpus = tf.config.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Virtual devices must be set before GPUs have been initialized
    print(e)

4. Problem Analysis and Verification

From the TensorFlow documentation analysed above, the default behaviour is to claim all of the GPU memory, but TensorFlow offers two ways to control the allocation strategy more flexibly.

We can simply enable on-demand (dynamic) GPU memory growth:

import tensorflow as tf
physical_gpus = tf.config.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(physical_gpus[0], True)

The following command shows that GPU memory usage peaks at 697 MiB during training:

mango@mango-ubuntu:~$ while true; do nvidia-smi; sleep 0.2; done;
Tue Aug 10 20:30:58 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   58C    P0    N/A /  N/A |   1026MiB /  2002MiB |     72%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1818      G   /usr/lib/xorg/Xorg                186MiB |
|    0   N/A  N/A      2002      G   /usr/bin/gnome-shell               45MiB |
|    0   N/A  N/A      3435      G   ...AAAAAAAAA= --shared-files       73MiB |
|    0   N/A  N/A      6016      G   python3                            13MiB |
|    0   N/A  N/A     13829      C   /usr/bin/python3.9                697MiB |
+-----------------------------------------------------------------------------+

We can also cap GPU memory usage at 1024 MiB:

import tensorflow as tf
physical_gpus = tf.config.list_physical_devices('GPU')
tf.config.set_logical_device_configuration(physical_gpus[0], [tf.config.LogicalDeviceConfiguration(memory_limit=1024)])

Again, the same command shows that GPU memory usage peaks at 1455 MiB during training:

mango@mango-ubuntu:~$ while true; do nvidia-smi; sleep 0.2; done;
Tue Aug 10 20:31:24 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   58C    P0    N/A /  N/A |   1784MiB /  2002MiB |     74%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1818      G   /usr/lib/xorg/Xorg                186MiB |
|    0   N/A  N/A      2002      G   /usr/bin/gnome-shell               46MiB |
|    0   N/A  N/A      3435      G   ...AAAAAAAAA= --shared-files       72MiB |
|    0   N/A  N/A      6016      G   python3                            13MiB |
|    0   N/A  N/A     13570      C   /usr/bin/python3.9               1455MiB |
+-----------------------------------------------------------------------------+

5. Analysis of the GPU Allocation Strategies

From the tests in Section 4 we can conclude the following:

  • The default strategy claims all of the GPU memory and never releases it during execution; if the training data is fairly large, it is easy to run out of memory.
  • With a hard memory limit, the observed usage is larger than the configured value. This probably depends on the model and on the complexity of the operations used during training, so a suitable value has to be chosen for the specific workload (see the monitoring sketch after this list). Note that setting the limit too high will still trigger the error, whereas setting it too low only makes execution slower.
  • On-demand growth looks like a reasonable middle ground and probably the better option; it is unclear why TensorFlow does not make it the default. I will leave this as an open question and come back to it if I learn more.
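
To help choose a sensible memory_limit, one option (my own addition, not something measured above) is to ask TensorFlow for its peak allocation through tf.config.experimental.get_memory_info, which as far as I know is available in TensorFlow 2.5:

import tensorflow as tf

# Enable on-demand growth, train as usual, then read the peak allocation
# to estimate how large a memory_limit the workload actually needs.
physical_gpus = tf.config.list_physical_devices('GPU')
if physical_gpus:
    tf.config.experimental.set_memory_growth(physical_gpus[0], True)

# ... build and fit the model here ...

info = tf.config.experimental.get_memory_info('GPU:0')
print('current: %d MiB, peak: %d MiB'
      % (info['current'] // 2**20, info['peak'] // 2**20))

Note that nvidia-smi will always report more than this figure, because the CUDA context and the cuBLAS/cuDNN workspaces are not tracked by TensorFlow's allocator; that is likely part of why the observed 1455MiB is larger than the configured 1024MiB limit.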

6. Extensions

Simulating multiple GPUs with a single GPU

When the local development environment has only one GPU but we need to write multi-GPU programs that will later be trained on a workstation, TensorFlow provides a convenient feature that lets us create several simulated GPUs locally, making it much easier to debug multi-GPU code. The following code creates two virtual GPUs, each with 1 GB of memory, on top of the physical device GPU:0.

gpus = tf.config.list_physical_devices('GPU')
if gpus:
  # Create 2 virtual GPUs with 1GB memory each
  try:
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=1024),
         tf.config.LogicalDeviceConfiguration(memory_limit=1024)])
    logical_gpus = tf.config.list_logical_devices('GPU')
    print(len(gpus), "Physical GPU,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Virtual devices must be set before GPUs have been initialized
    print(e)

Data parallelism across multiple GPUs

Using tf.distribute.Strategy, the model is copied to each GPU and the training data is split into batches that are executed on the different GPUs, achieving data parallelism.

tf.debugging.set_log_device_placement(True)
gpus = tf.config.list_logical_devices('GPU')
strategy = tf.distribute.MirroredStrategy(gpus)
with strategy.scope():
  inputs = tf.keras.layers.Input(shape=(1,))
  predictions = tf.keras.layers.Dense(1)(inputs)
  model = tf.keras.models.Model(inputs=inputs, outputs=predictions)
  model.compile(loss='mse',
                optimizer=tf.keras.optimizers.SGD(learning_rate=0.2))
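
To actually drive the two logical GPUs created above, the model can be fitted on a small synthetic dataset; the numbers below are made up purely for illustration:

# Tiny synthetic regression dataset, just enough to run a few
# data-parallel training steps across the mirrored replicas.
import numpy as np
x = np.random.random((100, 1)).astype('float32')
y = 3.0 * x + 2.0
model.fit(x, y, epochs=5, batch_size=10)

With tf.debugging.set_log_device_placement(True) enabled, the log should show the operations being placed on both logical GPUs.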

This concludes this brief look at TensorFlow 2's GPU memory allocation strategy. For more on TensorFlow 2 GPU memory allocation, please search 腳本之家's earlier articles, and I hope you will continue to support 腳本之家!
