PyTorch's initialization of convolution operations (kaiming_uniform_ explained)
Abstract:
I recently submitted a paper, and one of the reviewers' comments was: are the networks initialized under the same conditions across the different configurations, and how exactly are they initialized?
I had never paid attention to this before. PyTorch does initialize convolution kernel parameters by default, so this article walks through the initialization process of PyTorch's convolution operations in detail.
1. The convolution classes in PyTorch
In the PyCharm IDE, Ctrl+clicking torch.nn.Conv2d jumps into the source code of PyTorch's convolution operations (conv.py), which defines the modules you regularly use when building networks, as listed below:
class _ConvNd(Module):
class Conv1d(_ConvNd):
class Conv2d(_ConvNd):
class Conv3d(_ConvNd):
class _ConvTransposeNd(_ConvNd):
class ConvTranspose1d(_ConvTransposeNd):
class ConvTranspose2d(_ConvTransposeNd):
class ConvTranspose3d(_ConvTransposeNd):

As you can see, the commonly used convolution classes all share the same parent class:
class _ConvNd(Module):
Opening class Conv2d(_ConvNd), however, reveals no concrete parameter-initialization method,
so the initialization logic presumably lives in the parent class _ConvNd(Module).
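To double-check this hierarchy, here is a minimal sketch (using only the class names from the source above) that prints the method resolution order of nn.Conv2d:

import torch.nn as nn

# Walk Conv2d's method resolution order; _ConvNd should appear as a direct parent.
for cls in nn.Conv2d.__mro__:
    print(cls)

# reset_parameters is inherited from _ConvNd rather than defined on Conv2d itself
# (true for the version of conv.py shown in this article):
print('reset_parameters' in nn.Conv2d.__dict__)  # False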
2. The parent class of PyTorch's convolution operations
Below is the source code of the parent class _ConvNd. The method that initializes the parameters is:
def reset_parameters(self) -> None:
class _ConvNd(Module):

    __constants__ = ['stride', 'padding', 'dilation', 'groups',
                     'padding_mode', 'output_padding', 'in_channels',
                     'out_channels', 'kernel_size']
    __annotations__ = {'bias': Optional[torch.Tensor]}

    def _conv_forward(self, input: Tensor, weight: Tensor, bias: Optional[Tensor]) -> Tensor:
        ...

    _in_channels: int
    out_channels: int
    kernel_size: Tuple[int, ...]
    stride: Tuple[int, ...]
    padding: Tuple[int, ...]
    dilation: Tuple[int, ...]
    transposed: bool
    output_padding: Tuple[int, ...]
    groups: int
    padding_mode: str
    weight: Tensor
    bias: Optional[Tensor]

    def __init__(self,
                 in_channels: int,
                 out_channels: int,
                 kernel_size: Tuple[int, ...],
                 stride: Tuple[int, ...],
                 padding: Tuple[int, ...],
                 dilation: Tuple[int, ...],
                 transposed: bool,
                 output_padding: Tuple[int, ...],
                 groups: int,
                 bias: bool,
                 padding_mode: str) -> None:
        super(_ConvNd, self).__init__()
        if in_channels % groups != 0:
            raise ValueError('in_channels must be divisible by groups')
        if out_channels % groups != 0:
            raise ValueError('out_channels must be divisible by groups')
        valid_padding_modes = {'zeros', 'reflect', 'replicate', 'circular'}
        if padding_mode not in valid_padding_modes:
            raise ValueError("padding_mode must be one of {}, but got padding_mode='{}'".format(
                valid_padding_modes, padding_mode))
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.kernel_size = kernel_size
        self.stride = stride
        self.padding = padding
        self.dilation = dilation
        self.transposed = transposed
        self.output_padding = output_padding
        self.groups = groups
        self.padding_mode = padding_mode
        # `_reversed_padding_repeated_twice` is the padding to be passed to
        # `F.pad` if needed (e.g., for non-zero padding types that are
        # implemented as two ops: padding + conv). `F.pad` accepts paddings in
        # reverse order than the dimension.
        self._reversed_padding_repeated_twice = _reverse_repeat_tuple(self.padding, 2)
        if transposed:
            self.weight = Parameter(torch.Tensor(
                in_channels, out_channels // groups, *kernel_size))
        else:
            self.weight = Parameter(torch.Tensor(
                out_channels, in_channels // groups, *kernel_size))
        if bias:
            self.bias = Parameter(torch.Tensor(out_channels))
        else:
            self.register_parameter('bias', None)
        self.reset_parameters()

    def reset_parameters(self) -> None:
        init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        if self.bias is not None:
            fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
            bound = 1 / math.sqrt(fan_in)
            init.uniform_(self.bias, -bound, bound)

    def extra_repr(self):
        s = ('{in_channels}, {out_channels}, kernel_size={kernel_size}'
             ', stride={stride}')
        if self.padding != (0,) * len(self.padding):
            s += ', padding={padding}'
        if self.dilation != (1,) * len(self.dilation):
            s += ', dilation={dilation}'
        if self.output_padding != (0,) * len(self.output_padding):
            s += ', output_padding={output_padding}'
        if self.groups != 1:
            s += ', groups={groups}'
        if self.bias is None:
            s += ', bias=False'
        if self.padding_mode != 'zeros':
            s += ', padding_mode={padding_mode}'
        return s.format(**self.__dict__)

    def __setstate__(self, state):
        super(_ConvNd, self).__setstate__(state)
        if not hasattr(self, 'padding_mode'):
            self.padding_mode = 'zeros'

3. def reset_parameters(self) -> None
The default initialization used by the convolution operation:
def reset_parameters(self) -> None:
    init.kaiming_uniform_(self.weight, a=math.sqrt(5))
    if self.bias is not None:
        fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
        bound = 1 / math.sqrt(fan_in)
        init.uniform_(self.bias, -bound, bound)

The parameters in this class are initialized with Kaiming initialization.
This initialization method, designed for ReLU-style activations, was proposed by Kaiming He, a renowned computer vision researcher. By default, PyTorch initializes convolution layer parameters with the Kaiming uniform distribution (note: uniform, not normal, as the call to kaiming_uniform_ above shows).
From the documentation: Fills the input Tensor with values according to the method described in Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification - He, K. et al. (2015), using a uniform distribution. The resulting tensor will have values sampled from U(-bound, bound), where bound = gain × sqrt(3 / fan_mode). Also known as He initialization.
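As a concrete sanity check, here is a minimal sketch (the 16->32, 3x3 layer is an arbitrary example) that compares a freshly constructed Conv2d's weights against the bound this formula predicts:

import math
import torch.nn as nn

conv = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3)

# fan_in = (in_channels / groups) * kernel_h * kernel_w = 16 * 3 * 3 = 144
fan_in = conv.in_channels * conv.kernel_size[0] * conv.kernel_size[1]
gain = math.sqrt(2.0 / (1 + 5))          # leaky_relu gain with a = math.sqrt(5)
bound = gain * math.sqrt(3.0 / fan_in)   # = 1 / sqrt(fan_in) ≈ 0.0833

print(conv.weight.min().item(), conv.weight.max().item())
print(-bound, bound)  # every weight should fall inside [-bound, bound]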
3.1 Initialization of the convolution kernel weights
init.kaiming_uniform_(self.weight, a=math.sqrt(5))
The source code of the init.kaiming_uniform_ function is as follows:
def kaiming_uniform_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu'):
    r"""Fills the input `Tensor` with values according to the method
    described in `Delving deep into rectifiers: Surpassing human-level
    performance on ImageNet classification` - He, K. et al. (2015), using a
    uniform distribution. The resulting tensor will have values sampled from
    :math:`\mathcal{U}(-\text{bound}, \text{bound})` where

    .. math::
        \text{bound} = \text{gain} \times \sqrt{\frac{3}{\text{fan\_mode}}}

    Also known as He initialization.

    Args:
        tensor: an n-dimensional `torch.Tensor`
        a: the negative slope of the rectifier used after this layer (only
            used with ``'leaky_relu'``)
        mode: either ``'fan_in'`` (default) or ``'fan_out'``. Choosing ``'fan_in'``
            preserves the magnitude of the variance of the weights in the
            forward pass. Choosing ``'fan_out'`` preserves the magnitudes in the
            backwards pass.
        nonlinearity: the non-linear function (`nn.functional` name),
            recommended to use only with ``'relu'`` or ``'leaky_relu'`` (default).

    Examples:
        >>> w = torch.empty(3, 5)
        >>> nn.init.kaiming_uniform_(w, mode='fan_in', nonlinearity='relu')
    """
    fan = _calculate_correct_fan(tensor, mode)
    gain = calculate_gain(nonlinearity, a)
    std = gain / math.sqrt(fan)
    bound = math.sqrt(3.0) * std  # Calculate uniform bounds from standard deviation
    with torch.no_grad():
        return tensor.uniform_(-bound, bound)

So the full default initialization of convolution kernels in torch, with all arguments spelled out, is:
init.kaiming_uniform_(self.weight, a=math.sqrt(5), mode='fan_in', nonlinearity='leaky_relu')
The other helper functions used inside init.kaiming_uniform_ are not analyzed in depth here, but in brief:
- _calculate_correct_fan(tensor, mode): computes the current layer's fan_in (number of input units) or fan_out (number of output units), depending on whether mode is 'fan_in' or 'fan_out'.
- calculate_gain(nonlinearity, param): for a given nonlinearity, returns the recommended gain value, which is just a scalar picked from a fixed table (see the quick check after this list).
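For reference, the gain values PyTorch's documentation lists for calculate_gain are: 1 for linear/identity, conv layers, and sigmoid; 5/3 for tanh; sqrt(2) for relu; and sqrt(2 / (1 + negative_slope^2)) for leaky_relu. A quick check of a few of them:

import math
from torch.nn import init

print(init.calculate_gain('relu'))                      # sqrt(2) ≈ 1.414
print(init.calculate_gain('tanh'))                      # 5/3 ≈ 1.667
print(init.calculate_gain('leaky_relu', math.sqrt(5)))  # sqrt(2/(1+5)) ≈ 0.577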

- _calculate_correct_fan: here mode = 'fan_in', so it computes the current layer's fan_in (the number of input units).
- calculate_gain: here nonlinearity = 'leaky_relu' and param = a = math.sqrt(5), i.e. negative_slope = math.sqrt(5), which gives:
gain = math.sqrt(2.0 / (1 + negative_slope ** 2))
As quoted earlier, the resulting tensor will have values sampled from U(-bound, bound), where bound = gain × sqrt(3 / fan_mode), so the computation above yields exactly this bound.
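With the default a = math.sqrt(5) the algebra collapses neatly; a worked example (the fan_in value is an arbitrary illustration):

import math

# default call: a = sqrt(5), mode = 'fan_in', nonlinearity = 'leaky_relu'
a = math.sqrt(5)
gain = math.sqrt(2.0 / (1 + a ** 2))    # = sqrt(2/6) = sqrt(1/3)

fan_in = 144                            # e.g. a 16->32 conv with 3x3 kernels: 16*3*3
std = gain / math.sqrt(fan_in)
bound = math.sqrt(3.0) * std            # = sqrt(3 * (1/3)) / sqrt(fan_in) = 1 / sqrt(fan_in)

print(bound, 1 / math.sqrt(fan_in))     # both ≈ 0.0833

In other words, with the default a = math.sqrt(5) the weight bound simplifies to exactly 1 / sqrt(fan_in), the same bound used for the bias below.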
Finally, uniform_(from=0, to=1) → Tensor fills the tensor in place with values sampled from the uniform distribution U(from, to).
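For example (an illustrative one-liner):

import torch

w = torch.empty(2, 3).uniform_(-0.1, 0.1)  # fill in place from U(-0.1, 0.1)
print(w)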
3.2 Initialization of the bias
This part needs no detailed walkthrough; if you followed the weight initialization above, it should be self-explanatory.
if self.bias is not None:
    fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
    bound = 1 / math.sqrt(fan_in)
    init.uniform_(self.bias, -bound, bound)

One additional note:
init._calculate_fan_in_and_fan_out(self.weight)
computes the current layer's fan_in (number of input units) and fan_out (number of output units).
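A quick illustration of those two fan values for a convolution weight (note that _calculate_fan_in_and_fan_out is a private helper, so its exact location may differ between versions; the 16->32, 3x3 layer is arbitrary):

import torch.nn as nn
from torch.nn import init

conv = nn.Conv2d(16, 32, kernel_size=3)

# weight shape is (out_channels, in_channels/groups, kH, kW) = (32, 16, 3, 3)
# fan_in  = 16 * 3 * 3 = 144, fan_out = 32 * 3 * 3 = 288
fan_in, fan_out = init._calculate_fan_in_and_fan_out(conv.weight)
print(fan_in, fan_out)  # 144 288

# the bias bound is then 1 / sqrt(fan_in) = 1/12 ≈ 0.0833
print(conv.bias.min().item(), conv.bias.max().item())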
Summary
The above is based on my personal experience. I hope it gives everyone a useful reference, and I hope you will continue to support 腳本之家.