Comparing the network structures of YOLOv1, YOLOv2, YOLOv3, and SSD for object detection in Python
I have recently been studying YOLOv1, YOLOv2, and YOLOv3. They actually share some similarities with the SSD network, so I decided to put them side by side and see how they differ.
Below are the structure diagrams of each network together with their implementation code.
1. YOLOv1
[Figure: YOLOv1 network structure]
As the structure diagram shows, the network performs over twenty convolutions and four max-pooling operations. The 3x3 convolutions extract features and the 1x1 convolutions compress them; the image is finally compressed to a 7x7xfilters tensor, which amounts to dividing the whole image into a 7x7 grid where each cell is responsible for detecting objects in its own region.
At the end, fully connected layers bring the output to size 7x7x30: 7x7 corresponds to the 7x7 grid, the first 20 of the 30 channels are the predicted classes, and the remaining 10 are two predicted boxes together with their confidences (5x2).
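To make that 7x7x30 layout concrete, here is a small standalone sketch (NumPy only; the exact channel ordering is an assumption for illustration, following the common class-probabilities / confidences / boxes split):

import numpy as np

# A dummy YOLOv1 output for one image: 7x7 grid cells, 30 values per cell
out = np.random.rand(7, 7, 30)
class_probs = out[..., :20]                   # 20 class probabilities per cell
confidences = out[..., 20:22]                 # 1 confidence for each of the 2 boxes
boxes = out[..., 22:30].reshape(7, 7, 2, 4)   # 2 boxes of (x, y, w, h) per cell
print(class_probs.shape, confidences.shape, boxes.shape)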
The network code is as follows:
import numpy as np
import tensorflow as tf

# The methods below belong to the YOLOv1 network class; they are listed at module level as in the original post.

# Leaky ReLU, an improved variant of ReLU
def leak_relu(self, x, alpha=0.1):
    return tf.maximum(alpha * x, x)

# Build the network
def _build_net(self):
    x = tf.placeholder(tf.float32, [None, 448, 448, 3])
    with tf.variable_scope('yolo'):
        # _conv_layer(self, x, num_filters, filter_size, stride, scope)
        with tf.variable_scope('conv_2'):
            # (448,448,3) -> (224,224,64)
            net = self._conv_layer(x, 64, 7, 2, 'conv_2')
        # (224,224,64) -> (112,112,64)
        net = self._maxpool_layer(net, 2, 2)
        with tf.variable_scope('conv_4'):
            # (112,112,64) -> (112,112,192)
            net = self._conv_layer(net, 192, 3, 1, 'conv_4')
        # (112,112,192) -> (56,56,192)
        net = self._maxpool_layer(net, 2, 2)
        with tf.variable_scope('conv_6'):
            # (56,56,192) -> (56,56,128)
            net = self._conv_layer(net, 128, 1, 1, 'conv_6')
        with tf.variable_scope('conv_7'):
            # (56,56,128) -> (56,56,256)
            net = self._conv_layer(net, 256, 3, 1, 'conv_7')
        with tf.variable_scope('conv_8'):
            # (56,56,256) -> (56,56,256)
            net = self._conv_layer(net, 256, 1, 1, 'conv_8')
        with tf.variable_scope('conv_9'):
            # (56,56,256) -> (56,56,512)
            net = self._conv_layer(net, 512, 3, 1, 'conv_9')
        # (56,56,512) -> (28,28,512)
        net = self._maxpool_layer(net, 2, 2)
        with tf.variable_scope('conv_11'):
            net = self._conv_layer(net, 256, 1, 1, 'conv_11')
        with tf.variable_scope('conv_12'):
            net = self._conv_layer(net, 512, 3, 1, 'conv_12')
        with tf.variable_scope('conv_13'):
            net = self._conv_layer(net, 256, 1, 1, 'conv_13')
        with tf.variable_scope('conv_14'):
            net = self._conv_layer(net, 512, 3, 1, 'conv_14')
        with tf.variable_scope('conv_15'):
            net = self._conv_layer(net, 256, 1, 1, 'conv_15')
        with tf.variable_scope('conv_16'):
            net = self._conv_layer(net, 512, 3, 1, 'conv_16')
        with tf.variable_scope('conv_17'):
            net = self._conv_layer(net, 256, 1, 1, 'conv_17')
        with tf.variable_scope('conv_18'):
            net = self._conv_layer(net, 512, 3, 1, 'conv_18')
        with tf.variable_scope('conv_19'):
            net = self._conv_layer(net, 512, 1, 1, 'conv_19')
        with tf.variable_scope('conv_20'):
            net = self._conv_layer(net, 1024, 3, 1, 'conv_20')
        # (28,28,1024) -> (14,14,1024)
        net = self._maxpool_layer(net, 2, 2)
        with tf.variable_scope('conv_22'):
            net = self._conv_layer(net, 512, 1, 1, 'conv_22')
        with tf.variable_scope('conv_23'):
            net = self._conv_layer(net, 1024, 3, 1, 'conv_23')
        with tf.variable_scope('conv_24'):
            net = self._conv_layer(net, 512, 1, 1, 'conv_24')
        with tf.variable_scope('conv_25'):
            net = self._conv_layer(net, 1024, 3, 1, 'conv_25')
        with tf.variable_scope('conv_26'):
            net = self._conv_layer(net, 1024, 3, 1, 'conv_26')
        with tf.variable_scope('conv_28'):
            # (14,14,1024) -> (7,7,1024)
            net = self._conv_layer(net, 1024, 3, 2, 'conv_28')
        with tf.variable_scope('conv_29'):
            net = self._conv_layer(net, 1024, 3, 1, 'conv_29')
        with tf.variable_scope('conv_30'):
            net = self._conv_layer(net, 1024, 3, 1, 'conv_30')
        # (7,7,1024) -> (7*7*1024,)
        net = self._flatten(net)
        with tf.variable_scope('fc_33'):
            net = self._fc_layer(net, 512, activation=self.leak_relu, scope='fc_33')
        with tf.variable_scope('fc_34'):
            net = self._fc_layer(net, 4096, activation=self.leak_relu, scope='fc_34')
        with tf.variable_scope('fc_36'):
            net = self._fc_layer(net, 7 * 7 * 30, scope='fc_36')
    # Returns the input placeholder x and the (7,7,30) prediction tensor
    return net, x

# Create a convolutional layer
def _conv_layer(self, x, num_filters, filter_size, stride, scope):
    # Weights of the convolutional layer
    in_channels = x.get_shape().as_list()[-1]
    weight = tf.Variable(tf.truncated_normal([filter_size, filter_size,
                                              in_channels, num_filters], stddev=0.1), name='weights')
    # Bias of the convolutional layer
    bias = tf.Variable(tf.zeros([num_filters, ]), name='biases')
    # Compute the amount of padding
    pad_size = filter_size // 2
    pad_mat = np.array([[0, 0], [pad_size, pad_size], [pad_size, pad_size], [0, 0]])
    x_pad = tf.pad(x, pad_mat)
    # Convolution
    conv = tf.nn.conv2d(x_pad, weight, strides=[1, stride, stride, 1], padding="VALID", name=scope)
    # Leaky ReLU activation
    output = self.leak_relu(tf.nn.bias_add(conv, bias))
    return output

def _fc_layer(self, x, num_out, activation=None, scope=None):
    # Fully connected layer
    num_in = x.get_shape().as_list()[-1]
    weight = tf.Variable(tf.truncated_normal([num_in, num_out], stddev=0.1), name='weights')
    bias = tf.Variable(tf.zeros([num_out, ]), name='biases')
    output = tf.nn.xw_plus_b(x, weight, bias, name=scope)
    if activation:
        output = activation(output)
    return output

def _maxpool_layer(self, x, pool_size, stride):
    # Max pooling
    output = tf.nn.max_pool(x, [1, pool_size, pool_size, 1],
                            strides=[1, stride, stride, 1], padding="SAME")
    return output

def _flatten(self, x):
    """Flatten x to (batch, -1)"""
    tran_x = tf.transpose(x, [0, 3, 1, 2])  # channel-first order
    nums = np.prod(x.get_shape().as_list()[1:])
    return tf.reshape(tran_x, [-1, nums])
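A hedged sketch of how these methods could be exercised end to end, assuming the functions above were defined at module level as listed (the YoloV1 wrapper class is hypothetical and simply binds them as methods; weights are random, so the output is meaningless):

import numpy as np
import tensorflow as tf

# Hypothetical wrapper class binding the module-level functions above
class YoloV1:
    leak_relu = leak_relu
    _build_net = _build_net
    _conv_layer = _conv_layer
    _fc_layer = _fc_layer
    _maxpool_layer = _maxpool_layer
    _flatten = _flatten

net, x = YoloV1()._build_net()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    pred = sess.run(net, {x: np.zeros((1, 448, 448, 3), np.float32)})
    print(pred.shape)  # (1, 1470), i.e. (1, 7*7*30)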
The prediction results are as follows:
[Figure: YOLOv1 detection results]
As you can see, the results are relatively poor.
2. YOLOv2
[Figure: YOLOv2 network structure]
YOLOv2 uses a new classification network for feature extraction. It relies mostly on 3x3 kernels and doubles the number of channels after every pooling step. Following the network-in-network idea, 1x1 kernels are placed between the 3x3 kernels to compress features, and batch normalization is used to stabilize training, speed up convergence, and regularize the model.
At the same time, it keeps a shortcut (the passthrough connection) that stores an earlier feature map.
Besides the structural improvements, YOLOv2 adds prior (anchor) boxes compared with YOLOv1. The final conv_dec output has shape (13,13,425): 13x13 divides the image into a 13x13 grid of prediction cells, and 425 decomposes into 85x5. Within the 85, since YOLOv2 is usually trained on the COCO dataset with its 80 classes, the remaining 5 values are x, y, w, h, and the confidence; the x5 means each cell predicts 5 boxes, one for each of the 5 prior boxes.
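A quick sanity check of that channel count (plain arithmetic, runnable as-is):

num_classes = 80              # COCO classes
num_anchors = 5               # prior boxes per grid cell
per_anchor = num_classes + 5  # x, y, w, h, confidence
print(num_anchors * per_anchor)  # 425, the channel count of conv_dec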
The decoding code is as follows:
def decode(self, net):
    self.anchor_size = tf.constant(self.anchor_size, tf.float32)
    # net has shape [batch, 169, 5, 85]
    net = tf.reshape(net, [-1, 13 * 13, self.num_anchors, self.num_class + 5])
    # Within the 85 values: 0,1 are the xy offsets, 2,3 are the wh offsets,
    # 4 is the confidence, and 5-84 are the per-class probabilities
    # Offsets, confidence, classes:
    # the center coordinates are offsets relative to the top-left corner of the
    # cell, normalized to (0,1) with a sigmoid
    # [batch,169,5,2]
    xy_offset = tf.nn.sigmoid(net[:, :, :, 0:2])
    wh_offset = tf.exp(net[:, :, :, 2:4])
    obj_probs = tf.nn.sigmoid(net[:, :, :, 4])
    class_probs = tf.nn.softmax(net[:, :, :, 5:])
    # Generate anchors at the corresponding feature-map coordinates (13x13)
    height_index = tf.range(self.feature_map_size[0], dtype=tf.float32)
    width_index = tf.range(self.feature_map_size[1], dtype=tf.float32)
    x_cell, y_cell = tf.meshgrid(height_index, width_index)
    x_cell = tf.reshape(x_cell, [1, -1, 1])  # matches the [H*W, num_anchors, num_class+5] layout above
    y_cell = tf.reshape(y_cell, [1, -1, 1])
    # x_cell and y_cell are the grid cell coordinates;
    # xy_offset is the offset relative to the cell
    bbox_x = (x_cell + xy_offset[:, :, :, 0]) / 13
    bbox_y = (y_cell + xy_offset[:, :, :, 1]) / 13
    bbox_w = (self.anchor_size[:, 0] * wh_offset[:, :, :, 0]) / 13
    bbox_h = (self.anchor_size[:, 1] * wh_offset[:, :, :, 1]) / 13
    bboxes = tf.stack([bbox_x - bbox_w / 2, bbox_y - bbox_h / 2,
                       bbox_x + bbox_w / 2, bbox_y + bbox_h / 2], axis=3)
    return bboxes, obj_probs, class_probs
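To make the decode interface concrete, here is a hedged harness around it (DummyDecoder and its anchor values are assumptions for illustration, not the project's real class; decode is reused exactly as defined above):

import numpy as np
import tensorflow as tf

# Hypothetical harness for a 13x13 COCO head
class DummyDecoder:
    num_anchors = 5
    num_class = 80
    feature_map_size = (13, 13)
    # 5 anchor sizes in grid units (illustrative values only)
    anchor_size = [[0.57, 0.67], [1.87, 2.06], [3.34, 5.47], [7.88, 3.53], [9.77, 9.17]]
    decode = decode  # reuse the module-level function above as a method

net_in = tf.placeholder(tf.float32, [None, 13, 13, 425])
boxes, obj, cls = DummyDecoder().decode(net_in)
with tf.Session() as sess:
    out = sess.run(boxes, {net_in: np.random.rand(1, 13, 13, 425).astype(np.float32)})
    print(out.shape)  # (1, 169, 5, 4): one box per cell per anchor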
The network code is as follows:
def conv2d(self, x, filters_num, filters_size, pad_size=0, stride=1,
           batch_normalize=True, activation=leaky_relu, use_bias=False, name='conv2d'):
    # Pad if required
    if pad_size > 0:
        x = tf.pad(x, [[0, 0], [pad_size, pad_size], [pad_size, pad_size], [0, 0]])
    # Convolve after padding
    out = tf.layers.conv2d(x, filters=filters_num, kernel_size=filters_size, strides=stride,
                           padding='VALID', activation=None, use_bias=use_bias, name=name)
    # BN belongs between the conv layer and the activation
    # (a conv followed by BN needs no bias, and the activation comes after BN)
    if batch_normalize:
        out = tf.layers.batch_normalization(out, axis=-1, momentum=0.9, training=False, name=name + '_bn')
    if activation:
        out = activation(out)
    return out

def maxpool(self, x, size=2, stride=2, name='maxpool'):
    return tf.layers.max_pooling2d(x, pool_size=size, strides=stride, name=name)

def passthrough(self, x, stride):
    # Trade spatial resolution for channels
    return tf.space_to_depth(x, block_size=stride)

def darknet(self):
    x = tf.placeholder(dtype=tf.float32, shape=[None, 416, 416, 3])
    # 416,416,3 -> 416,416,32
    net = self.conv2d(x, filters_num=32, filters_size=3, pad_size=1, name='conv1')
    # 416,416,32 -> 208,208,32
    net = self.maxpool(net, size=2, stride=2, name='pool1')
    # 208,208,32 -> 208,208,64
    net = self.conv2d(net, 64, 3, 1, name='conv2')
    # 208,208,64 -> 104,104,64
    net = self.maxpool(net, 2, 2, name='pool2')
    # 104,104,64 -> 104,104,128
    net = self.conv2d(net, 128, 3, 1, name='conv3_1')
    net = self.conv2d(net, 64, 1, 0, name='conv3_2')
    net = self.conv2d(net, 128, 3, 1, name='conv3_3')
    # 104,104,128 -> 52,52,128
    net = self.maxpool(net, 2, 2, name='pool3')
    net = self.conv2d(net, 256, 3, 1, name='conv4_1')
    net = self.conv2d(net, 128, 1, 0, name='conv4_2')
    net = self.conv2d(net, 256, 3, 1, name='conv4_3')
    # 52,52,256 -> 26,26,256
    net = self.maxpool(net, 2, 2, name='pool4')
    # 26,26,256 -> 26,26,512
    net = self.conv2d(net, 512, 3, 1, name='conv5_1')
    net = self.conv2d(net, 256, 1, 0, name='conv5_2')
    net = self.conv2d(net, 512, 3, 1, name='conv5_3')
    net = self.conv2d(net, 256, 1, 0, name='conv5_4')
    net = self.conv2d(net, 512, 3, 1, name='conv5_5')
    # Keep this feature map for the passthrough layer later
    shortcut = net
    # 26,26,512 -> 13,13,512
    net = self.maxpool(net, 2, 2, name='pool5')
    # 13,13,512 -> 13,13,1024
    net = self.conv2d(net, 1024, 3, 1, name='conv6_1')
    net = self.conv2d(net, 512, 1, 0, name='conv6_2')
    net = self.conv2d(net, 1024, 3, 1, name='conv6_3')
    net = self.conv2d(net, 512, 1, 0, name='conv6_4')
    net = self.conv2d(net, 1024, 3, 1, name='conv6_5')
    '''
    When training the detection network, the last convolutional layer of the
    classification network is removed and replaced by three 3x3 convolutional
    layers with 1024 filters each, followed by a final 1x1 convolutional layer
    with B * (5 + C) filters.
    For the VOC dataset with a 416x416 input, the final output is a 13x13 grid;
    each cell predicts 5 boxes, and each box has 5 coordinate values plus 20
    conditional class probabilities, so the output size is
    13 * 13 * 5 * (5 + 20) = 13 * 13 * 125.
    The detection network also adds a passthrough layer, which connects the
    last 26 * 26 * 512 convolutional layer to the second of the three new 3x3
    convolutional layers, giving the model fine-grained features.
    '''
    # The part below is the detection head (training for detection)
    net = self.conv2d(net, 1024, 3, 1, name='conv7_1')
    # 13,13,1024 -> 13,13,1024
    net = self.conv2d(net, 1024, 3, 1, name='conv7_2')
    # The shortcut gets an intermediate conv layer: first a 1x1 conv with 64
    # filters, then the passthrough, i.e. 26*26*512 -> 26*26*64 -> 13*13*256
    shortcut = self.conv2d(shortcut, 64, 1, 0, name='conv_shortcut')
    shortcut = self.passthrough(shortcut, 2)
    # After concatenation: 13*13*(1024+256)
    net = tf.concat([shortcut, net], axis=-1)
    # The channels are merged: like ResNet's shortcut, the passthrough layer
    # takes an earlier higher-resolution feature map and concatenates it with
    # the later lower-resolution one
    net = self.conv2d(net, 1024, 3, 1, name='conv8')
    # Detection layer: a final 1x1 conv adjusts the channels; it has no BN or
    # activation and produces S*S*(B*(5+C)), here 13*13*425
    output = self.conv2d(net, filters_num=self.f_num, filters_size=1, batch_normalize=False,
                         activation=None, use_bias=True, name='conv_dec')
    return output, x
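The passthrough itself is just tf.space_to_depth, which moves each 2x2 spatial block into the channel dimension. A minimal standalone check of what it does to shapes:

import numpy as np
import tensorflow as tf

# space_to_depth with block_size=2 turns 26x26x64 into 13x13x256
x = tf.placeholder(tf.float32, [1, 26, 26, 64])
y = tf.space_to_depth(x, block_size=2)
with tf.Session() as sess:
    out = sess.run(y, {x: np.zeros((1, 26, 26, 64), np.float32)})
    print(out.shape)  # (1, 13, 13, 256)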
The prediction results are as follows:
[Figure: YOLOv2 detection results]
A big improvement over YOLOv1.
3. YOLOv3
[Figure: YOLOv3 network structure]
Compared with YOLOv1 and YOLOv2, YOLOv3 changes a lot. The main improvements are:
1. It uses residual (Residual) networks. A residual convolution performs a 3x3 convolution with stride 2, saves that layer, then performs a 1x1 convolution and a 3x3 convolution, and adds the result back to the saved layer as the final output. Residual networks are easy to optimize and can gain accuracy from considerably increased depth; the skip connections inside the residual blocks alleviate the vanishing-gradient problem that comes with adding depth to deep neural networks.
2. It extracts multiple feature layers for detection, three in total, with shapes (13,13,75), (26,26,75), and (52,52,75). The last dimension is 75 because this diagram is based on the VOC dataset, which has 20 classes; YOLOv3 assigns only 3 anchor boxes to each feature layer, so the final dimension is 3x25.
3. It adopts an upsampling (UpSampling2D) design; in this implementation the upsampling is a nearest-neighbor resize (see the sketch right after this list), which enlarges deep feature maps so they can be merged with shallower, higher-resolution layers and extract features more effectively.
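A minimal sketch of the upsampling operation used below (tf.image.resize_nearest_neighbor, exactly as called in the yolo_inference code):

import numpy as np
import tensorflow as tf

# Nearest-neighbor resize doubles the spatial resolution of a feature map
x = tf.placeholder(tf.float32, [1, 13, 13, 256])
y = tf.image.resize_nearest_neighbor(x, [2 * tf.shape(x)[1], 2 * tf.shape(x)[2]])
with tf.Session() as sess:
    out = sess.run(y, {x: np.zeros((1, 13, 13, 256), np.float32)})
    print(out.shape)  # (1, 26, 26, 256)

The YOLOv3 network code is as follows: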
# Batch normalization layer, followed by a leaky ReLU activation
def _batch_normalization_layer(self, input_layer, name=None, training=True, norm_decay=0.99, norm_epsilon=1e-3):
    '''
    Introduction
    ------------
    Apply batch normalization to the feature map produced by a conv layer
    Parameters
    ----------
    input_layer: input 4-D tensor
    name: name of the batchnorm layer
    training: whether this is the training phase
    norm_decay: decay rate for the moving average used at inference time
    norm_epsilon: small constant added to the variance to avoid division by zero
    Returns
    -------
    bn_layer: the feature map after batch normalization
    '''
    bn_layer = tf.layers.batch_normalization(inputs=input_layer,
        momentum=norm_decay, epsilon=norm_epsilon, center=True,
        scale=True, training=training, name=name)
    return tf.nn.leaky_relu(bn_layer, alpha=0.1)
# Convolution layer (with L2 weight regularization)
def _conv2d_layer(self, inputs, filters_num, kernel_size, name, use_bias=False, strides=1):
    """
    Introduction
    ------------
    Use tf.layers.conv2d to avoid hand-initializing the weight and bias
    matrices and adding the bias after the convolution.
    The convolution is followed by batch norm and finally a leaky ReLU.
    If the stride is 2, the image is downsampled: for example, with a 416x416
    input, kernel size 3 and stride 2, floor((416 - 3 + 2) / 2) + 1 = 208,
    which acts like a pooling layer.
    Therefore, when the stride is greater than 1, we first pad one pixel on
    each side instead of using 'same' padding.
    Parameters
    ----------
    inputs: input tensor
    filters_num: number of filters
    strides: convolution stride
    name: name of the conv layer
    use_bias: whether to use a bias term
    kernel_size: kernel size
    Returns
    -------
    conv: the feature map after convolution
    """
    conv = tf.layers.conv2d(
        inputs=inputs, filters=filters_num,
        kernel_size=kernel_size, strides=[strides, strides],
        kernel_initializer=tf.glorot_uniform_initializer(),
        padding=('SAME' if strides == 1 else 'VALID'),
        kernel_regularizer=tf.contrib.layers.l2_regularizer(scale=5e-4),
        use_bias=use_bias, name=name)
    return conv
# Residual convolution: perform one 3x3 convolution with stride 2, save that
# layer, then perform a 1x1 convolution and a 3x3 convolution, and add the
# result to the saved layer as the final output
def _Residual_block(self, inputs, filters_num, blocks_num, conv_index, training=True, norm_decay=0.99, norm_epsilon=1e-3):
    """
    Introduction
    ------------
    Darknet residual block, similar to ResNet's two-layer convolution
    structure with 1x1 and 3x3 kernels; the 1x1 convolution reduces the
    channel dimension
    Parameters
    ----------
    inputs: input tensor
    filters_num: number of filters
    training: whether this is the training phase
    blocks_num: number of residual blocks
    conv_index: running layer index, for consistent naming when loading pretrained weights
    norm_decay: decay rate for the moving average used at inference time
    norm_epsilon: small constant added to the variance to avoid division by zero
    Returns
    -------
    layer: result of the residual blocks
    """
    # Pad the height and width dimensions of the input feature map
    inputs = tf.pad(inputs, paddings=[[0, 0], [1, 0], [1, 0], [0, 0]], mode='CONSTANT')
    layer = self._conv2d_layer(inputs, filters_num, kernel_size=3, strides=2, name="conv2d_" + str(conv_index))
    layer = self._batch_normalization_layer(layer, name="batch_normalization_" + str(conv_index), training=training, norm_decay=norm_decay, norm_epsilon=norm_epsilon)
    conv_index += 1
    for _ in range(blocks_num):
        shortcut = layer
        layer = self._conv2d_layer(layer, filters_num // 2, kernel_size=1, strides=1, name="conv2d_" + str(conv_index))
        layer = self._batch_normalization_layer(layer, name="batch_normalization_" + str(conv_index), training=training, norm_decay=norm_decay, norm_epsilon=norm_epsilon)
        conv_index += 1
        layer = self._conv2d_layer(layer, filters_num, kernel_size=3, strides=1, name="conv2d_" + str(conv_index))
        layer = self._batch_normalization_layer(layer, name="batch_normalization_" + str(conv_index), training=training, norm_decay=norm_decay, norm_epsilon=norm_epsilon)
        conv_index += 1
        layer += shortcut
    return layer, conv_index
#---------------------------------------#
#   Build the darknet53 backbone        #
#---------------------------------------#
def _darknet53(self, inputs, conv_index, training=True, norm_decay=0.99, norm_epsilon=1e-3):
    """
    Introduction
    ------------
    Build the darknet53 network structure used by yolo3
    Parameters
    ----------
    inputs: model input
    conv_index: running conv layer index, for loading pretrained weights by name
    training: whether this is the training phase
    norm_decay: decay rate for the moving average used at inference time
    norm_epsilon: small constant added to the variance to avoid division by zero
    Returns
    -------
    conv: result after 52 conv layers; for a 416x416x3 input the output shape is 13x13x1024
    route1: output of conv layer 26, 52x52x256, for later use
    route2: output of conv layer 43, 26x26x512, for later use
    conv_index: conv layer count, used when loading pretrained weights
    """
    with tf.variable_scope('darknet53'):
        # 416,416,3 -> 416,416,32
        conv = self._conv2d_layer(inputs, filters_num=32, kernel_size=3, strides=1, name="conv2d_" + str(conv_index))
        conv = self._batch_normalization_layer(conv, name="batch_normalization_" + str(conv_index), training=training, norm_decay=norm_decay, norm_epsilon=norm_epsilon)
        conv_index += 1
        # 416,416,32 -> 208,208,64
        conv, conv_index = self._Residual_block(conv, conv_index=conv_index, filters_num=64, blocks_num=1, training=training, norm_decay=norm_decay, norm_epsilon=norm_epsilon)
        # 208,208,64 -> 104,104,128
        conv, conv_index = self._Residual_block(conv, conv_index=conv_index, filters_num=128, blocks_num=2, training=training, norm_decay=norm_decay, norm_epsilon=norm_epsilon)
        # 104,104,128 -> 52,52,256
        conv, conv_index = self._Residual_block(conv, conv_index=conv_index, filters_num=256, blocks_num=8, training=training, norm_decay=norm_decay, norm_epsilon=norm_epsilon)
        # route1 = 52,52,256
        route1 = conv
        # 52,52,256 -> 26,26,512
        conv, conv_index = self._Residual_block(conv, conv_index=conv_index, filters_num=512, blocks_num=8, training=training, norm_decay=norm_decay, norm_epsilon=norm_epsilon)
        # route2 = 26,26,512
        route2 = conv
        # 26,26,512 -> 13,13,1024
        conv, conv_index = self._Residual_block(conv, conv_index=conv_index, filters_num=1024, blocks_num=4, training=training, norm_decay=norm_decay, norm_epsilon=norm_epsilon)
        # conv = 13,13,1024
        return route1, route2, conv, conv_index
# Outputs two results:
# the first, after 5 convolutions (1x1, 3x3, 1x1, 3x3, 1x1), feeds the next upsampling step;
# the second, after 5+2 convolutions (1x1, 3x3, 1x1, 3x3, 1x1, 3x3, 1x1), is one detection feature layer
def _yolo_block(self, inputs, filters_num, out_filters, conv_index, training=True, norm_decay=0.99, norm_epsilon=1e-3):
    """
    Introduction
    ------------
    On top of the features extracted by Darknet53, yolo3 adds blocks for
    feature maps at 3 different scales, improving detection of small objects
    Parameters
    ----------
    inputs: input features
    filters_num: number of filters
    out_filters: number of filters of the final output layer
    conv_index: running conv layer index, for loading pretrained weights by name
    training: whether this is the training phase
    norm_decay: decay rate for the moving average used at inference time
    norm_epsilon: small constant added to the variance to avoid division by zero
    Returns
    -------
    route: output of the layer before the last convolution
    conv: output of the last convolution
    conv_index: conv layer count
    """
    conv = self._conv2d_layer(inputs, filters_num=filters_num, kernel_size=1, strides=1, name="conv2d_" + str(conv_index))
    conv = self._batch_normalization_layer(conv, name="batch_normalization_" + str(conv_index), training=training, norm_decay=norm_decay, norm_epsilon=norm_epsilon)
    conv_index += 1
    conv = self._conv2d_layer(conv, filters_num=filters_num * 2, kernel_size=3, strides=1, name="conv2d_" + str(conv_index))
    conv = self._batch_normalization_layer(conv, name="batch_normalization_" + str(conv_index), training=training, norm_decay=norm_decay, norm_epsilon=norm_epsilon)
    conv_index += 1
    conv = self._conv2d_layer(conv, filters_num=filters_num, kernel_size=1, strides=1, name="conv2d_" + str(conv_index))
    conv = self._batch_normalization_layer(conv, name="batch_normalization_" + str(conv_index), training=training, norm_decay=norm_decay, norm_epsilon=norm_epsilon)
    conv_index += 1
    conv = self._conv2d_layer(conv, filters_num=filters_num * 2, kernel_size=3, strides=1, name="conv2d_" + str(conv_index))
    conv = self._batch_normalization_layer(conv, name="batch_normalization_" + str(conv_index), training=training, norm_decay=norm_decay, norm_epsilon=norm_epsilon)
    conv_index += 1
    conv = self._conv2d_layer(conv, filters_num=filters_num, kernel_size=1, strides=1, name="conv2d_" + str(conv_index))
    conv = self._batch_normalization_layer(conv, name="batch_normalization_" + str(conv_index), training=training, norm_decay=norm_decay, norm_epsilon=norm_epsilon)
    conv_index += 1
    route = conv
    conv = self._conv2d_layer(conv, filters_num=filters_num * 2, kernel_size=3, strides=1, name="conv2d_" + str(conv_index))
    conv = self._batch_normalization_layer(conv, name="batch_normalization_" + str(conv_index), training=training, norm_decay=norm_decay, norm_epsilon=norm_epsilon)
    conv_index += 1
    conv = self._conv2d_layer(conv, filters_num=out_filters, kernel_size=1, strides=1, name="conv2d_" + str(conv_index), use_bias=True)
    conv_index += 1
    return route, conv, conv_index
# Returns the three detection feature layers
def yolo_inference(self, inputs, num_anchors, num_classes, training=True):
    """
    Introduction
    ------------
    Build the yolo model structure
    Parameters
    ----------
    inputs: model input
    num_anchors: number of anchors each grid cell is responsible for
    num_classes: number of classes
    training: whether this is the training phase
    """
    conv_index = 1
    # conv2d_26 = 52,52,256, conv2d_43 = 26,26,512, conv = 13,13,1024
    conv2d_26, conv2d_43, conv, conv_index = self._darknet53(inputs, conv_index, training=training, norm_decay=self.norm_decay, norm_epsilon=self.norm_epsilon)
    with tf.variable_scope('yolo'):
        #--------------------------------------#
        #   First detection feature layer      #
        #--------------------------------------#
        # conv2d_57 = 13,13,512, conv2d_59 = 13,13,255 (3 x (80 + 5))
        conv2d_57, conv2d_59, conv_index = self._yolo_block(conv, 512, num_anchors * (num_classes + 5), conv_index=conv_index, training=training, norm_decay=self.norm_decay, norm_epsilon=self.norm_epsilon)
        #--------------------------------------#
        #   Second detection feature layer     #
        #--------------------------------------#
        conv2d_60 = self._conv2d_layer(conv2d_57, filters_num=256, kernel_size=1, strides=1, name="conv2d_" + str(conv_index))
        conv2d_60 = self._batch_normalization_layer(conv2d_60, name="batch_normalization_" + str(conv_index), training=training, norm_decay=self.norm_decay, norm_epsilon=self.norm_epsilon)
        conv_index += 1
        # unSample_0 = 26,26,256
        unSample_0 = tf.image.resize_nearest_neighbor(conv2d_60, [2 * tf.shape(conv2d_60)[1], 2 * tf.shape(conv2d_60)[2]], name='upSample_0')
        # route0 = 26,26,768
        route0 = tf.concat([unSample_0, conv2d_43], axis=-1, name='route_0')
        # conv2d_65 = 26,26,256, conv2d_67 = 26,26,255
        conv2d_65, conv2d_67, conv_index = self._yolo_block(route0, 256, num_anchors * (num_classes + 5), conv_index=conv_index, training=training, norm_decay=self.norm_decay, norm_epsilon=self.norm_epsilon)
        #--------------------------------------#
        #   Third detection feature layer      #
        #--------------------------------------#
        conv2d_68 = self._conv2d_layer(conv2d_65, filters_num=128, kernel_size=1, strides=1, name="conv2d_" + str(conv_index))
        conv2d_68 = self._batch_normalization_layer(conv2d_68, name="batch_normalization_" + str(conv_index), training=training, norm_decay=self.norm_decay, norm_epsilon=self.norm_epsilon)
        conv_index += 1
        # unSample_1 = 52,52,128
        unSample_1 = tf.image.resize_nearest_neighbor(conv2d_68, [2 * tf.shape(conv2d_68)[1], 2 * tf.shape(conv2d_68)[2]], name='upSample_1')
        # route1 = 52,52,384
        route1 = tf.concat([unSample_1, conv2d_26], axis=-1, name='route_1')
        # conv2d_75 = 52,52,255
        _, conv2d_75, _ = self._yolo_block(route1, 128, num_anchors * (num_classes + 5), conv_index=conv_index, training=training, norm_decay=self.norm_decay, norm_epsilon=self.norm_epsilon)
        return [conv2d_59, conv2d_67, conv2d_75]
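As a rough end-to-end usage sketch (the YoloV3 wrapper class here is hypothetical; it simply binds the module-level functions above as methods, and the weights are random):

import tensorflow as tf

# Hypothetical wrapper: assumes the functions above are defined at module level
class YoloV3:
    norm_decay = 0.99
    norm_epsilon = 1e-3
    _batch_normalization_layer = _batch_normalization_layer
    _conv2d_layer = _conv2d_layer
    _Residual_block = _Residual_block
    _darknet53 = _darknet53
    _yolo_block = _yolo_block
    yolo_inference = yolo_inference

inputs = tf.placeholder(tf.float32, [None, 416, 416, 3])
# 3 anchors per scale and 80 classes give heads of 13x13x255, 26x26x255, 52x52x255
feats = YoloV3().yolo_inference(inputs, num_anchors=3, num_classes=80, training=False)
for f in feats:
    print(f.get_shape().as_list())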
The prediction results are as follows:
[Figure: YOLOv3 detection results]
A big improvement over both YOLOv1 and YOLOv2.
4. SSD
[Figure: SSD network structure]
I have written two dedicated blog posts on the SSD network describing its training and prediction processes;
see "SSD算法預(yù)測(cè)部分" (prediction) and "SSD算法訓(xùn)練部分" (training).
SSD is also a multi-feature-layer network, with 11 blocks in total; the front half of the structure is VGG16.
Its network structure is as follows:
1. Features are first extracted through several 3x3 convolutional layers and five stride-2 max-pooling operations, forming 5 blocks. The fourth block's output shape is (?,38,38,512), and this layer is used to detect small objects (after many convolutions the features of large objects are well preserved while those of small objects fade away, so small-object features must be taken from an earlier layer).
2. A dilated convolution is applied (if kernel dilation is unfamiliar, see the short sketch after this list).
3. The features of the seventh block (Block 7) are taken, with shape (?,19,19,1024).
4. Features are extracted with a 1x1 convolution and a 3x3 convolution; the 3x3 convolution uses stride 2 to shrink the feature map, giving the eighth block (Block 8) features of shape (?,10,10,512).
5. Step 4 is repeated to obtain the features of blocks 9, 10, and 11, with shapes (?,5,5,256), (?,3,3,256), and (?,1,1,256).
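Dilated (atrous) convolution spaces the kernel taps apart to enlarge the receptive field without shrinking the feature map. A minimal standalone check (not part of the SSD code; the shapes are chosen to match Block 6):

import numpy as np
import tensorflow as tf

# A 3x3 kernel with dilation rate 6 covers a 13x13 receptive field
# (3 + (3 - 1) * (6 - 1) = 13) while 'same' padding keeps the spatial size
x = tf.placeholder(tf.float32, [1, 19, 19, 512])
y = tf.layers.conv2d(x, 1024, 3, dilation_rate=6, padding='same')
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(y, {x: np.zeros((1, 19, 19, 512), np.float32)}).shape)  # (1, 19, 19, 1024)

The SSD network code is as follows: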
# ============================= Network ============================= #
############################################################
#   Called by SSDNet's net function to build the network   #
#   Returns predictions, localisations, logits, end_points #
############################################################
def ssd_net(inputs,
            num_classes=SSDNet.default_params.num_classes,
            feat_layers=SSDNet.default_params.feat_layers,
            anchor_sizes=SSDNet.default_params.anchor_sizes,
            anchor_ratios=SSDNet.default_params.anchor_ratios,
            normalizations=SSDNet.default_params.normalizations,
            is_training=True,
            dropout_keep_prob=0.5,
            prediction_fn=slim.softmax,
            reuse=None,
            scope='ssd_300_vgg'):
    """SSD net definition.
    """
    # Build the network
    end_points = {}
    with tf.variable_scope(scope, 'ssd_300_vgg', [inputs], reuse=reuse):
        # Block 1
        '''
        Equivalent to:
        net = self.conv2d(x, 64, [3, 3], scope='conv1_1')
        net = self.conv2d(net, 64, [3, 3], scope='conv1_2')
        '''
        # (300,300,3) -> (300,300,64) -> (150,150,64)
        net = slim.repeat(inputs, 2, slim.conv2d, 64, [3, 3], scope='conv1')
        end_points['block1'] = net
        net = slim.max_pool2d(net, [2, 2], scope='pool1')
        # Block 2
        '''
        Equivalent to:
        net = self.conv2d(net, 128, [3, 3], scope='conv2_1')
        net = self.conv2d(net, 128, [3, 3], scope='conv2_2')
        '''
        # (150,150,64) -> (150,150,128) -> (75,75,128)
        net = slim.repeat(net, 2, slim.conv2d, 128, [3, 3], scope='conv2')
        end_points['block2'] = net
        net = slim.max_pool2d(net, [2, 2], scope='pool2')
        # Block 3
        '''
        Equivalent to:
        net = self.conv2d(net, 256, [3, 3], scope='conv3_1')
        net = self.conv2d(net, 256, [3, 3], scope='conv3_2')
        net = self.conv2d(net, 256, [3, 3], scope='conv3_3')
        '''
        # (75,75,128) -> (75,75,256) -> (38,38,256)
        net = slim.repeat(net, 3, slim.conv2d, 256, [3, 3], scope='conv3')
        end_points['block3'] = net
        net = slim.max_pool2d(net, [2, 2], stride=2, padding="SAME", scope='pool3')
        # Block 4: three convolutions
        # (38,38,256) -> (38,38,512) -> block4 output -> (19,19,512)
        net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], scope='conv4')
        end_points['block4'] = net
        net = slim.max_pool2d(net, [2, 2], padding="SAME", scope='pool4')
        # Block 5: three convolutions
        # (19,19,512) -> (19,19,512)
        net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], scope='conv5')
        end_points['block5'] = net
        net = slim.max_pool2d(net, [3, 3], stride=1, padding="SAME", scope='pool5')
        # Block 6: dilated convolution
        # (19,19,512) -> (19,19,1024)
        net = slim.conv2d(net, 1024, [3, 3], rate=6, scope='conv6')
        end_points['block6'] = net
        net = tf.layers.dropout(net, rate=dropout_keep_prob, training=is_training)
        # Block 7: 1x1 conv
        # (19,19,1024) -> (19,19,1024)
        net = slim.conv2d(net, 1024, [1, 1], scope='conv7')
        end_points['block7'] = net
        net = tf.layers.dropout(net, rate=dropout_keep_prob, training=is_training)
        # Blocks 8/9/10/11: 1x1 and 3x3 convolutions, stride 2 (except the last two)
        # (19,19,1024) -> (19,19,256) -> (10,10,512)
        end_point = 'block8'
        with tf.variable_scope(end_point):
            net = slim.conv2d(net, 256, [1, 1], scope='conv1x1')
            net = custom_layers.pad2d(net, pad=(1, 1))
            net = slim.conv2d(net, 512, [3, 3], stride=2, scope='conv3x3', padding='VALID')
        end_points[end_point] = net
        end_point = 'block9'
        # (10,10,512) -> (10,10,128) -> (5,5,256)
        with tf.variable_scope(end_point):
            net = slim.conv2d(net, 128, [1, 1], scope='conv1x1')
            net = custom_layers.pad2d(net, pad=(1, 1))
            net = slim.conv2d(net, 256, [3, 3], stride=2, scope='conv3x3', padding='VALID')
        end_points[end_point] = net
        end_point = 'block10'
        # (5,5,256) -> (5,5,128) -> (3,3,256)
        with tf.variable_scope(end_point):
            net = slim.conv2d(net, 128, [1, 1], scope='conv1x1')
            net = slim.conv2d(net, 256, [3, 3], scope='conv3x3', padding='VALID')
        end_points[end_point] = net
        end_point = 'block11'
        # (3,3,256) -> (1,1,256)
        with tf.variable_scope(end_point):
            net = slim.conv2d(net, 128, [1, 1], scope='conv1x1')
            net = slim.conv2d(net, 256, [3, 3], scope='conv3x3', padding='VALID')
        end_points[end_point] = net
        # Prediction and localization layers
        predictions = []
        logits = []
        localisations = []
        for i, layer in enumerate(feat_layers):
            with tf.variable_scope(layer + '_box'):
                p, l = ssd_multibox_layer(end_points[layer],
                                          num_classes,
                                          anchor_sizes[i],
                                          anchor_ratios[i],
                                          normalizations[i])
            predictions.append(prediction_fn(p))
            logits.append(p)
            localisations.append(l)
        return predictions, localisations, logits, end_points
ssd_net.default_image_size = 300
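Note that ssd_multibox_layer is called above but not shown in this excerpt. As a hedged sketch of what such a layer typically does (the project's real implementation also applies L2 normalization when the normalization parameter is positive, and its exact helpers and shapes differ):

import tensorflow as tf
import tensorflow.contrib.slim as slim

# Hypothetical sketch of a multibox prediction layer
def ssd_multibox_layer_sketch(net, num_classes, sizes, ratios, normalization=-1):
    # Number of anchors at this feature layer: one per listed size plus one per ratio
    num_anchors = len(sizes) + len(ratios)
    # Localization branch: 4 coordinates per anchor box
    loc_pred = slim.conv2d(net, num_anchors * 4, [3, 3],
                           activation_fn=None, scope='conv_loc')
    shape = loc_pred.get_shape().as_list()[1:-1]  # static [H, W]
    loc_pred = tf.reshape(loc_pred, [-1] + shape + [num_anchors, 4])
    # Classification branch: num_classes logits per anchor box
    cls_pred = slim.conv2d(net, num_anchors * num_classes, [3, 3],
                           activation_fn=None, scope='conv_cls')
    cls_pred = tf.reshape(cls_pred, [-1] + shape + [num_anchors, num_classes])
    return cls_pred, loc_pred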
The prediction results are:
[Figure: SSD detection results]
SSD's detection results are also quite good.
Summary
With each new version from YOLOv1 to YOLOv3, the detection quality keeps improving while the prediction speed keeps dropping, although YOLOv3 is still reasonably fast. After releasing YOLOv3, the official site simply took down YOLOv1 and YOLOv2, which shows some confidence... YOLOv3's excellent detection results come mainly from the ideas of residual networks, upsampling, and multi-scale feature layers. These let it extract features very well and train effectively, and give it good detection performance on both large and small objects.
SSD likewise adopts the multi-feature-layer idea, but its network structure is simpler than YOLOv3's; it uses VGG16 for feature extraction and also performs quite well. For more material on the network structures of YOLOv1, YOLOv2, YOLOv3, and SSD, please follow the other related articles on 腳本之家!