A Walkthrough of the Prediction Code in the SSD Object Detection Algorithm (Python)

Updated: 2022-05-10 18:29:24  Author: Bubbliiiing


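Preface

I have learned a lot of object detection concepts by now, but how do you actually run a prediction? I spent a long time reading the SSD source code and extracted the prediction part from it. I have not yet worked through the training part.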

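What is the SSD algorithm

SSD is a very good one-stage method. In a one-stage algorithm, localization and classification are done at the same time. The main idea is to sample densely and uniformly at different positions of the image, using different scales and aspect ratios, then extract features with a CNN and run classification and regression directly on them. The whole process takes a single step, so its main advantage is speed.

A major drawback of uniform dense sampling is that training is harder, mainly because positive samples and negative (background) samples are extremely imbalanced (see Focal Loss), which leads to slightly lower accuracy.

The full name of SSD is Single Shot MultiBox Detector. "Single Shot" says that SSD is a one-stage method, and "MultiBox" says that it predicts multiple boxes. (It really is not a solid-state drive.)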

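Structure of this walkthrough

This tutorial has two parts. The first explains the source code of the ssd_vgg_300 main body; the second explains how to call that code to run a prediction.

The explanation of the ssd_vgg_300 main body covers three parts:

1. The network part, which builds the SSD network that predicts classes and box positions.

2. The prior box part, which builds boxes of suitable proportions from the shape of each feature layer and also reduces the amount of computation.

3. The decoding part, which decodes box positions from the outputs of the network part and the prior box part.

The explanation of prediction covers three parts:

1. How the image is preprocessed.

2. How the model is loaded.

3. The processing flow during prediction.

Before reading on, I suggest downloading my simplified source code and following along; running demo in it executes the program:

Download link: https://pan.baidu.com/s/16UtXIfE-imrzjg_rx7xTKQ

Extraction code: vpo2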

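Source code of the ssd_vgg_300 main body

The ssd_vgg_300 source used in this article comes from

Link: https://pan.baidu.com/s/1Wi1t9bYpTJEu5j3cq3pUnA

Extraction code: 6hye

I simplified it and kept only the prediction part, which makes the overall SSD framework easier to follow.

1. Overall framework

When only prediction is needed, keep the network part, the prior box part and the decoding part of the ssd_vgg_300 source. (The original post shows a screenshot of the folded function list here, because the function names cannot be copied on their own from VS Code.)

In it:

1. The net function builds the network. Its input is an image tensor of shape (None,300,300,3). The input passes through many layers, six of which are feature layers used to read out boxes. The function finally outputs predictions and locations, which hold the class predictions and the box positions of those six layers.

2. arg_scope sets the default arguments of every layer in the network. The project uses slim, a lightweight library on top of TensorFlow, and the defaults are applied to slim functions.

3. anchors returns the prior boxes, again one set for each of the six feature layers.

4. bboxes_decode combines the prior boxes with locations to obtain the box positions in img. locations are encoded box positions, a form that is easier for the SSD network to learn; bboxes_decode decodes them into positions in img.

2. Building the network with net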

# ============================= Network ============================= #
def net(self, inputs,
        is_training=True,
        update_feat_shapes=True,
        dropout_keep_prob=0.5,
        prediction_fn=slim.softmax,
        reuse=None,
        scope='ssd_300_vgg'):
    """
    SSD network definition; calls an external function to build the layers.
    """
    r = ssd_net(inputs,
                num_classes=self.params.num_classes,
                feat_layers=self.params.feat_layers,
                anchor_sizes=self.params.anchor_sizes,
                anchor_ratios=self.params.anchor_ratios,
                normalizations=self.params.normalizations,
                is_training=is_training,
                dropout_keep_prob=dropout_keep_prob,
                prediction_fn=prediction_fn,
                reuse=reuse,
                scope=scope)
    return r

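The net function calls an external function, ssd_net. My guess is that the author did this to keep the class body concise.

The actual construction code is in ssd_net. It uses slim.repeat many times, which builds a sequence of identical convolution layers. Eleven blocks are built in total, and for box detection we use ['block4', 'block7', 'block8', 'block9', 'block10', 'block11'].

The original post shows the network structure figure from the paper here.

From that figure the structure is as follows:

1. The input first passes through a stack of 3x3 convolutions and five max-pooling layers, forming five blocks. (In the code below the first four poolings use stride 2, while pool5 uses a 3x3 window with stride 1.) The output of the fourth block has shape (?,38,38,512) and is used to detect small objects: after many convolutions the features of large objects are preserved well while those of small objects fade, so small-object features have to be read from an earlier layer.

2. Apply one dilated convolution (the concept is easy to look up online).

3. Read the features of Block7, whose shape is (?,19,19,1024).

4. Extract features with a 1x1 and then a 3x3 convolution. The 3x3 convolution uses stride 2 to shrink the feature map. This gives Block8, whose shape is (?,10,10,512).

5. Repeat step 4 to obtain Block9, Block10 and Block11, whose shapes are (?,5,5,256), (?,3,3,256) and (?,1,1,256). (In the code, Block10 and Block11 shrink through unpadded 3x3 convolutions with stride 1 rather than through stride 2.)

At this point the network is complete.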
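
The shapes listed above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the SSD source: the helper names conv_out and ssd300_feature_sizes are made up for illustration, and it assumes that slim.max_pool2d defaults to stride 2 with 'VALID' padding and that the 3x3 'SAME' convolutions with stride 1 leave the spatial size unchanged.

```python
import math


def conv_out(size, kernel, stride, padding):
    """Spatial output size of a conv/pool layer under TensorFlow's padding rules."""
    if padding == "SAME":
        return math.ceil(size / stride)
    # "VALID": no implicit padding
    return (size - kernel) // stride + 1


def ssd300_feature_sizes(size=300):
    """Trace the spatial size through the pooling and stride-2 layers of ssd_net."""
    sizes = {}
    size = conv_out(size, 2, 2, "VALID")       # pool1: 300 -> 150
    size = conv_out(size, 2, 2, "VALID")       # pool2: 150 -> 75
    size = conv_out(size, 2, 2, "SAME")        # pool3: 75 -> 38
    sizes["block4"] = size                     # read before pool4
    size = conv_out(size, 2, 2, "SAME")        # pool4: 38 -> 19
    size = conv_out(size, 3, 1, "SAME")        # pool5: stride 1, stays 19
    sizes["block7"] = size
    size = conv_out(size + 2, 3, 2, "VALID")   # block8: pad 1 per side, stride 2
    sizes["block8"] = size
    size = conv_out(size + 2, 3, 2, "VALID")   # block9: pad 1 per side, stride 2
    sizes["block9"] = size
    size = conv_out(size, 3, 1, "VALID")       # block10: unpadded 3x3
    sizes["block10"] = size
    size = conv_out(size, 3, 1, "VALID")       # block11: unpadded 3x3
    sizes["block11"] = size
    return sizes


print(ssd300_feature_sizes())
```

The explicit "SAME" padding on pool3 is what turns 75 into 38 instead of 37, which is why the first feature layer is 38x38.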

# ============================= Network ============================= #
############################################################
#   Called by SSDNet.net to build the network               #
#   Returns predictions, localisations, logits, end_points  #
############################################################
def ssd_net(inputs,
            num_classes=SSDNet.default_params.num_classes,
            feat_layers=SSDNet.default_params.feat_layers,
            anchor_sizes=SSDNet.default_params.anchor_sizes,
            anchor_ratios=SSDNet.default_params.anchor_ratios,
            normalizations=SSDNet.default_params.normalizations,
            is_training=True,
            dropout_keep_prob=0.5,
            prediction_fn=slim.softmax,
            reuse=None,
            scope='ssd_300_vgg'):
    """SSD net definition.
    """
    # Build the network
    end_points = {}
    with tf.variable_scope(scope, 'ssd_300_vgg', [inputs], reuse=reuse):
        # Block1
        '''
        Equivalent to:
        net = self.conv2d(x,64,[3,3],scope = 'conv1_1')
        net = self.conv2d(net,64,[3,3],scope = 'conv1_2')
        '''
        # (300,300,3) -> (300,300,64) -> (150,150,64)
        net = slim.repeat(inputs, 2, slim.conv2d, 64, [3, 3], scope='conv1')
        end_points['block1'] = net
        net = slim.max_pool2d(net, [2, 2], scope='pool1')
        # Block 2.
        '''
        Equivalent to:
        net = self.conv2d(net,128,[3,3],scope = 'conv2_1')
        net = self.conv2d(net,128,[3,3],scope = 'conv2_2')
        '''
        # (150,150,64) -> (150,150,128) -> (75,75,128)
        net = slim.repeat(net, 2, slim.conv2d, 128, [3, 3], scope='conv2')
        end_points['block2'] = net
        net = slim.max_pool2d(net, [2, 2], scope='pool2')
        # Block 3.
        '''
        Equivalent to:
        net = self.conv2d(net,256,[3,3],scope = 'conv3_1')
        net = self.conv2d(net,256,[3,3],scope = 'conv3_2')
        net = self.conv2d(net,256,[3,3],scope = 'conv3_3')
        '''
        # (75,75,128) -> (75,75,256) -> (38,38,256)
        net = slim.repeat(net, 3, slim.conv2d, 256, [3, 3], scope='conv3')
        end_points['block3'] = net
        net = slim.max_pool2d(net, [2, 2],stride = 2,padding = "SAME", scope='pool3')
        # Block 4.
        # Three convolutions
        # (38,38,256) -> (38,38,512) -> block4_net -> (19,19,512)
        net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], scope='conv4')
        end_points['block4'] = net
        net = slim.max_pool2d(net, [2, 2],padding = "SAME", scope='pool4')
        # Block 5.
        # Three convolutions
        # (19,19,512)->(19,19,512)
        net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], scope='conv5')
        end_points['block5'] = net
        net = slim.max_pool2d(net, [3, 3], stride=1,padding = "SAME", scope='pool5')
        # Block 6: dilate
        # Dilated (atrous) convolution
        # (19,19,512)->(19,19,1024)
        net = slim.conv2d(net, 1024, [3, 3], rate=6, scope='conv6')
        end_points['block6'] = net
        # tf.layers.dropout expects the fraction to drop, so convert from the keep
        # probability (identical for the default of 0.5, and unused when is_training=False)
        net = tf.layers.dropout(net, rate=1 - dropout_keep_prob, training=is_training)
        # Block 7: 1x1 conv
        # (19,19,1024)->(19,19,1024)
        net = slim.conv2d(net, 1024, [1, 1], scope='conv7')
        end_points['block7'] = net
        net = tf.layers.dropout(net, rate=1 - dropout_keep_prob, training=is_training)
        # Block 8/9/10/11: 1x1 and 3x3 convolutions stride 2 (except lasts).
        # (19,19,1024)->(19,19,256)->(10,10,512)
        end_point = 'block8'
        with tf.variable_scope(end_point):
            net = slim.conv2d(net, 256, [1, 1], scope='conv1x1')
            net = custom_layers.pad2d(net, pad=(1, 1))
            net = slim.conv2d(net, 512, [3, 3], stride=2, scope='conv3x3', padding='VALID')
        end_points[end_point] = net
        end_point = 'block9'
        # (10,10,512)->(10,10,128)->(5,5,256)
        with tf.variable_scope(end_point):
            net = slim.conv2d(net, 128, [1, 1], scope='conv1x1')
            net = custom_layers.pad2d(net, pad=(1, 1))
            net = slim.conv2d(net, 256, [3, 3], stride=2, scope='conv3x3', padding='VALID')
        end_points[end_point] = net
        end_point = 'block10'
        # (5,5,256)->(5,5,128)->(3,3,256)
        with tf.variable_scope(end_point):
            net = slim.conv2d(net, 128, [1, 1], scope='conv1x1')
            net = slim.conv2d(net, 256, [3, 3], scope='conv3x3', padding='VALID')
        end_points[end_point] = net
        end_point = 'block11'
        # (3,3,256)->(1,1,256)
        with tf.variable_scope(end_point):
            net = slim.conv2d(net, 128, [1, 1], scope='conv1x1')
            net = slim.conv2d(net, 256, [3, 3], scope='conv3x3', padding='VALID')
        end_points[end_point] = net
        # Prediction and localisation layers
        predictions = []
        logits = []
        localisations = []
        for i, layer in enumerate(feat_layers):
            with tf.variable_scope(layer + '_box'):
                p, l = ssd_multibox_layer(end_points[layer],
                                          num_classes,
                                          anchor_sizes[i],
                                          anchor_ratios[i],
                                          normalizations[i])
            predictions.append(prediction_fn(p))
            logits.append(p)
            localisations.append(l)
        return predictions, localisations, logits, end_points
ssd_net.default_image_size = 300

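Careful readers will notice that, besides building the layers, the function ends with a loop. What is it for? We have extracted the feature layers, but how do they relate to the predicted classes and the box positions?

The loop is what turns each feature layer into class predictions and box positions.

Inside the loop we call ssd_multibox_layer, which does the following:

1. Read one feature layer of the network.

2. Apply two further convolutions to that feature layer. They are independent of each other: one predicts the classes, the other predicts the box positions.

3. Predict the box positions. Taking Block4 as an example, its shape is (?,38,38,512). After the convolution the shape becomes (?,38,38,num_anchors x 4), where num_anchors is the number of prior boxes at each feature point and 4 is the number of values needed to fix the position of one box. It is finally reshaped to (?,38,38,num_anchors,4): for each of the 38x38 feature points and each of its num_anchors boxes, 4 position values.

4. Predict the classes. Taking Block4 as an example, its shape is (?,38,38,512). After the convolution the shape becomes (?,38,38,num_anchors x 21), where num_anchors is the number of prior boxes at each feature point and 21 is the number of classes including background (this SSD model predicts 21 classes). It is finally reshaped to (?,38,38,num_anchors,21): for each of the 38x38 feature points and each of its num_anchors boxes, 21 class scores.

In the output of this function:

location_pred has shape (?,feat_block.shape[0],feat_block.shape[1],num_anchors,4)

class_pred has shape (?,feat_block.shape[0],feat_block.shape[1],num_anchors,21)

The code is as follows: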

############################################################
#   Called by ssd_net; returns class and location predictions  #
#   Takes one feature layer and returns its predictions        #
############################################################
def ssd_multibox_layer(inputs,
                       num_classes,
                       sizes,
                       ratios=[1],
                       normalization=-1,
                       bn_normalization=False):
    reshape = [-1] + inputs.get_shape().as_list()[1:-1]  # input shape without the batch and channel dimensions
    net = inputs
    # L2-normalize the first feature layer.
    if normalization > 0:
        net = custom_layers.l2_normalization(net, scaling=True)
    # Number of anchors.
    num_anchors = len(sizes) + len(ratios)
    # Location.
    num_loc_pred = num_anchors * 4
    loc_pred = slim.conv2d(net, num_loc_pred, [3, 3], activation_fn=None,
                           scope='conv_loc')
    loc_pred = custom_layers.channel_to_last(loc_pred)
    loc_pred = tf.reshape(loc_pred,
                          reshape + [num_anchors, 4])
    # Class prediction.
    num_cls_pred = num_anchors * num_classes
    cls_pred = slim.conv2d(net, num_cls_pred, [3, 3], activation_fn=None,
                           scope='conv_cls')
    cls_pred = custom_layers.channel_to_last(cls_pred)
    cls_pred = tf.reshape(cls_pred,
                          reshape + [num_anchors, num_classes])
    return cls_pred, loc_pred
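
To make the reshape at the end of ssd_multibox_layer concrete, here is a minimal NumPy sketch for Block4. It is an illustration only: random arrays stand in for the outputs of the two 3x3 convolutions, and the names loc_raw, cls_raw and softmax are made up for this example.

```python
import numpy as np

num_classes = 21
num_anchors = 4                      # Block4: len(sizes) + len(ratios) = 2 + 2
batch, height, width = 1, 38, 38

# Stand-ins for the raw outputs of the location and class convolutions.
rng = np.random.default_rng(0)
loc_raw = rng.normal(size=(batch, height, width, num_anchors * 4))
cls_raw = rng.normal(size=(batch, height, width, num_anchors * num_classes))

# Split the channel axis into (anchor, value) pairs.
loc_pred = loc_raw.reshape(batch, height, width, num_anchors, 4)
cls_pred = cls_raw.reshape(batch, height, width, num_anchors, num_classes)


def softmax(x):
    """Softmax over the last axis, which is what prediction_fn applies to the logits."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


probs = softmax(cls_pred)
print(loc_pred.shape, cls_pred.shape)
```

After the reshape, channels 4 to 7 of the raw location tensor become the four values of anchor 1, so each anchor at each feature point gets its own 4 offsets and its own 21 class probabilities.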

3. Generating the prior boxes with anchors

# ========================== Prior box generation ========================== #
def anchors(self, img_shape, dtype=np.float32):
    """
    Compute the default boxes for the given image shape by calling an external function.
    """
    return ssd_anchors_all_layers(img_shape,
                                    self.params.feat_shapes,
                                    self.params.anchor_sizes,
                                    self.params.anchor_ratios,
                                    self.params.anchor_steps,
                                    self.params.anchor_offset,
                                    dtype)

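The anchors function calls an external function, ssd_anchors_all_layers, to build the prior boxes. Building the prior boxes has little to do with building the network, but it does need the sizes of the network's feature layers. The purpose of the prior boxes is to give the image a set of boxes of suitable proportions, which also reduces the amount of computation.

As its name suggests, ssd_anchors_all_layers generates the prior boxes of all layers. It runs a loop that builds the prior boxes of each feature layer from that layer's size.

The code is as follows: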

############################################################
#   Called by SSDNet.anchors to obtain the prior boxes       #
#   Returns a list of (y, x, h, w) tuples                    #
############################################################
def ssd_anchors_all_layers(img_shape,
                           layers_shape,
                           anchor_sizes,
                           anchor_ratios,
                           anchor_steps,
                           offset=0.5,
                           dtype=np.float32):
    """
    Compute the prior boxes for every feature layer.
    """
    layers_anchors = []
    for i, s in enumerate(layers_shape):
        anchor_bboxes = ssd_anchor_one_layer(img_shape, s,
                                             anchor_sizes[i],
                                             anchor_ratios[i],
                                             anchor_steps[i],
                                             offset=offset, dtype=dtype)
        layers_anchors.append(anchor_bboxes)
    return layers_anchors

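This in turn calls ssd_anchor_one_layer which, as the name suggests, generates the prior boxes of a single layer. It is the core of prior box generation.

Its inputs are the image size img_shape, the feature layer size feat_shape, the prior box sizes sizes, the prior box aspect ratios ratios, and the stride step (the number of image pixels covered by one feature point).

Procedure:

1. Build the x, y grid from feat_shape.

2. Normalize x and y to the range 0 to 1. Each x, y pair corresponds to one point of the feature layer and is the center of the boxes at that point.

3. For every point of the feature layer, create h and w arrays of length num_anchors, which is 4, 6, 6, 6, 4, 4 for the six layers. These hold the heights and widths of the num_anchors boxes at each point.

4. Fill in h and w: h[0] is the smaller square, h[1] is the larger square, h[2] and h[3] are the rectangles obtained with a factor of √2, and h[4] and h[5] are the rectangles obtained with a factor of √3.

The outputs are:

x and y with shape (block.shape[0],block.shape[1],1)

h and w with shape (boxes_len,)

The code is as follows: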

############################################################
#   Called by ssd_anchors_all_layers                        #
#   Returns y, x, h, w of the prior boxes of one layer      #
############################################################
def ssd_anchor_one_layer(img_shape,
                         feat_shape,
                         sizes,
                         ratios,
                         step,
                         offset=0.5,
                         dtype=np.float32):
    """
    Inputs: image size img_shape, feature layer size feat_shape, prior box sizes sizes,
        prior box aspect ratios ratios, stride step.
    Procedure:
        Build the x, y grid.
        Normalize x and y to the range 0 to 1.
        For every point of the feature layer, create h and w arrays of length
        boxes_len (4, 6, 6, 6, 4, 4 for the six layers).
        Fill in h and w: h[0] is the smaller square, h[1] is the larger square,
                    h[2] and h[3] use a factor of sqrt(2), h[4] and h[5] a factor of sqrt(3).
    Outputs:
    x and y have shape (block.shape[0],block.shape[1],1)
    h and w have shape (boxes_len,)
    """
    # Build the grid
    y, x = np.mgrid[0:feat_shape[0], 0:feat_shape[1]]
    # Normalize
    y = (y.astype(dtype) + offset) * step / img_shape[0]
    x = (x.astype(dtype) + offset) * step / img_shape[1]
    # Add a trailing dimension so that decoding can broadcast later
    y = np.expand_dims(y, axis=-1)
    x = np.expand_dims(x, axis=-1)
    # Number of boxes at each point
    num_anchors = len(sizes) + len(ratios)
    h = np.zeros((num_anchors, ), dtype=dtype)
    w = np.zeros((num_anchors, ), dtype=dtype)
    # The first two boxes are squares
    h[0] = sizes[0] / img_shape[0]
    w[0] = sizes[0] / img_shape[1]
    di = 1
    if len(sizes) > 1:
        h[1] = math.sqrt(sizes[0] * sizes[1]) / img_shape[0]
        w[1] = math.sqrt(sizes[0] * sizes[1]) / img_shape[1]
        di += 1
    for i, r in enumerate(ratios):
        h[i+di] = sizes[0] / img_shape[0] / math.sqrt(r)
        w[i+di] = sizes[0] / img_shape[1] * math.sqrt(r)
    return y, x, h, w

This part is best read together with the parameters it uses:

img_shape=(300, 300)
feat_shapes=[(38, 38), (19, 19), (10, 10), (5, 5), (3, 3), (1, 1)],
# Prior box sizes
anchor_sizes=[(21., 45.),
                (45., 99.),
                (99., 153.),
                (153., 207.),
                (207., 261.),
                (261., 315.)],
# Number of boxes per point: 4, 6, 6, 6, 4, 4
# i.e. 2 + len(anchor_ratios[i])
anchor_ratios=[[2, .5],
                [2, .5, 3, 1./3],
                [2, .5, 3, 1./3],
                [2, .5, 3, 1./3],
                [2, .5],
                [2, .5]],
# Stride of each feature layer in image pixels
anchor_steps=[8, 16, 32, 64, 100, 300],

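Read this code carefully and you will find that it is quite cleverly designed.

x and y are normalized to the range 0 to 1. If you drop their last dimension and plot them, you get a grid running from 0 to 1. The original post plots the grid of the 38x38 feature layer here; each point is the center of the boxes at that position.

h and w are the height and width of each box, and they follow the configured aspect ratios.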
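
The numbers behind these statements can be reproduced with the parameters listed above. This is a minimal sketch that mirrors the arithmetic of ssd_anchor_one_layer; the helper names layer_anchor_hw and layer_centers are made up for illustration and do not exist in the SSD source.

```python
import math
import numpy as np

IMG = 300
FEAT_SHAPES = [38, 19, 10, 5, 3, 1]
ANCHOR_SIZES = [(21., 45.), (45., 99.), (99., 153.),
                (153., 207.), (207., 261.), (261., 315.)]
ANCHOR_RATIOS = [[2, .5], [2, .5, 3, 1. / 3], [2, .5, 3, 1. / 3],
                 [2, .5, 3, 1. / 3], [2, .5], [2, .5]]
ANCHOR_STEPS = [8, 16, 32, 64, 100, 300]


def layer_anchor_hw(sizes, ratios, img=IMG):
    """Normalized (h, w) of every prior box at one feature point."""
    hw = [(sizes[0] / img, sizes[0] / img)]           # small square
    s = math.sqrt(sizes[0] * sizes[1])
    hw.append((s / img, s / img))                     # larger square
    for r in ratios:                                  # rectangles
        hw.append((sizes[0] / img / math.sqrt(r), sizes[0] / img * math.sqrt(r)))
    return np.array(hw)


def layer_centers(feat, step, offset=0.5, img=IMG):
    """Normalized center coordinates along one axis of a feature layer."""
    return (np.arange(feat) + offset) * step / img


hw = layer_anchor_hw(ANCHOR_SIZES[0], ANCHOR_RATIOS[0])
centers = layer_centers(FEAT_SHAPES[0], ANCHOR_STEPS[0])
total = sum(f * f * (2 + len(r)) for f, r in zip(FEAT_SHAPES, ANCHOR_RATIOS))

print(hw)
print(centers[0], centers[-1])
print(total)
```

Each rectangle keeps the area of the small square and only changes the width-to-height ratio, and summing the boxes over all six layers gives the total number of prior boxes that the network scores for one image.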

4. Decoding the boxes with bboxes_decode

# ============================= Decoding ============================= #
def bboxes_decode(self, feat_localizations, anchors,
                    scope='ssd_bboxes_decode'):
    """
    Decode the predicted boxes.
    """
    return ssd_common.tf_ssd_bboxes_decode(
        feat_localizations, anchors,
        prior_scaling=self.params.prior_scaling,
        scope=scope)

The bboxes_decode function calls an external function, ssd_common.tf_ssd_bboxes_decode, which lives in another file and performs the decoding.

Decoding is needed because the locations predicted by the network are not actual box positions. They have to be combined with the prior boxes to obtain the final positions.

Note that decoding needs two inputs together: the locations produced by the network and the prior boxes produced by anchors.

Inside ssd_common.tf_ssd_bboxes_decode the flow is similar to prior box generation: there is a loop, so each feature layer is processed separately.

def tf_ssd_bboxes_decode(feat_localizations,
                         anchors,
                         prior_scaling=[0.1, 0.1, 0.2, 0.2],
                         scope='ssd_bboxes_decode'):
    """
      Compute the bounding boxes from the SSD network outputs and the prior boxes.
    """
    with tf.name_scope(scope):
        bboxes = []
        for i, anchors_layer in enumerate(anchors):
            bboxes.append(
                tf_ssd_bboxes_decode_layer(feat_localizations[i],
                                           anchors_layer,
                                           prior_scaling))
        return bboxes

The loop above calls tf_ssd_bboxes_decode_layer, which is the core of the decoding. It decodes the boxes of one feature layer.

Its inputs are the predicted localizations of one feature layer feat_localizations, the prior boxes of that layer anchors_layer, and the scaling factors prior_scaling.

Procedure:

1. Unpack anchors_layer, which consists of y, x, h and w.

2. Compute cx and cy, using the formula given in the paper.

3. Compute cw and ch (named w and h in the code), using the formula given in the paper.

4. Output [cy - ch / 2.0, cx - cw / 2.0, cy + ch / 2.0, cx + cw / 2.0], which are the top-left and bottom-right corners.

Its output is bboxes, the set of top-left and bottom-right corners.

bboxes has shape (?,block.shape[0],block.shape[1],boxes_len,4)

The code is as follows:

# =========================================================================== #
# Encoding and decoding
# =========================================================================== #
def tf_ssd_bboxes_decode_layer(feat_localizations,
                               anchors_layer,
                               prior_scaling=[0.1, 0.1, 0.2, 0.2]):
    """
    Inputs: the predicted localizations of one feature layer feat_localizations,
    the prior boxes of that layer anchors_layer, and the scaling factors prior_scaling.
    Procedure:
    1. Unpack anchors_layer, which consists of y, x, h and w.
    2. Compute cx and cy.
    3. Compute cw and ch (named w and h below).
    4. Output [cy - ch / 2.0, cx - cw / 2.0, cy + ch / 2.0, cx + cw / 2.0],
       the top-left and bottom-right corners.
    Output: bboxes, the set of top-left and bottom-right corners,
    with shape (?,block.shape[0],block.shape[1],boxes_len,4)
    """
    yref, xref, href, wref = anchors_layer
    # Center point, width and height
    cx = feat_localizations[:, :, :, :, 0] * wref * prior_scaling[0] + xref
    cy = feat_localizations[:, :, :, :, 1] * href * prior_scaling[1] + yref
    w = wref * tf.exp(feat_localizations[:, :, :, :, 2] * prior_scaling[2])
    h = href * tf.exp(feat_localizations[:, :, :, :, 3] * prior_scaling[3])
    # Top-left and bottom-right corners
    ymin = cy - h / 2.
    xmin = cx - w / 2.
    ymax = cy + h / 2.
    xmax = cx + w / 2.
    bboxes = tf.stack([ymin, xmin, ymax, xmax], axis=-1)
    return bboxes

After decoding, bboxes holds the positions in the image (in normalized coordinates) of the boxes of one feature layer.
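
The same arithmetic can be tried out in NumPy on a toy example. This is a minimal sketch under stated assumptions: decode_boxes is a name made up for this illustration (it is not the np_methods implementation), and the 2x2 feature map with one square prior box per point is invented so the result is easy to check by hand.

```python
import numpy as np


def decode_boxes(loc, anchors, prior_scaling=(0.1, 0.1, 0.2, 0.2)):
    """Decode (N, H, W, A, 4) offsets against (y, x, h, w) prior boxes."""
    yref, xref, href, wref = anchors
    cx = loc[..., 0] * wref * prior_scaling[0] + xref
    cy = loc[..., 1] * href * prior_scaling[1] + yref
    w = wref * np.exp(loc[..., 2] * prior_scaling[2])
    h = href * np.exp(loc[..., 3] * prior_scaling[3])
    return np.stack([cy - h / 2., cx - w / 2., cy + h / 2., cx + w / 2.], axis=-1)


# Toy prior boxes: 2x2 feature map, one 0.5 x 0.5 box per point.
y, x = np.mgrid[0:2, 0:2]
y = np.expand_dims((y + 0.5) / 2, axis=-1)   # shape (2, 2, 1), broadcasts over anchors
x = np.expand_dims((x + 0.5) / 2, axis=-1)
toy_anchors = (y, x, np.array([0.5]), np.array([0.5]))

# All-zero offsets decode to the prior boxes themselves.
zero_loc = np.zeros((1, 2, 2, 1, 4))
boxes = decode_boxes(zero_loc, toy_anchors)

# An x offset of 1.0 moves the center by 1.0 * w * 0.1 = 0.05.
shifted_loc = zero_loc.copy()
shifted_loc[0, 0, 0, 0, 0] = 1.0
shifted = decode_boxes(shifted_loc, toy_anchors)

print(boxes[0, 0, 0, 0], shifted[0, 0, 0, 0])
```

The trailing dimension added to x and y during prior box generation is what lets them broadcast against the per-anchor h and w here.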

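Predicting with ssd_vgg_300

Prediction steps

Prediction takes the following steps:

1. Create the SSD object.

2. Use ssd_net = ssd_vgg_300.SSDNet() to obtain the network, which yields two prediction results as TensorFlow tensors.

3. Load the SSD model.

4. Read the image image_names.

5. Preprocess the image and pass it to the network to get the predictions, which consist of the box positions and the class scores of each box.

6. Use the ssd_bboxes_select function to keep the boxes whose score is above the threshold.

7. Sort all scores and keep the 400 highest-scoring boxes.

8. Apply non-maximum suppression, which removes boxes that overlap too much.

9. Draw the boxes on the original image.

The prediction process in detail

1. Image preprocessing

Preprocessing the image uses the following code: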

# Network input size
net_shape = (300, 300)
# With data_format set to "NHWC" the layout is [batch, height, width, channels]
# For details see: https://www.jianshu.com/p/d8a699745529
data_format = 'NHWC'
# Placeholder for img_input
img_input = tf.placeholder(tf.uint8, shape = (None, None, 3))
# Preprocess the image to get bbox_img and image_4d
image_pre, labels_pre, bboxes_pre, bbox_img = ssd_vgg_preprocessing.preprocess_for_eval(
    img_input, None, None, net_shape, data_format, resize = ssd_vgg_preprocessing.Resize.WARP_RESIZE)
# Only one image is detected at a time, so add a leading batch dimension
image_4d = tf.expand_dims(image_pre, 0)

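The code looks long, especially the second-to-last statement, but it actually does very little.

The main steps of ssd_vgg_preprocessing.preprocess_for_eval are:

1. Subtract from the image the mean RGB value computed over all VOC2012 images.

2. Add a reference box describing the image extent (I do not fully understand its purpose. My reading is that the image may be a crop taken from a larger image, so this box has to be rescaled accordingly, but in practice the input is normally the whole image).

3. Resize the image to 300x300.

4. If data_format is 'NCHW', transpose the image to channels-first layout (the layout typically used on GPU).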

def preprocess_for_eval(image, labels, bboxes,
                        out_shape=EVAL_SIZE, data_format='NHWC',
                        difficults=None, resize=Resize.WARP_RESIZE,
                        scope='ssd_preprocessing_train'):
    """
    Preprocess an image for evaluation.
    """
    with tf.name_scope(scope):
        if image.get_shape().ndims != 3:
            raise ValueError('Input must be of size [height, width, C>0]')
        # Subtract the mean RGB value computed over all VOC2012 images
        image = tf.to_float(image)
        image = tf_image_whitened(image, [_R_MEAN, _G_MEAN, _B_MEAN])
        # Add the reference box for the image extent
        bbox_img = tf.constant([[0., 0., 1., 1.]])
        if bboxes is None:
            bboxes = bbox_img
        else:
            bboxes = tf.concat([bbox_img, bboxes], axis=0)
        # With WARP_RESIZE only the last elif of this long branch runs
        # Resize the image to 300x300
        if resize == Resize.NONE:
            # No resizing...
            pass
        elif resize == Resize.CENTRAL_CROP:
            # Central cropping of the image.
            image, bboxes = tf_image.resize_image_bboxes_with_crop_or_pad(
                image, bboxes, out_shape[0], out_shape[1])
        elif resize == Resize.PAD_AND_RESIZE:
            # Resize image first: find the correct factor...
            shape = tf.shape(image)
            factor = tf.minimum(tf.to_double(1.0),
                                tf.minimum(tf.to_double(out_shape[0] / shape[0]),
                                           tf.to_double(out_shape[1] / shape[1])))
            resize_shape = factor * tf.to_double(shape[0:2])
            resize_shape = tf.cast(tf.floor(resize_shape), tf.int32)
            image = tf_image.resize_image(image, resize_shape,
                                          method=tf.image.ResizeMethod.BILINEAR,
                                          align_corners=False)
            # Pad to expected size.
            image, bboxes = tf_image.resize_image_bboxes_with_crop_or_pad(
                image, bboxes, out_shape[0], out_shape[1])
        elif resize == Resize.WARP_RESIZE:
            # Warp resize of the image.
            image = tf_image.resize_image(image, out_shape,
                                          method=tf.image.ResizeMethod.BILINEAR,
                                          align_corners=False)
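        # Split off the reference box
        bbox_img = bboxes[0]
        bboxes = bboxes[1:]
        # Drop the labels and boxes of objects flagged as difficult
        if difficults is not None:
            mask = tf.logical_not(tf.cast(difficults, tf.bool))
            labels = tf.boolean_mask(labels, mask)
            bboxes = tf.boolean_mask(bboxes, mask)
        # Transpose to channels-first when NCHW is requested (typically on GPU)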
        if data_format == 'NCHW':
            image = tf.transpose(image, perm=(2, 0, 1))
        return image, labels, bboxes, bbox_img
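
The image-only part of these steps can be sketched in NumPy. This is an illustration under assumptions, not the library code: preprocess_sketch is a made-up name, the mean values in ASSUMED_RGB_MEAN are assumed here (the real ones are _R_MEAN, _G_MEAN and _B_MEAN in ssd_vgg_preprocessing), and the resize step is left out, so the input is taken to be 300x300 already.

```python
import numpy as np

# Assumed per-channel means (R, G, B); check _R_MEAN, _G_MEAN, _B_MEAN
# in ssd_vgg_preprocessing for the values actually used.
ASSUMED_RGB_MEAN = np.array([123.0, 117.0, 104.0], dtype=np.float32)


def preprocess_sketch(image_uint8, data_format="NHWC"):
    """Mean-subtract an (H, W, 3) uint8 image and add a batch dimension."""
    image = image_uint8.astype(np.float32) - ASSUMED_RGB_MEAN
    if data_format == "NCHW":
        # Channels-first layout: (H, W, C) -> (C, H, W)
        image = image.transpose(2, 0, 1)
    # Equivalent of tf.expand_dims(image_pre, 0)
    return image[np.newaxis, ...]


dummy = np.full((300, 300, 3), 128, dtype=np.uint8)
nhwc = preprocess_sketch(dummy)
nchw = preprocess_sketch(dummy, data_format="NCHW")
print(nhwc.shape, nchw.shape)
```

The data_format branch only changes the memory layout; it does not select a device, which is why it is described above as a transpose rather than a CPU/GPU switch.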

2. Loading the SSD model

Loading the SSD model takes the following steps:

1. Create a Session.

2. Build the SSD network.

3. Load the model weights.

The code is as follows:

# Load the SSD model
# Create the Session
isess = tf.Session()
reuse = True if 'ssd_net' in locals() else None
# Build the network
ssd_net = ssd_vgg_300.SSDNet()
with slim.arg_scope(ssd_net.arg_scope(data_format = data_format)):
    predictions, localisations, _, _ = ssd_net.net(image_4d, is_training = False, reuse = reuse)
# Load the weights
ckpt_filename = 'D:/Collection/SSD-Tensorflow-master/logs/model.ckpt-18602'
isess.run(tf.global_variables_initializer())
saver = tf.train.Saver()
saver.restore(isess, ckpt_filename)

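3. Reading an image and predicting

This part does the following:

1. Get the prior boxes.

2. Read the image.

3. Feed the image to the loaded model to get predictions and locations.

4. Filter the predictions of each feature layer, discarding those whose score is below threshold, and put the results of all feature layers side by side in one list.

5. Sort all predictions by score and keep the top 400 boxes.

6. Apply non-maximum suppression to remove boxes that overlap too much.

7. Draw the boxes on the original image.

The code is as follows: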

# Get all prior boxes, for the six feature layers
ssd_anchors = ssd_net.anchors(net_shape)
def process_image(img, select_threshold = 0.5, nms_threshold = .45, net_shape = (300, 300)):
    # Run the SSD model
    rimg, rpredictions, rlocalisations, rbbox_img = isess.run([image_4d, predictions, localisations, bbox_img],
                                                              feed_dict = {img_input: img})
    # Get the scores of the 20 object classes and the box positions
    rclasses, rscores, rbboxes = np_methods.ssd_bboxes_select(
        rpredictions, rlocalisations, ssd_anchors,
        select_threshold = select_threshold, img_shape = net_shape, num_classes = 21, decode = True)
    # Clip boxes to the image boundary
    rbboxes = np_methods.bboxes_clip(rbbox_img, rbboxes)
    # Keep the top 400 and remove near-duplicates with non-maximum suppression
    rclasses, rscores, rbboxes = np_methods.bboxes_sort(rclasses, rscores, rbboxes, top_k = 400)
    rclasses, rscores, rbboxes = np_methods.bboxes_nms(rclasses, rscores, rbboxes, nms_threshold = nms_threshold)
    # Rescale the boxes proportionally to img
    rbboxes = np_methods.bboxes_resize(rbbox_img, rbboxes)
    return rclasses, rscores, rbboxes
# Read the image
img = mpimg.imread('./street.jpg')
# Run the prediction
rclasses, rscores, rbboxes = process_image(img)
visualization.plt_bboxes(img, rclasses, rscores, rbboxes)

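The code that filters the predictions is shown below. It:

first decodes the boxes;
then reshapes each feature layer to flatten it;
reads the scores of all classes except background;
keeps the classes whose score is greater than threshold and discards the rest;
uses np.concatenate to place the results of all layers in one row.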

def ssd_bboxes_select_layer(predictions_layer,
                            localizations_layer,
                            anchors_layer,
                            select_threshold=0.5,
                            img_shape=(300, 300),
                            num_classes=21,
                            decode=True):
    """
        Select the boxes whose score is above the threshold.
    """
    # Decode the boxes
    if decode:
        localizations_layer = ssd_bboxes_decode(localizations_layer, anchors_layer)
    # Reshape to 3-D: (batch, number of boxes, num_classes for predictions or 4 for boxes)
    p_shape = predictions_layer.shape
    batch_size = p_shape[0] if len(p_shape) == 5 else 1
    predictions_layer = np.reshape(predictions_layer,
                                   (batch_size, -1, p_shape[-1]))
    l_shape = localizations_layer.shape
    localizations_layer = np.reshape(localizations_layer,
                                     (batch_size, -1, l_shape[-1]))
    if select_threshold is None or select_threshold == 0:
        classes = np.argmax(predictions_layer, axis=2)
        scores = np.amax(predictions_layer, axis=2)
        mask = (classes > 0)
        classes = classes[mask]
        scores = scores[mask]
        bboxes = localizations_layer[mask]
    else:
        # Take the predictions of all non-background classes
        sub_predictions = predictions_layer[:, :, 1:]
        # Find where the score is above the threshold
        idxes = np.where(sub_predictions > select_threshold)
        # Keep those; add 1 because the background class was sliced off
        classes = idxes[-1]+1
        # Their scores
        scores = sub_predictions[idxes]
        # And their box positions
        bboxes = localizations_layer[idxes[:-1]]
    return classes, scores, bboxes
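
The index handling in the else branch is easy to misread, so here is a minimal sketch on a hand-made array of three boxes and three classes (background plus two object classes). The arrays predictions and boxes are invented for this example.

```python
import numpy as np

# (batch=1, boxes=3, classes=3); class 0 is background.
predictions = np.array([[[0.1, 0.8, 0.1],
                         [0.9, 0.05, 0.05],
                         [0.2, 0.1, 0.7]]])
boxes = np.array([[[0.0, 0.0, 0.5, 0.5],
                   [0.1, 0.1, 0.2, 0.2],
                   [0.5, 0.5, 1.0, 1.0]]])

sub = predictions[:, :, 1:]          # drop the background column
idxes = np.where(sub > 0.5)          # (batch index, box index, class index)
classes = idxes[-1] + 1              # shift back to the original class ids
scores = sub[idxes]
selected = boxes[idxes[:-1]]         # index boxes by (batch, box) only

print(classes, scores, selected)
```

np.where returns one index array per dimension, so idxes[-1] is the class index within the sliced array and idxes[:-1] addresses the matching box.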

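Sorting all predictions by score and keeping the top 400 boxes is very simple:

first use argsort on the negated scores to get the indices ordered from highest to lowest score;

then take the top 400 entries of classes, scores and bboxes.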

def bboxes_sort(classes, scores, bboxes, top_k=400):
    """
    Sort by score and keep the top_k entries.
    """
    idxes = np.argsort(-scores)
    classes = classes[idxes][:top_k]
    scores = scores[idxes][:top_k]
    bboxes = bboxes[idxes][:top_k]
    return classes, scores, bboxes
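
A tiny example shows why the scores are negated before argsort. The arrays below are invented for illustration.

```python
import numpy as np

scores = np.array([0.6, 0.9, 0.7])
classes = np.array([3, 1, 2])
top_k = 2

# argsort sorts ascending, so sorting the negated scores gives descending order.
order = np.argsort(-scores)
top_classes = classes[order][:top_k]
top_scores = scores[order][:top_k]

print(order, top_classes, top_scores)
```

Indexing every array with the same order keeps classes, scores and boxes aligned after sorting.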

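Non-maximum suppression is also fairly simple:

each box in bboxes, from the highest score to the lowest, is compared with all boxes after it;

boxes with a small IoU, or boxes of a different class, are kept.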

def bboxes_nms(classes, scores, bboxes, nms_threshold=0.45):
    """
    Non-maximum suppression: remove boxes that overlap too much.
    """
    # np.bool was removed from recent NumPy releases; the builtin bool is equivalent
    keep_bboxes = np.ones(scores.shape, dtype=bool)
    for i in range(scores.size-1):
        if keep_bboxes[i]:
            # Overlap (IoU) with every lower-scoring box
            overlap = bboxes_jaccard(bboxes[i], bboxes[(i+1):])
            # Keep boxes with a small overlap or a different class
            keep_overlap = np.logical_or(overlap < nms_threshold, classes[(i+1):] != classes[i])
            keep_bboxes[(i+1):] = np.logical_and(keep_bboxes[(i+1):], keep_overlap)
    # Return only the boxes that survived
    idxes = np.where(keep_bboxes)
    return classes[idxes], scores[idxes], bboxes[idxes]
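
bboxes_jaccard is not shown in this article, so the following self-contained sketch pairs the same loop with a simple IoU helper. The names iou_one_to_many and nms are made up here, boxes are assumed to be [ymin, xmin, ymax, xmax] and already sorted by descending score, and the four test boxes are invented.

```python
import numpy as np


def iou_one_to_many(box, others):
    """IoU between one [ymin, xmin, ymax, xmax] box and an (N, 4) array of boxes."""
    ymin = np.maximum(box[0], others[:, 0])
    xmin = np.maximum(box[1], others[:, 1])
    ymax = np.minimum(box[2], others[:, 2])
    xmax = np.minimum(box[3], others[:, 3])
    inter = np.maximum(ymax - ymin, 0.0) * np.maximum(xmax - xmin, 0.0)
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    area_others = (others[:, 2] - others[:, 0]) * (others[:, 3] - others[:, 1])
    return inter / (area_box + area_others - inter)


def nms(classes, scores, bboxes, nms_threshold=0.45):
    """Per-class non-maximum suppression on score-sorted boxes."""
    keep = np.ones(scores.shape, dtype=bool)
    for i in range(scores.size - 1):
        if keep[i]:
            overlap = iou_one_to_many(bboxes[i], bboxes[i + 1:])
            # A later box survives if it overlaps little or has another class.
            ok = np.logical_or(overlap < nms_threshold, classes[i + 1:] != classes[i])
            keep[i + 1:] = np.logical_and(keep[i + 1:], ok)
    return classes[keep], scores[keep], bboxes[keep]


classes = np.array([1, 1, 2, 1])
scores = np.array([0.9, 0.8, 0.7, 0.6])
bboxes = np.array([[0.0, 0.0, 1.0, 1.0],    # best box of class 1
                   [0.0, 0.0, 1.0, 0.9],    # same class, IoU 0.9 with the first
                   [0.0, 0.0, 1.0, 0.9],    # same place, but class 2
                   [2.0, 2.0, 3.0, 3.0]])   # class 1, no overlap
kept_classes, kept_scores, kept_bboxes = nms(classes, scores, bboxes)
print(kept_classes, kept_scores)
```

Only the second box is removed: it shares a class with a higher-scoring box and overlaps it heavily, while the third box overlaps just as much but belongs to a different class.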

4. The complete prediction code

import os
import math
import random
import numpy as np
import tensorflow as tf
import cv2
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import sys
sys.path.append('./')
from nets import ssd_vgg_300, ssd_common, np_methods
from preprocessing import ssd_vgg_preprocessing
from notebooks import visualization
# Get the slim library.
slim = tf.contrib.slim
# Network input size
net_shape = (300, 300)
# With data_format set to "NHWC" the layout is [batch, height, width, channels]
# For details see: https://www.jianshu.com/p/d8a699745529
data_format = 'NHWC'
# Placeholder for img_input
img_input = tf.placeholder(tf.uint8, shape = (None, None, 3))
# Preprocess the image to get bbox_img and image_4d
image_pre, labels_pre, bboxes_pre, bbox_img = ssd_vgg_preprocessing.preprocess_for_eval(
    img_input, None, None, net_shape, data_format, resize = ssd_vgg_preprocessing.Resize.WARP_RESIZE)
# Only one image is detected at a time, so add a leading batch dimension
image_4d = tf.expand_dims(image_pre, 0)
# Load the SSD model
# Create the Session
isess = tf.Session()
reuse = True if 'ssd_net' in locals() else None
# Build the network
ssd_net = ssd_vgg_300.SSDNet()
with slim.arg_scope(ssd_net.arg_scope(data_format = data_format)):
    predictions, localisations, _, _ = ssd_net.net(image_4d, is_training = False, reuse = reuse)
# Load the weights
ckpt_filename = './logs/model.ckpt-1498'
isess.run(tf.global_variables_initializer())
saver = tf.train.Saver()
saver.restore(isess, ckpt_filename)
# Get all prior boxes, for the six feature layers
ssd_anchors = ssd_net.anchors(net_shape)
def process_image(img, select_threshold = 0.5, nms_threshold = .45, net_shape = (300, 300)):
    # Run the SSD model
    rimg, rpredictions, rlocalisations, rbbox_img = isess.run([image_4d, predictions, localisations, bbox_img],
                                                              feed_dict = {img_input: img})
    # Get the scores of the 20 object classes and the box positions
    rclasses, rscores, rbboxes = np_methods.ssd_bboxes_select(
        rpredictions, rlocalisations, ssd_anchors,
        select_threshold = select_threshold, img_shape = net_shape, num_classes = 21, decode = True)
    # Clip boxes to the image boundary
    rbboxes = np_methods.bboxes_clip(rbbox_img, rbboxes)
    # Keep the top 400 and remove near-duplicates with non-maximum suppression
    rclasses, rscores, rbboxes = np_methods.bboxes_sort(rclasses, rscores, rbboxes, top_k = 400)
    rclasses, rscores, rbboxes = np_methods.bboxes_nms(rclasses, rscores, rbboxes, nms_threshold = nms_threshold)
    # Rescale the boxes proportionally to img
    rbboxes = np_methods.bboxes_resize(rbbox_img, rbboxes)
    return rclasses, rscores, rbboxes
# Read the image
img = mpimg.imread('./street.jpg')
# Run the prediction
rclasses, rscores, rbboxes = process_image(img)
visualization.plt_bboxes(img, rclasses, rscores, rbboxes)
