Building a YOLOv3 Object Detection Platform in PyTorch: Source Code Walkthrough

Updated: 2022-05-09 14:04:43  Author: Bubbliiiing

This article walks through the source code for building a YOLOv3 object detection platform in PyTorch, as a reference for readers who want to implement or adapt it themselves.

YOLOv3 implementation approach

Let's look at a PyTorch implementation of YOLOv3 and, along the way, train it on our own data.

Source code download

I. Prediction

1. The Darknet53 backbone network

YoloV3 uses Darknet53 as its backbone feature-extraction network. Darknet53 has two important characteristics:

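1. Darknet53 is built from residual blocks (Residual). Each stage first applies a 3x3 convolution with stride 2, which halves the width and height of the incoming feature map; we call the resulting feature map layer. A 1x1 convolution and a 3x3 convolution are then applied to layer, and the result is added back to layer, which forms the residual structure. Stacking these 1x1 and 3x3 convolutions together with their residual edges deepens the network considerably. Residual networks are easy to optimize and can gain accuracy from substantially increased depth. The skip connections inside the residual blocks mitigate the vanishing-gradient problem that comes with adding depth to a deep neural network.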
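
The halving effect of the stride-2 convolution can be checked with plain arithmetic. The sketch below is illustrative only: `conv_out_size` is a hypothetical helper (not part of the article's code), and it assumes a 416x416 input, a 3x3 kernel, padding 1, and no dilation, which matches the downsampling convolutions in the code further down.

```python
def conv_out_size(size, kernel_size, stride, padding):
    """Spatial output size of a convolution (no dilation, floor rounding)."""
    return (size + 2 * padding - kernel_size) // stride + 1


# Assumed input resolution; the backbone downsamples five times.
size = 416
sizes = [size]
for _ in range(5):
    size = conv_out_size(size, kernel_size=3, stride=2, padding=1)
    sizes.append(size)

print(sizes)  # [416, 208, 104, 52, 26, 13]
```

The last three sizes (52, 26, 13) are the resolutions of the three feature layers used later for detection.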

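2. Every convolution in Darknet53 uses a dedicated DarknetConv2D structure: the convolution is followed by BatchNormalization and LeakyReLU. The description also calls for L2 regularization on each convolution; the PyTorch layers below do not set it themselves. An ordinary ReLU sets all negative values to zero, whereas Leaky ReLU gives negative values a non-zero slope. Mathematically:

$$y = \begin{cases} x, & x \ge 0 \\ \alpha x, & x < 0 \end{cases}$$

where $\alpha$ is the negative slope (0.1 in the code below).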
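
As a minimal plain-Python sketch of the difference between the two activations (the function names are illustrative, and the slope of 0.1 is taken from the `nn.LeakyReLU(0.1)` calls in the article's code):

```python
def relu(x):
    # Ordinary ReLU: negative inputs become zero.
    return x if x > 0 else 0.0


def leaky_relu(x, negative_slope=0.1):
    # Leaky ReLU: negative inputs are scaled by a small non-zero slope.
    return x if x >= 0 else negative_slope * x


for value in (-2.0, 0.0, 3.0):
    print(value, relu(value), leaky_relu(value))
```

For the negative input, ReLU returns zero while Leaky ReLU returns a small negative value, so a gradient still flows through negative activations.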

The implementation is:

import math
from collections import OrderedDict
import torch.nn as nn
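#---------------------------------------------------------------------#
#   Residual block
#   A 1x1 convolution reduces the channel count, then a 3x3 convolution
#   extracts features and restores the channel count.
#   A residual (skip) connection is added at the end.
#---------------------------------------------------------------------#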
class BasicBlock(nn.Module):
    def __init__(self, inplanes, planes):
        super(BasicBlock, self).__init__()
        self.conv1  = nn.Conv2d(inplanes, planes[0], kernel_size=1, stride=1, padding=0, bias=False)
        self.bn1    = nn.BatchNorm2d(planes[0])
        self.relu1  = nn.LeakyReLU(0.1)
        self.conv2  = nn.Conv2d(planes[0], planes[1], kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2    = nn.BatchNorm2d(planes[1])
        self.relu2  = nn.LeakyReLU(0.1)
    def forward(self, x):
        residual = x
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu1(out)
        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu2(out)
        out += residual
        return out
class DarkNet(nn.Module):
    def __init__(self, layers):
        super(DarkNet, self).__init__()
        self.inplanes = 32
        # 416,416,3 -> 416,416,32
        self.conv1  = nn.Conv2d(3, self.inplanes, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn1    = nn.BatchNorm2d(self.inplanes)
        self.relu1  = nn.LeakyReLU(0.1)
        # 416,416,32 -> 208,208,64
        self.layer1 = self._make_layer([32, 64], layers[0])
        # 208,208,64 -> 104,104,128
        self.layer2 = self._make_layer([64, 128], layers[1])
        # 104,104,128 -> 52,52,256
        self.layer3 = self._make_layer([128, 256], layers[2])
        # 52,52,256 -> 26,26,512
        self.layer4 = self._make_layer([256, 512], layers[3])
        # 26,26,512 -> 13,13,1024
        self.layer5 = self._make_layer([512, 1024], layers[4])
        self.layers_out_filters = [64, 128, 256, 512, 1024]
        # Initialize the weights
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()
    #---------------------------------------------------------------------#
    #   In each layer, first downsample with a 3x3 convolution of stride 2,
    #   then stack the residual blocks
    #---------------------------------------------------------------------#
    def _make_layer(self, planes, blocks):
        layers = []
        # Downsample: stride 2, kernel size 3
        layers.append(("ds_conv", nn.Conv2d(self.inplanes, planes[1], kernel_size=3, stride=2, padding=1, bias=False)))
        layers.append(("ds_bn", nn.BatchNorm2d(planes[1])))
        layers.append(("ds_relu", nn.LeakyReLU(0.1)))
        # Add the residual blocks
        self.inplanes = planes[1]
        for i in range(0, blocks):
            layers.append(("residual_{}".format(i), BasicBlock(self.inplanes, planes)))
        return nn.Sequential(OrderedDict(layers))
    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu1(x)
        x = self.layer1(x)
        x = self.layer2(x)
        out3 = self.layer3(x)
        out4 = self.layer4(out3)
        out5 = self.layer5(out4)
        return out3, out4, out5
def darknet53():
    model = DarkNet([1, 2, 8, 8, 4])
    return model
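
To get a feel for the depth of this backbone, the convolutions created by `DarkNet([1, 2, 8, 8, 4])` can be tallied by hand. This is a rough sketch with a hypothetical helper name; it only counts what the code above builds (one stem convolution, one downsampling convolution per stage, two convolutions per residual block).

```python
def count_backbone_convs(blocks_per_stage):
    stem = 1  # the first 3x3 convolution (3 -> 32 channels)
    # Each stage: one stride-2 downsampling conv plus two convs per residual block.
    per_stage = [1 + 2 * blocks for blocks in blocks_per_stage]
    return stem + sum(per_stage)


print(count_backbone_convs([1, 2, 8, 8, 4]))  # 52
```

The count covers only the feature extractor; the code above does not build a classification layer.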

2. Getting predictions from the features

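Getting predictions from the features takes two steps:

Build an FPN feature pyramid for enhanced feature extraction.

Use the Yolo Head to make predictions on the three effective feature layers.

a. Building the FPN feature pyramid for enhanced feature extraction

For feature utilization, YoloV3 extracts three feature layers for object detection. They sit at different depths of the Darknet53 backbone (middle, lower-middle, and bottom), with shapes (52,52,256), (26,26,512), and (13,13,1024).

Once we have the three effective feature layers, we use them to build the FPN as follows: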

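The 13x13x1024 feature layer goes through 5 convolutions. The result is passed to the YoloHead to get a prediction, and it is also upsampled (UpSampling2D; nn.Upsample in the PyTorch code) and concatenated with the 26x26x512 feature layer, giving a combined layer of shape (26,26,768).
The combined layer again goes through 5 convolutions. The result is passed to the YoloHead to get a prediction, and it is also upsampled and concatenated with the 52x52x256 feature layer, giving a combined layer of shape (52,52,384).
The combined layer again goes through 5 convolutions and is passed to the YoloHead to get a prediction.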
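
The channel counts of the combined layers follow from simple bookkeeping. The sketch below tracks (height, width, channels) tuples only; `upsample` and `concat_channels` are illustrative helpers, and the 256- and 128-channel branch widths are taken from the 1x1 convolutions in the article's `YoloBody` code.

```python
def upsample(shape, factor=2):
    # Nearest-neighbour upsampling changes height and width, not channels.
    h, w, c = shape
    return (h * factor, w * factor, c)


def concat_channels(a, b):
    # Concatenation along the channel axis needs matching spatial sizes.
    if a[:2] != b[:2]:
        raise ValueError("spatial sizes must match before concatenation")
    return (a[0], a[1], a[2] + b[2])


feat52, feat26 = (52, 52, 256), (26, 26, 512)

# 13x13 branch reduced to 256 channels by a 1x1 conv, then upsampled.
merged26 = concat_channels(upsample((13, 13, 256)), feat26)
# 26x26 branch reduced to 128 channels by a 1x1 conv, then upsampled.
merged52 = concat_channels(upsample((26, 26, 128)), feat52)

print(merged26, merged52)  # (26, 26, 768) (52, 52, 384)
```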

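The feature pyramid fuses feature layers of different shapes, which helps extract better features.

b. Getting predictions with the Yolo Head

The FPN gives us three enhanced features with shapes (13,13,512), (26,26,256), and (52,52,128). These three feature layers are then passed to the Yolo Head to get predictions.

The Yolo Head is essentially one 3x3 convolution followed by one 1x1 convolution: the 3x3 convolution integrates features and the 1x1 convolution adjusts the channel count.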

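Each of the three feature layers is processed separately. For the VOC dataset, the output layers have shapes (13,13,75), (26,26,75), and (52,52,75). The last dimension is 75 because VOC has 20 classes and YoloV3 places 3 prior boxes (anchors) at each feature point of each feature layer, so the prediction has 3x25 channels. For the COCO training set, which has 80 classes, the last dimension is 255 = 3x85 and the three shapes are (13,13,255), (26,26,255), and (52,52,255).

In practice, feeding in N images of size 416x416 produces three outputs of shapes (N,13,13,255), (N,26,26,255), and (N,52,52,255); PyTorch stores them channels-first, as (N,255,13,13) and so on. They describe the 3 prior boxes at each cell of the 13x13, 26x26, and 52x52 grids that the image is divided into.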
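
The channel arithmetic can be written down as a small sketch. The helper names are hypothetical; the formula `num_anchors * (num_classes + 5)` is the one used for the head's output channels in the code below, and the output shapes assume a square 416x416 input with strides 32, 16, and 8.

```python
def head_channels(num_anchors, num_classes):
    # 4 box terms + 1 objectness score + one score per class, for every anchor.
    return num_anchors * (4 + 1 + num_classes)


def output_shapes(batch_size, num_classes, input_size=416, strides=(32, 16, 8), num_anchors=3):
    channels = head_channels(num_anchors, num_classes)
    # Channels-first layout, as produced by the PyTorch model.
    return [(batch_size, channels, input_size // s, input_size // s) for s in strides]


print(head_channels(3, 20))  # 75  (VOC)
print(head_channels(3, 80))  # 255 (COCO)
print(output_shapes(2, 80))
```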

The implementation is as follows:

from collections import OrderedDict
import torch
import torch.nn as nn
from nets.darknet import darknet53
def conv2d(filter_in, filter_out, kernel_size):
    pad = (kernel_size - 1) // 2 if kernel_size else 0
    return nn.Sequential(OrderedDict([
        ("conv", nn.Conv2d(filter_in, filter_out, kernel_size=kernel_size, stride=1, padding=pad, bias=False)),
        ("bn", nn.BatchNorm2d(filter_out)),
        ("relu", nn.LeakyReLU(0.1)),
    ]))
#------------------------------------------------------------------------#
#   make_last_layers contains seven convolutions: the first five extract
#   features and the last two produce the yolo prediction
#------------------------------------------------------------------------#
def make_last_layers(filters_list, in_filters, out_filter):
    m = nn.Sequential(
        conv2d(in_filters, filters_list[0], 1),
        conv2d(filters_list[0], filters_list[1], 3),
        conv2d(filters_list[1], filters_list[0], 1),
        conv2d(filters_list[0], filters_list[1], 3),
        conv2d(filters_list[1], filters_list[0], 1),
        conv2d(filters_list[0], filters_list[1], 3),
        nn.Conv2d(filters_list[1], out_filter, kernel_size=1, stride=1, padding=0, bias=True)
    )
    return m
class YoloBody(nn.Module):
    def __init__(self, anchors_mask, num_classes):
        super(YoloBody, self).__init__()
        #---------------------------------------------------#
        #   Build the darknet53 backbone
        #   It yields three effective feature layers with shapes:
        #   52,52,256
        #   26,26,512
        #   13,13,1024
        #---------------------------------------------------#
        self.backbone = darknet53()
        #---------------------------------------------------#
        #   out_filters : [64, 128, 256, 512, 1024]
        #---------------------------------------------------#
        out_filters = self.backbone.layers_out_filters
        #------------------------------------------------------------------------#
        #   Compute the output channels of yolo_head; for the voc dataset
        #   final_out_filter0 = final_out_filter1 = final_out_filter2 = 75
        #------------------------------------------------------------------------#
        self.last_layer0            = make_last_layers([512, 1024], out_filters[-1], len(anchors_mask[0]) * (num_classes + 5))
        self.last_layer1_conv       = conv2d(512, 256, 1)
        self.last_layer1_upsample   = nn.Upsample(scale_factor=2, mode='nearest')
        self.last_layer1            = make_last_layers([256, 512], out_filters[-2] + 256, len(anchors_mask[1]) * (num_classes + 5))
        self.last_layer2_conv       = conv2d(256, 128, 1)
        self.last_layer2_upsample   = nn.Upsample(scale_factor=2, mode='nearest')
        self.last_layer2            = make_last_layers([128, 256], out_filters[-3] + 128, len(anchors_mask[2]) * (num_classes + 5))
    def forward(self, x):
        #---------------------------------------------------#
        #   Get the three effective feature layers, with shapes:
        #   52,52,256; 26,26,512; 13,13,1024
        #---------------------------------------------------#
        x2, x1, x0 = self.backbone(x)
        #---------------------------------------------------#
        #   First feature layer
        #   out0 = (batch_size,255,13,13)
        #---------------------------------------------------#
        # 13,13,1024 -> 13,13,512 -> 13,13,1024 -> 13,13,512 -> 13,13,1024 -> 13,13,512
        out0_branch = self.last_layer0[:5](x0)
        out0        = self.last_layer0[5:](out0_branch)
        # 13,13,512 -> 13,13,256 -> 26,26,256
        x1_in = self.last_layer1_conv(out0_branch)
        x1_in = self.last_layer1_upsample(x1_in)
        # 26,26,256 + 26,26,512 -> 26,26,768
        x1_in = torch.cat([x1_in, x1], 1)
        #---------------------------------------------------#
        #   Second feature layer
        #   out1 = (batch_size,255,26,26)
        #---------------------------------------------------#
        # 26,26,768 -> 26,26,256 -> 26,26,512 -> 26,26,256 -> 26,26,512 -> 26,26,256
        out1_branch = self.last_layer1[:5](x1_in)
        out1        = self.last_layer1[5:](out1_branch)
        # 26,26,256 -> 26,26,128 -> 52,52,128
        x2_in = self.last_layer2_conv(out1_branch)
        x2_in = self.last_layer2_upsample(x2_in)
        # 52,52,128 + 52,52,256 -> 52,52,384
        x2_in = torch.cat([x2_in, x2], 1)
        #---------------------------------------------------#
        #   Third feature layer
        #   out2 = (batch_size,255,52,52)
        #---------------------------------------------------#
        # 52,52,384 -> 52,52,128 -> 52,52,256 -> 52,52,128 -> 52,52,256 -> 52,52,128
        out2 = self.last_layer2(x2_in)
        return out0, out1, out2

3. Decoding the predictions

From step 2 we get predictions for the three feature layers, with shapes:

(N,13,13,255)
(N,26,26,255)
(N,52,52,255)

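Here is a brief look at what each effective feature layer does. Each layer divides the whole image into a grid matching its height and width; for example, the (N,13,13,255) layer divides the image into 13x13 cells. Several prior boxes are then placed at each grid cell. These boxes are preset by the network, and the prediction decides whether each box contains an object and which class that object belongs to.

Since every grid point has three prior boxes, the predictions above can be reshaped to: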

(N,13,13,3,85)
(N,26,26,3,85)
(N,52,52,3,85)

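The 85 splits into 4+1+80: the 4 are the adjustment parameters of the prior box, the 1 indicates whether the prior box contains an object, and the 80 are the class scores for the prior box (80 because COCO has 80 classes). If YoloV3 were detecting only two classes, the 85 would become 4+1+2 = 7.

In other words, the 85 values are x_offset, y_offset, w, h, the confidence, and the classification results.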
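
One way to picture the reshape is to ask which anchor and which attribute a given output channel belongs to. The sketch below is illustrative (`describe_channel` and `ATTRS` are made-up names); it assumes the channel layout implied by the `view(batch, 3, 5 + num_classes, h, w)` call in the decoding code further down, with w stored before h.

```python
ATTRS = ["x_offset", "y_offset", "w", "h", "objectness"]


def describe_channel(channel, num_classes=80, num_anchors=3):
    """Map a channel index of the raw head output to (anchor index, attribute name)."""
    bbox_attrs = 5 + num_classes
    if not 0 <= channel < num_anchors * bbox_attrs:
        raise ValueError("channel out of range")
    # Channels are grouped per anchor: anchor 0 first, then anchor 1, and so on.
    anchor, attr = divmod(channel, bbox_attrs)
    name = ATTRS[attr] if attr < 5 else "class_{}".format(attr - 5)
    return anchor, name


print(describe_channel(0))    # (0, 'x_offset')
print(describe_channel(89))   # (1, 'objectness')
print(describe_channel(254))  # (2, 'class_79')
```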

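These predictions do not yet correspond to the final box positions on the image; they have to be decoded first.

Decoding in YoloV3 takes two steps:

First, add the corresponding x_offset and y_offset to each grid point; the result is the center of the predicted box.
Then combine the prior box with h and w to compute the width and height of the predicted box. This gives the full position of the predicted box.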
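
The two steps can be sketched for a single grid cell in plain Python. This is a simplified illustration, not the article's batched implementation: `decode_cell` is a hypothetical helper, the anchor (116, 90) and stride 32 are the values listed for the 13x13 layer, and the output is normalized to 0-1 as in the code below.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def decode_cell(raw, grid_x, grid_y, anchor_w, anchor_h, stride, grid_size):
    """raw = (tx, ty, tw, th); anchor sizes are in input-image pixels."""
    tx, ty, tw, th = raw
    # Step 1: the sigmoid keeps the offset inside the cell; add the cell index.
    cx = sigmoid(tx) + grid_x
    cy = sigmoid(ty) + grid_y
    # Step 2: scale the anchor (expressed in grid units) by exp of the raw value.
    w = math.exp(tw) * (anchor_w / stride)
    h = math.exp(th) * (anchor_h / stride)
    # Normalize everything by the grid size.
    return (cx / grid_size, cy / grid_size, w / grid_size, h / grid_size)


box = decode_cell((0.0, 0.0, 0.0, 0.0), grid_x=6, grid_y=6,
                  anchor_w=116, anchor_h=90, stride=32, grid_size=13)
print(box)
```

With all raw values at zero, the box sits at the center of cell (6, 6) and has exactly the anchor's size relative to the 416x416 input.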

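After obtaining the final predictions, score sorting and non-maximum suppression (NMS) are still needed.

This part is common to essentially all object detectors. For each class:

1. Take the boxes and scores of that class whose score exceeds the confidence threshold (conf_thres in the code below).

2. Run non-maximum suppression using the box positions and scores.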
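
A greedy version of these two steps for one class can be sketched in plain Python. This is an illustrative stand-in, not the article's code: `iou` and `filter_and_nms` are hypothetical helpers, the boxes and scores are invented, and the default thresholds mirror `conf_thres=0.5` and `nms_thres=0.4` from the implementation.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(ix2 - ix1, 0.0) * max(iy2 - iy1, 0.0)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filter_and_nms(boxes, scores, conf_thres=0.5, nms_thres=0.4):
    # Step 1: keep boxes above the score threshold, best score first.
    order = sorted((i for i, s in enumerate(scores) if s >= conf_thres),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    # Step 2: repeatedly keep the best box and drop boxes that overlap it too much.
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= nms_thres]
    return keep


boxes = [(10, 10, 110, 110), (12, 12, 112, 112), (200, 200, 260, 260), (0, 0, 5, 5)]
scores = [0.9, 0.8, 0.7, 0.2]
print(filter_and_nms(boxes, scores))  # [0, 2]
```

Box 1 is suppressed because it overlaps box 0 heavily, and box 3 never passes the score threshold.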

The implementation is as follows:

import torch
import torch.nn as nn
from torchvision.ops import nms
import numpy as np
class DecodeBox():
    def __init__(self, anchors, num_classes, input_shape, anchors_mask = [[6,7,8], [3,4,5], [0,1,2]]):
        super(DecodeBox, self).__init__()
        self.anchors        = anchors
        self.num_classes    = num_classes
        self.bbox_attrs     = 5 + num_classes
        self.input_shape    = input_shape
        #-----------------------------------------------------------#
        #   Anchors for the 13x13 feature layer: [116,90],[156,198],[373,326]
        #   Anchors for the 26x26 feature layer: [30,61],[62,45],[59,119]
        #   Anchors for the 52x52 feature layer: [10,13],[16,30],[33,23]
        #-----------------------------------------------------------#
        self.anchors_mask   = anchors_mask
    def decode_box(self, inputs):
        outputs = []
        for i, input in enumerate(inputs):
            #-----------------------------------------------#
            #   There are three inputs, with shapes:
            #   batch_size, 255, 13, 13
            #   batch_size, 255, 26, 26
            #   batch_size, 255, 52, 52
            #-----------------------------------------------#
            batch_size      = input.size(0)
            input_height    = input.size(2)
            input_width     = input.size(3)
            #-----------------------------------------------#
            #   For a 416x416 input:
            #   stride_h = stride_w = 32, 16, 8
            #-----------------------------------------------#
            stride_h = self.input_shape[0] / input_height
            stride_w = self.input_shape[1] / input_width
            #-------------------------------------------------#
            #   scaled_anchors are now sized relative to the feature layer
            #-------------------------------------------------#
            scaled_anchors = [(anchor_width / stride_w, anchor_height / stride_h) for anchor_width, anchor_height in self.anchors[self.anchors_mask[i]]]
            #-----------------------------------------------#
            #   After reshaping, the three predictions have shapes:
            #   batch_size, 3, 13, 13, 85
            #   batch_size, 3, 26, 26, 85
            #   batch_size, 3, 52, 52, 85
            #-----------------------------------------------#
            prediction = input.view(batch_size, len(self.anchors_mask[i]),
                                    self.bbox_attrs, input_height, input_width).permute(0, 1, 3, 4, 2).contiguous()
            #-----------------------------------------------#
            #   Adjustment parameters for the prior box center
            #-----------------------------------------------#
            x = torch.sigmoid(prediction[..., 0])
            y = torch.sigmoid(prediction[..., 1])
            #-----------------------------------------------#
            #   Adjustment parameters for the prior box width and height
            #-----------------------------------------------#
            w = prediction[..., 2]
            h = prediction[..., 3]
            #-----------------------------------------------#
            #   Objectness confidence: is there an object
            #-----------------------------------------------#
            conf        = torch.sigmoid(prediction[..., 4])
            #-----------------------------------------------#
            #   Class confidence
            #-----------------------------------------------#
            pred_cls    = torch.sigmoid(prediction[..., 5:])
            FloatTensor = torch.cuda.FloatTensor if x.is_cuda else torch.FloatTensor
            LongTensor  = torch.cuda.LongTensor if x.is_cuda else torch.LongTensor
            #----------------------------------------------------------#
            #   Build the grid: prior box centers sit at the top-left
            #   corner of each grid cell
            #   batch_size,3,13,13
            #----------------------------------------------------------#
            grid_x = torch.linspace(0, input_width - 1, input_width).repeat(input_height, 1).repeat(
                batch_size * len(self.anchors_mask[i]), 1, 1).view(x.shape).type(FloatTensor)
            grid_y = torch.linspace(0, input_height - 1, input_height).repeat(input_width, 1).t().repeat(
                batch_size * len(self.anchors_mask[i]), 1, 1).view(y.shape).type(FloatTensor)
            #----------------------------------------------------------#
            #   Build the prior box widths and heights in grid layout
            #   batch_size,3,13,13
            #----------------------------------------------------------#
            anchor_w = FloatTensor(scaled_anchors).index_select(1, LongTensor([0]))
            anchor_h = FloatTensor(scaled_anchors).index_select(1, LongTensor([1]))
            anchor_w = anchor_w.repeat(batch_size, 1).repeat(1, 1, input_height * input_width).view(w.shape)
            anchor_h = anchor_h.repeat(batch_size, 1).repeat(1, 1, input_height * input_width).view(h.shape)
            #----------------------------------------------------------#
            #   Adjust the prior boxes using the predictions:
            #   first adjust the center, shifting it from the grid point
            #   toward the bottom-right, then adjust the width and height.
            #----------------------------------------------------------#
            pred_boxes          = FloatTensor(prediction[..., :4].shape)
            pred_boxes[..., 0]  = x.data + grid_x
            pred_boxes[..., 1]  = y.data + grid_y
            pred_boxes[..., 2]  = torch.exp(w.data) * anchor_w
            pred_boxes[..., 3]  = torch.exp(h.data) * anchor_h
            #----------------------------------------------------------#
            #   Normalize the output to fractions in the range 0-1
            #----------------------------------------------------------#
            _scale = torch.Tensor([input_width, input_height, input_width, input_height]).type(FloatTensor)
            output = torch.cat((pred_boxes.view(batch_size, -1, 4) / _scale,
                                conf.view(batch_size, -1, 1), pred_cls.view(batch_size, -1, self.num_classes)), -1)
            outputs.append(output.data)
        return outputs
    def yolo_correct_boxes(self, box_xy, box_wh, input_shape, image_shape, letterbox_image):
        #-----------------------------------------------------------------#
        #   The y axis goes first so the boxes can be multiplied directly
        #   by the image height and width
        #-----------------------------------------------------------------#
        box_yx = box_xy[..., ::-1]
        box_hw = box_wh[..., ::-1]
        input_shape = np.array(input_shape)
        image_shape = np.array(image_shape)
        if letterbox_image:
            #-----------------------------------------------------------------#
            #   offset is the offset of the valid image area relative to
            #   the top-left corner of the image;
            #   new_shape is the scaled height and width
            #-----------------------------------------------------------------#
            new_shape = np.round(image_shape * np.min(input_shape/image_shape))
            offset  = (input_shape - new_shape)/2./input_shape
            scale   = input_shape/new_shape
            box_yx  = (box_yx - offset) * scale
            box_hw *= scale
        box_mins    = box_yx - (box_hw / 2.)
        box_maxes   = box_yx + (box_hw / 2.)
        boxes  = np.concatenate([box_mins[..., 0:1], box_mins[..., 1:2], box_maxes[..., 0:1], box_maxes[..., 1:2]], axis=-1)
        boxes *= np.concatenate([image_shape, image_shape], axis=-1)
        return boxes
    def non_max_suppression(self, prediction, num_classes, input_shape, image_shape, letterbox_image, conf_thres=0.5, nms_thres=0.4):
        #----------------------------------------------------------#
        #   Convert the predictions to top-left / bottom-right corner format.
        #   prediction  [batch_size, num_anchors, 85]
        #----------------------------------------------------------#
        box_corner          = prediction.new(prediction.shape)
        box_corner[:, :, 0] = prediction[:, :, 0] - prediction[:, :, 2] / 2
        box_corner[:, :, 1] = prediction[:, :, 1] - prediction[:, :, 3] / 2
        box_corner[:, :, 2] = prediction[:, :, 0] + prediction[:, :, 2] / 2
        box_corner[:, :, 3] = prediction[:, :, 1] + prediction[:, :, 3] / 2
        prediction[:, :, :4] = box_corner[:, :, :4]
        output = [None for _ in range(len(prediction))]
        for i, image_pred in enumerate(prediction):
            #----------------------------------------------------------#
            #   Take the max over the class predictions.
            #   class_conf  [num_anchors, 1]    class confidence
            #   class_pred  [num_anchors, 1]    class index
            #----------------------------------------------------------#
            class_conf, class_pred = torch.max(image_pred[:, 5:5 + num_classes], 1, keepdim=True)
            #----------------------------------------------------------#
            #   First round of filtering by confidence
            #----------------------------------------------------------#
            conf_mask = (image_pred[:, 4] * class_conf[:, 0] >= conf_thres).squeeze()
            #----------------------------------------------------------#
            #   Keep only the predictions that pass the confidence filter
            #----------------------------------------------------------#
            image_pred = image_pred[conf_mask]
            class_conf = class_conf[conf_mask]
            class_pred = class_pred[conf_mask]
            if not image_pred.size(0):
                continue
            #-------------------------------------------------------------------------#
            #   detections  [num_anchors, 7]
            #   The 7 values are: x1, y1, x2, y2, obj_conf, class_conf, class_pred
            #-------------------------------------------------------------------------#
            detections = torch.cat((image_pred[:, :5], class_conf.float(), class_pred.float()), 1)
            #------------------------------------------#
            #   Get all classes present in the predictions
            #------------------------------------------#
            unique_labels = detections[:, -1].cpu().unique()
            if prediction.is_cuda:
                unique_labels = unique_labels.cuda()
                detections = detections.cuda()
            for c in unique_labels:
                #------------------------------------------#
                #   Get all predictions of one class that passed score filtering
                #------------------------------------------#
                detections_class = detections[detections[:, -1] == c]
                #------------------------------------------#
                #   The built-in torchvision NMS is faster
                #------------------------------------------#
                keep = nms(
                    detections_class[:, :4],
                    detections_class[:, 4] * detections_class[:, 5],
                    nms_thres
                )
                max_detections = detections_class[keep]
                # # Sort by objectness confidence
                # _, conf_sort_index = torch.sort(detections_class[:, 4]*detections_class[:, 5], descending=True)
                # detections_class = detections_class[conf_sort_index]
                # # Run non-maximum suppression
                # max_detections = []
                # while detections_class.size(0):
                #     # Take the highest-confidence box of this class, then step through the rest and drop any whose overlap exceeds nms_thres
                #     max_detections.append(detections_class[0].unsqueeze(0))
                #     if len(detections_class) == 1:
                #         break
                #     ious = bbox_iou(max_detections[-1], detections_class[1:])
                #     detections_class = detections_class[1:][ious < nms_thres]
                # # Stack the results
                # max_detections = torch.cat(max_detections).data
                # Add max detections to outputs
                output[i] = max_detections if output[i] is None else torch.cat((output[i], max_detections))
            if output[i] is not None:
                output[i]           = output[i].cpu().numpy()
                box_xy, box_wh      = (output[i][:, 0:2] + output[i][:, 2:4])/2, output[i][:, 2:4] - output[i][:, 0:2]
                output[i][:, :4]    = self.yolo_correct_boxes(box_xy, box_wh, input_shape, image_shape, letterbox_image)
        return output
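
The letterbox correction in `yolo_correct_boxes` is easier to follow on a single box. The sketch below re-expresses the same arithmetic in plain Python for one normalized box; `correct_box` is a hypothetical helper, and the 300x600 image is an invented example. Shapes are (height, width), and the result is (top, left, bottom, right) in image pixels, matching the y-first ordering used above.

```python
def correct_box(box_xy, box_wh, input_shape, image_shape):
    """Map a normalized (cx, cy), (w, h) box from the letterboxed input back to the image."""
    in_h, in_w = input_shape
    img_h, img_w = image_shape
    # Size of the resized image inside the padded network input.
    ratio = min(in_h / img_h, in_w / img_w)
    new_h, new_w = round(img_h * ratio), round(img_w * ratio)
    # Padding on each side, as a fraction of the network input.
    off_y = (in_h - new_h) / 2.0 / in_h
    off_x = (in_w - new_w) / 2.0 / in_w
    # Remove the padding and rescale to the valid image area.
    cy = (box_xy[1] - off_y) * (in_h / new_h)
    cx = (box_xy[0] - off_x) * (in_w / new_w)
    h = box_wh[1] * (in_h / new_h)
    w = box_wh[0] * (in_w / new_w)
    top, left = (cy - h / 2) * img_h, (cx - w / 2) * img_w
    bottom, right = (cy + h / 2) * img_h, (cx + w / 2) * img_w
    return top, left, bottom, right


# A 300x600 (height x width) image padded to 416x416: the valid area is 208x416.
print(correct_box((0.5, 0.5), (0.5, 0.25), (416, 416), (300, 600)))
```

In this example a quarter of the input height is padding above and below the image, so a box that is 0.25 tall in the padded input covers half the height of the real image.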

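4. Drawing on the original image

Step 3 gives us the positions of the predicted boxes on the original image, and these boxes have already been filtered.

The filtered boxes can be drawn directly on the image to get the result.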

II. Training

1. What is needed to compute the loss

Computing the loss is really a comparison between pred and target:

pred is the network's prediction.

target is the ground-truth boxes for the network.

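2. What is pred

For the yolo3 model, the network's final output is the predicted boxes and classes for every grid point of the three feature layers. In other words, the three feature layers correspond to the image divided into grids of different sizes, and each grid point carries the position, confidence, and class for its three prior boxes.

The output layers have shapes (13,13,75), (26,26,75), and (52,52,75). The last dimension is 75 because this is based on the VOC dataset, which has 20 classes, and yolo3 has only 3 prior boxes per feature point of each feature layer, so the last dimension is 3x25.

For the COCO training set, which has 80 classes, the last dimension is 255 = 3x85 and the three feature layers have shapes (13,13,255), (26,26,255), and (52,52,255).

At this point y_pre is still undecoded; only after decoding does it describe positions on the real image.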

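3. What is target

target describes the ground-truth boxes in a real image. Its first dimension is batch_size, the second is the number of ground-truth boxes in each image, and the third holds each box's information: its position and its class.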

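4. How the loss is computed

With pred and target in hand, we cannot simply subtract one from the other; the following steps are needed.

Locate each ground-truth box in the image and determine which grid point is responsible for detecting it. Determine which prior box at that feature point overlaps the ground-truth box the most. Compute what prediction that grid point would need to produce to recover the ground-truth box; the prior box with the highest overlap is used as the positive sample.
Decode the network output into predicted boxes and compute their overlap with all ground-truth boxes. If the overlap exceeds a threshold, the prior box for that prediction is ignored; the rest are negative samples.
The final loss has three parts: a. for positive samples, the difference between the encoded width/height and x/y offsets and the predicted values; b. for positive samples, the predicted confidence compared against 1, and for negative samples, the predicted confidence compared against 0; c. for boxes that actually exist, the predicted class compared against the actual class.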
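
The first step (assigning a ground-truth box to a grid cell and a prior box, then encoding it) can be sketched in plain Python. The article's own `get_target` is not shown in this section, so everything here is an assumption: `wh_iou` and `encode_target` are hypothetical helpers, the best anchor is chosen by an IoU that compares sizes only, and the encoding is simply the inverse of the decoding formulas from the prediction part.

```python
import math

# Anchors in input-image pixels, ordered to match anchors_mask in the code below.
ANCHORS = [(10, 13), (16, 30), (33, 23), (30, 61), (62, 45), (59, 119),
           (116, 90), (156, 198), (373, 326)]
ANCHORS_MASK = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]


def wh_iou(wh_a, wh_b):
    """IoU of two boxes that share the same center, so only sizes matter."""
    inter = min(wh_a[0], wh_b[0]) * min(wh_a[1], wh_b[1])
    return inter / (wh_a[0] * wh_a[1] + wh_b[0] * wh_b[1] - inter)


def encode_target(box, layer, input_size=416):
    """box = (cx, cy, w, h) normalized to 0-1. Returns None if another layer owns the box."""
    stride = (32, 16, 8)[layer]
    grid = input_size // stride
    box_wh = (box[2] * input_size, box[3] * input_size)
    # Pick the anchor whose shape overlaps the ground-truth box the most.
    best = max(range(len(ANCHORS)), key=lambda k: wh_iou(box_wh, ANCHORS[k]))
    if best not in ANCHORS_MASK[layer]:
        return None
    gx, gy = box[0] * grid, box[1] * grid
    i, j = int(gx), int(gy)  # the grid cell responsible for this box
    return {
        "cell": (i, j),
        "anchor": ANCHORS_MASK[layer].index(best),
        "tx": gx - i, "ty": gy - j,  # offsets inside the cell
        "tw": math.log(box_wh[0] / ANCHORS[best][0]),
        "th": math.log(box_wh[1] / ANCHORS[best][1]),
    }


large = (0.5, 0.5, 116 / 416, 90 / 416)
print(encode_target(large, layer=0))
print(encode_target(large, layer=2))  # None: this box belongs to the 13x13 layer
```

A box that exactly matches an anchor encodes to tw = th = 0 (up to floating-point error), which is the value the network would have to predict for the decoded box to reproduce the ground truth.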

import torch
import torch.nn as nn
import math
import numpy as np
class YOLOLoss(nn.Module):
    def __init__(self, anchors, num_classes, input_shape, cuda, anchors_mask = [[6,7,8], [3,4,5], [0,1,2]]):
        super(YOLOLoss, self).__init__()
        #-----------------------------------------------------------#
        #   Anchors for the 13x13 feature layer: [116,90],[156,198],[373,326]
        #   Anchors for the 26x26 feature layer: [30,61],[62,45],[59,119]
        #   Anchors for the 52x52 feature layer: [10,13],[16,30],[33,23]
        #-----------------------------------------------------------#
        self.anchors        = anchors
        self.num_classes    = num_classes
        self.bbox_attrs     = 5 + num_classes
        self.input_shape    = input_shape
        self.anchors_mask   = anchors_mask
        self.ignore_threshold = 0.7
        self.cuda = cuda
    def clip_by_tensor(self, t, t_min, t_max):
        t = t.float()
        result = (t >= t_min).float() * t + (t < t_min).float() * t_min
        result = (result <= t_max).float() * result + (result > t_max).float() * t_max
        return result
    def MSELoss(self, pred, target):
        return torch.pow(pred - target, 2)
    def BCELoss(self, pred, target):
        epsilon = 1e-7
        pred    = self.clip_by_tensor(pred, epsilon, 1.0 - epsilon)
        output  = - target * torch.log(pred) - (1.0 - target) * torch.log(1.0 - pred)
        return output
    def forward(self, l, input, targets=None):
        #----------------------------------------------------#
        #   l is the index of the effective feature layer being processed
        #   input has shape  bs, 3*(5+num_classes), 13, 13
        #                    bs, 3*(5+num_classes), 26, 26
        #                    bs, 3*(5+num_classes), 52, 52
        #   targets holds the ground-truth boxes.
        #----------------------------------------------------#
        #--------------------------------#
        #   Get the batch size and the feature layer's height and width
        #   e.g. 13 and 13
        #--------------------------------#
        bs      = input.size(0)
        in_h    = input.size(2)
        in_w    = input.size(3)
        #-----------------------------------------------------------------------#
        #   Compute the stride:
        #   how many pixels of the original image each feature point covers.
        #   For a 13x13 feature layer, one feature point covers 32 pixels
        #   For a 26x26 feature layer, one feature point covers 16 pixels
        #   For a 52x52 feature layer, one feature point covers 8 pixels
        #   stride_h = stride_w = 32, 16, 8
        #   (for the 13x13 layer, stride_h and stride_w are both 32)
        #-----------------------------------------------------------------------#
        stride_h = self.input_shape[0] / in_h
        stride_w = self.input_shape[1] / in_w
        #-------------------------------------------------#
        #   scaled_anchors are now sized relative to the feature layer
        #-------------------------------------------------#
        scaled_anchors  = [(a_w / stride_w, a_h / stride_h) for a_w, a_h in self.anchors]
        #-----------------------------------------------#
        #   There are three inputs; each is reshaped as
        #   bs, 3*(5+num_classes), 13, 13 => batch_size, 3, 13, 13, 5 + num_classes
        #   batch_size, 3, 26, 26, 5 + num_classes
        #   batch_size, 3, 52, 52, 5 + num_classes
        #-----------------------------------------------#
        prediction = input.view(bs, len(self.anchors_mask[l]), self.bbox_attrs, in_h, in_w).permute(0, 1, 3, 4, 2).contiguous()
        #-----------------------------------------------#
        #   Adjustment parameters for the prior box center
        #-----------------------------------------------#
        x = torch.sigmoid(prediction[..., 0])
        y = torch.sigmoid(prediction[..., 1])
        #-----------------------------------------------#
        #   Adjustment parameters for the prior box width and height
        #-----------------------------------------------#
        w = prediction[..., 2]
        h = prediction[..., 3]
        #-----------------------------------------------#
        #   Objectness confidence: is there an object
        #-----------------------------------------------#
        conf = torch.sigmoid(prediction[..., 4])
        #-----------------------------------------------#
        #   Class confidence
        #-----------------------------------------------#
        pred_cls = torch.sigmoid(prediction[..., 5:])
        #-----------------------------------------------#
        #   Build the predictions the network should produce (the targets)
        #-----------------------------------------------#
        y_true, noobj_mask, box_loss_scale = self.get_target(l, targets, scaled_anchors, in_h, in_w)
        #---------------------------------------------------------------#
        #   Decode the predictions and measure their overlap with the ground truth.
        #   Predictions with a large overlap are ignored: those feature points
        #   are already fairly accurate, so they are unsuitable as negative samples
        #----------------------------------------------------------------#
        noobj_mask = self.get_ignore(l, x, y, h, w, targets, scaled_anchors, in_h, in_w, noobj_mask)
        if self.cuda:
            y_true          = y_true.cuda()
            noobj_mask      = noobj_mask.cuda()
            box_loss_scale  = box_loss_scale.cuda()
        #-----------------------------------------------------------#
        #   reshape_y_true[...,2:3] and reshape_y_true[...,3:4]
        #   are the ground-truth width and height, both in the range 0-1.
        #   The larger the ground-truth box, the smaller its weight;
        #   small boxes get a larger weight.
        #-----------------------------------------------------------#
        box_loss_scale = 2 - box_loss_scale
        #-----------------------------------------------------------#
        #   Loss for the center offsets; BCELoss works somewhat better here
        #-----------------------------------------------------------#
        loss_x = torch.sum(self.BCELoss(x, y_true[..., 0]) * box_loss_scale * y_true[..., 4])
        loss_y = torch.sum(self.BCELoss(y, y_true[..., 1]) * box_loss_scale * y_true[..., 4])
        #-----------------------------------------------------------#
        #   Loss for the width and height adjustment values
        #-----------------------------------------------------------#
        loss_w = torch.sum(self.MSELoss(w, y_true[..., 2]) * 0.5 * box_loss_scale * y_true[..., 4])
        loss_h = torch.sum(self.MSELoss(h, y_true[..., 3]) * 0.5 * box_loss_scale * y_true[..., 4])
        #-----------------------------------------------------------#
        #   Confidence loss
        #-----------------------------------------------------------#
        loss_conf   = torch.sum(self.BCELoss(conf, y_true[..., 4]) * y_true[..., 4]) + \
                      torch.sum(self.BCELoss(conf, y_true[..., 4]) * noobj_mask)
        loss_cls    = torch.sum(self.BCELoss(pred_cls[y_true[..., 4] == 1], y_true[..., 5:][y_true[..., 4] == 1]))
        loss        = loss_x  + loss_y + loss_w + loss_h + loss_conf + loss_cls
        num_pos = torch.sum(y_true[..., 4])
        num_pos = torch.max(num_pos, torch.ones_like(num_pos))
        return loss, num_pos
    def calculate_iou(self, _box_a, _box_b):
        #-----------------------------------------------------------#
        #   Top-left and bottom-right corners of the ground-truth boxes
        #-----------------------------------------------------------#
        b1_x1, b1_x2 = _box_a[:, 0] - _box_a[:, 2] / 2, _box_a[:, 0] + _box_a[:, 2] / 2
        b1_y1, b1_y2 = _box_a[:, 1] - _box_a[:, 3] / 2, _box_a[:, 1] + _box_a[:, 3] / 2
        #-----------------------------------------------------------#
        #   Top-left and bottom-right corners of the boxes derived
        #   from the prior boxes
        #-----------------------------------------------------------#
        b2_x1, b2_x2 = _box_b[:, 0] - _box_b[:, 2] / 2, _box_b[:, 0] + _box_b[:, 2] / 2
        b2_y1, b2_y2 = _box_b[:, 1] - _box_b[:, 3] / 2, _box_b[:, 1] + _box_b[:, 3] / 2
        #-----------------------------------------------------------#
        #   Convert both sets of boxes to top-left / bottom-right form
        #-----------------------------------------------------------#
        box_a = torch.zeros_like(_box_a)
        box_b = torch.zeros_like(_box_b)
        box_a[:, 0], box_a[:, 1], box_a[:, 2], box_a[:, 3] = b1_x1, b1_y1, b1_x2, b1_y2
        box_b[:, 0], box_b[:, 1], box_b[:, 2], box_b[:, 3] = b2_x1, b2_y1, b2_x2, b2_y2
        #-----------------------------------------------------------#
        #   A is the number of ground-truth boxes, B the number of anchors
        #-----------------------------------------------------------#
        A = box_a.size(0)
        B = box_b.size(0)
        #-----------------------------------------------------------#
        #   Intersection area
        #-----------------------------------------------------------#
        max_xy  = torch.min(box_a[:, 2:].unsqueeze(1).expand(A, B, 2), box_b[:, 2:].unsqueeze(0).expand(A, B, 2))
        min_xy  = torch.max(box_a[:, :2].unsqueeze(1).expand(A, B, 2), box_b[:, :2].unsqueeze(0).expand(A, B, 2))
        inter   = torch.clamp((max_xy - min_xy), min=0)
        inter   = inter[:, :, 0] * inter[:, :, 1]
        #-----------------------------------------------------------#
        #   Areas of the ground-truth boxes and of the predicted boxes
        #-----------------------------------------------------------#
        area_a = ((box_a[:, 2]-box_a[:, 0]) * (box_a[:, 3]-box_a[:, 1])).unsqueeze(1).expand_as(inter)  # [A,B]
        area_b = ((box_b[:, 2]-box_b[:, 0]) * (box_b[:, 3]-box_b[:, 1])).unsqueeze(0).expand_as(inter)  # [A,B]
        #-----------------------------------------------------------#
        #   IoU = intersection / union
        #-----------------------------------------------------------#
        union = area_a + area_b - inter
        return inter / union  # [A,B]
    def get_target(self, l, targets, anchors, in_h, in_w):
        #-----------------------------------------------------#
        #   Number of images in the batch
        #-----------------------------------------------------#
        bs              = len(targets)
        #-----------------------------------------------------#
        #   Mask selecting the anchors that do not contain an object
        #-----------------------------------------------------#
        noobj_mask      = torch.ones(bs, len(self.anchors_mask[l]), in_h, in_w, requires_grad = False)
        #-----------------------------------------------------#
        #   Makes the network pay more attention to small objects
        #-----------------------------------------------------#
        box_loss_scale  = torch.zeros(bs, len(self.anchors_mask[l]), in_h, in_w, requires_grad = False)
        #-----------------------------------------------------#
        #   batch_size, 3, 13, 13, 5 + num_classes
        #-----------------------------------------------------#
        y_true          = torch.zeros(bs, len(self.anchors_mask[l]), in_h, in_w, self.bbox_attrs, requires_grad = False)
        for b in range(bs):            
            if len(targets[b])==0:
                continue
            batch_target = torch.zeros_like(targets[b])
            #-------------------------------------------------------#
            #   Scale the positive samples (center and size) to the feature layer
            #-------------------------------------------------------#
            batch_target[:, [0,2]] = targets[b][:, [0,2]] * in_w
            batch_target[:, [1,3]] = targets[b][:, [1,3]] * in_h
            batch_target[:, 4] = targets[b][:, 4]
            batch_target = batch_target.cpu()
            #-------------------------------------------------------#
            #   Put the ground-truth boxes in (0, 0, w, h) form
            #   num_true_box, 4
            #-------------------------------------------------------#
            gt_box          = torch.FloatTensor(torch.cat((torch.zeros((batch_target.size(0), 2)), batch_target[:, 2:4]), 1))
            #-------------------------------------------------------#
            #   Put the anchors in the same (0, 0, w, h) form
            #   9, 4
            #-------------------------------------------------------#
            anchor_shapes   = torch.FloatTensor(torch.cat((torch.zeros((len(anchors), 2)), torch.FloatTensor(anchors)), 1))
            #-------------------------------------------------------#
            #   Compute the IoU
            #   self.calculate_iou(gt_box, anchor_shapes) = [num_true_box, 9]
            #   is the overlap of every ground-truth box with the 9 anchors
            #   best_ns:
            #   index of the best-overlapping anchor for each ground-truth box
            #-------------------------------------------------------#
            best_ns = torch.argmax(self.calculate_iou(gt_box, anchor_shapes), dim=-1)
            for t, best_n in enumerate(best_ns):
                if best_n not in self.anchors_mask[l]:
                    continue
                #----------------------------------------#
                #   Which of this feature layer's anchors the best anchor is
                #----------------------------------------#
                k = self.anchors_mask[l].index(best_n)
                #----------------------------------------#
                #   Grid cell that the ground-truth box belongs to
                #----------------------------------------#
                i = torch.floor(batch_target[t, 0]).long()
                j = torch.floor(batch_target[t, 1]).long()
                #----------------------------------------#
                #   Class of the ground-truth box
                #----------------------------------------#
                c = batch_target[t, 4].long()
                #----------------------------------------#
                #   noobj_mask marks feature points without an object
                #----------------------------------------#
                noobj_mask[b, k, j, i] = 0
                #----------------------------------------#
                #   tx, ty are the ground-truth center adjustment values;
                #   tw, th are the ground-truth log-scale size adjustments
                #----------------------------------------#
                y_true[b, k, j, i, 0] = batch_target[t, 0] - i.float()
                y_true[b, k, j, i, 1] = batch_target[t, 1] - j.float()
                y_true[b, k, j, i, 2] = math.log(batch_target[t, 2] / anchors[best_n][0])
                y_true[b, k, j, i, 3] = math.log(batch_target[t, 3] / anchors[best_n][1])
                y_true[b, k, j, i, 4] = 1
                y_true[b, k, j, i, c + 5] = 1
                #----------------------------------------#
                #   Relative box area (w * h over the feature layer area)
                #   Large objects get a small loss weight, small objects a large one
                #----------------------------------------#
                box_loss_scale[b, k, j, i] = batch_target[t, 2] * batch_target[t, 3] / in_w / in_h
        return y_true, noobj_mask, box_loss_scale
    def get_ignore(self, l, x, y, h, w, targets, scaled_anchors, in_h, in_w, noobj_mask):
        #-----------------------------------------------------#
        #   Number of images in the batch
        #-----------------------------------------------------#
        bs = len(targets)
        FloatTensor = torch.cuda.FloatTensor if x.is_cuda else torch.FloatTensor
        LongTensor  = torch.cuda.LongTensor if x.is_cuda else torch.LongTensor
        #-----------------------------------------------------#
        #   Build the grid: anchor centers start at the
        #   top-left corner of each grid cell
        #-----------------------------------------------------#
        grid_x = torch.linspace(0, in_w - 1, in_w).repeat(in_h, 1).repeat(
            int(bs * len(self.anchors_mask[l])), 1, 1).view(x.shape).type(FloatTensor)
        grid_y = torch.linspace(0, in_h - 1, in_h).repeat(in_w, 1).t().repeat(
            int(bs * len(self.anchors_mask[l])), 1, 1).view(y.shape).type(FloatTensor)
        # Anchor widths and heights for this feature layer
        scaled_anchors_l = np.array(scaled_anchors)[self.anchors_mask[l]]
        anchor_w = FloatTensor(scaled_anchors_l).index_select(1, LongTensor([0]))
        anchor_h = FloatTensor(scaled_anchors_l).index_select(1, LongTensor([1]))
        anchor_w = anchor_w.repeat(bs, 1).repeat(1, 1, in_h * in_w).view(w.shape)
        anchor_h = anchor_h.repeat(bs, 1).repeat(1, 1, in_h * in_w).view(h.shape)
        #-------------------------------------------------------#
        #   Adjusted anchor centers, widths and heights (decoded predictions)
        #-------------------------------------------------------#
        pred_boxes_x    = torch.unsqueeze(x.data + grid_x, -1)
        pred_boxes_y    = torch.unsqueeze(y.data + grid_y, -1)
        pred_boxes_w    = torch.unsqueeze(torch.exp(w.data) * anchor_w, -1)
        pred_boxes_h    = torch.unsqueeze(torch.exp(h.data) * anchor_h, -1)
        pred_boxes      = torch.cat([pred_boxes_x, pred_boxes_y, pred_boxes_w, pred_boxes_h], dim = -1)
        for b in range(bs):           
            #-------------------------------------------------------#
            #   Reshape the predictions
            #   pred_boxes_for_ignore      num_anchors, 4
            #-------------------------------------------------------#
            pred_boxes_for_ignore = pred_boxes[b].view(-1, 4)
            #-------------------------------------------------------#
            #   Take the ground-truth boxes and scale them to the feature layer
            #   gt_box      num_true_box, 4
            #-------------------------------------------------------#
            if len(targets[b]) > 0:
                batch_target = torch.zeros_like(targets[b])
                #-------------------------------------------------------#
                #   Scale the positive samples (center and size) to the feature layer
                #-------------------------------------------------------#
                batch_target[:, [0,2]] = targets[b][:, [0,2]] * in_w
                batch_target[:, [1,3]] = targets[b][:, [1,3]] * in_h
                batch_target = batch_target[:, :4]
                #-------------------------------------------------------#
                #   Compute the IoU
                #   anch_ious       num_true_box, num_anchors
                #-------------------------------------------------------#
                anch_ious = self.calculate_iou(batch_target, pred_boxes_for_ignore)
                #-------------------------------------------------------#
                #   Largest overlap of each predicted box with any ground-truth box
                #   anch_ious_max   num_anchors
                #-------------------------------------------------------#
                anch_ious_max, _    = torch.max(anch_ious, dim = 0)
                anch_ious_max       = anch_ious_max.view(pred_boxes[b].size()[:3])
                noobj_mask[b][anch_ious_max > self.ignore_threshold] = 0
        return noobj_mask
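
The anchor assignment in get_target compares box shapes only: both the ground-truth boxes and the anchors are given a center of (0, 0) before calculate_iou is called, so the IoU reduces to min(w) * min(h) divided by the union. The following is a minimal sketch of that idea, not the repository's code. The helper name shape_iou and the two sample boxes are assumptions made for illustration, and sizes are kept in input pixels rather than feature-layer units.

```python
import torch

def shape_iou(wh_a, wh_b):
    """Pairwise IoU of boxes that share the same center (hypothetical helper).

    wh_a: [A, 2] widths/heights, wh_b: [B, 2] widths/heights.
    Returns an [A, B] tensor. With a shared center the intersection
    is simply min(w) * min(h).
    """
    inter_wh = torch.min(wh_a[:, None, :], wh_b[None, :, :])   # [A, B, 2]
    inter = inter_wh[..., 0] * inter_wh[..., 1]                # [A, B]
    area_a = (wh_a[:, 0] * wh_a[:, 1])[:, None]                # [A, 1]
    area_b = (wh_b[:, 0] * wh_b[:, 1])[None, :]                # [1, B]
    return inter / (area_a + area_b - inter)

# Anchor sizes quoted in the YOLOLoss comments (pixels, 416x416 input).
anchors = torch.tensor([[10, 13], [16, 30], [33, 23],
                        [30, 61], [62, 45], [59, 119],
                        [116, 90], [156, 198], [373, 326]], dtype=torch.float32)
anchors_mask = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]

# Two made-up ground-truth boxes given as (width, height) in pixels.
gt_wh = torch.tensor([[120.0, 100.0], [15.0, 28.0]])

# Index of the best-matching anchor for each ground-truth box.
best_ns = torch.argmax(shape_iou(gt_wh, anchors), dim=-1).tolist()
# Feature layer l whose anchors_mask entry contains that anchor.
layers = [next(l for l, m in enumerate(anchors_mask) if n in m) for n in best_ns]
print(best_ns, layers)
```

In this example the 120x100 box is matched to anchor 6 and is therefore handled by the 13x13 layer (l = 0), while the 15x28 box is matched to anchor 1 and handled by the 52x52 layer (l = 2). On every other layer, get_target skips the box.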

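Training your own YoloV3 model

First, download the repository from GitHub, extract it, and open the folder in your editor or IDE. Make sure the folder you open is the repository root, that is, the directory where the files actually live. If the root is wrong, relative paths will not resolve and the code will not run.

I. Preparing the dataset

This article trains on data in VOC format, so you need to prepare your dataset before training. If you do not have a dataset of your own, you can download the VOC07+12 dataset through the link on GitHub and try it out.

Before training, put the label files in the Annotations folder under VOC2007 inside the VOCdevkit folder.

Before training, put the image files in the JPEGImages folder under VOC2007 inside the VOCdevkit folder.

The dataset layout is now complete.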
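
Before moving on, it can help to confirm that every label has a matching image. The following is a rough sketch of such a check, assuming that labels are .xml files, images are .jpg files, and that a label and its image share the same file stem. The function name check_voc_layout is hypothetical, and the demo runs on a throwaway directory so that it works anywhere.

```python
import tempfile
from pathlib import Path

def check_voc_layout(root):
    """Return (labels without an image, images without a label) as sorted stems.

    Assumes <root>/VOC2007/Annotations/*.xml and <root>/VOC2007/JPEGImages/*.jpg.
    """
    base = Path(root) / "VOC2007"
    xml_stems = {p.stem for p in (base / "Annotations").glob("*.xml")}
    jpg_stems = {p.stem for p in (base / "JPEGImages").glob("*.jpg")}
    return sorted(xml_stems - jpg_stems), sorted(jpg_stems - xml_stems)

# Demo on a temporary directory that mimics the layout described above.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "VOC2007"
    (base / "Annotations").mkdir(parents=True)
    (base / "JPEGImages").mkdir(parents=True)
    (base / "Annotations" / "000001.xml").touch()
    (base / "Annotations" / "000002.xml").touch()   # label with no image
    (base / "JPEGImages" / "000001.jpg").touch()
    missing_images, missing_labels = check_voc_layout(tmp)

print("labels without an image:", missing_images)
print("images without a label:", missing_labels)
```

On a real dataset you would call check_voc_layout("VOCdevkit") from the repository root and fix anything it reports before generating the training lists.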

II. Processing the dataset

Once the dataset is in place, the next step is to generate the 2007_train.txt and 2007_val.txt files used for training. This is done with voc_annotation.py in the repository root.

voc_annotation.py has several parameters that need to be set.

They are annotation_mode, classes_path, trainval_percent, train_percent, and VOCdevkit_path. For a first training run, changing only classes_path is enough.
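
To give a feel for what the two percentage parameters might control, here is a small sketch of a train/val/test split. The semantics are an assumption and are not confirmed by this article: trainval_percent is taken to be the share of all samples used for train plus val, and train_percent the share of those used for train. The function split_dataset and the default values are hypothetical.

```python
import random

def split_dataset(ids, trainval_percent=0.9, train_percent=0.9, seed=0):
    """Split sample ids into train / val / test lists.

    Assumed semantics (not confirmed by the article):
      trainval_percent: share of all samples used for train + val
      train_percent:    share of the train + val samples used for train
    """
    rng = random.Random(seed)          # fixed seed so the split is repeatable
    ids = list(ids)
    rng.shuffle(ids)
    num_trainval = int(len(ids) * trainval_percent)
    num_train = int(num_trainval * train_percent)
    train = ids[:num_train]
    val = ids[num_train:num_trainval]
    test = ids[num_trainval:]
    return train, val, test

# 100 made-up sample ids in the usual zero-padded VOC style.
train, val, test = split_dataset([f"{i:06d}" for i in range(100)])
print(len(train), len(val), len(test))
```

Under these assumptions, the ids in train and val would be the ones written to 2007_train.txt and 2007_val.txt respectively.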

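classes_path points to the txt file that lists the detection classes. For the VOC dataset, this is the VOC class list.

When training on your own dataset, you can create your own cls_classes.txt and write the classes you need to distinguish in it.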
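
As an illustration, the sketch below writes and reads back such a file. It assumes one class name per line, which the article does not state explicitly. The helper name get_classes and the two class names are made up for the example.

```python
import tempfile
from pathlib import Path

def get_classes(classes_path):
    """Read class names, one per line, skipping blank lines (assumed format)."""
    with open(classes_path, encoding="utf-8") as f:
        class_names = [line.strip() for line in f if line.strip()]
    return class_names, len(class_names)

# Demo: write a two-class file to a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "cls_classes.txt"
    path.write_text("cat\ndog\n", encoding="utf-8")
    class_names, num_classes = get_classes(path)

print(class_names, num_classes)
```

The order of the lines matters under this assumption, because the position of a name in the list becomes its class index in the labels.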

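III. Starting network training

voc_annotation.py has now generated 2007_train.txt and 2007_val.txt, so training can begin.

There are many training parameters; read the comments carefully after downloading the repository. The most important one is still classes_path in train.py.

classes_path points to the txt file that lists the detection classes, and it is the same txt used in voc_annotation.py. You must change it when training on your own dataset!

After changing classes_path, run train.py to start training. After several epochs, the weights are saved to the logs folder.

The other parameters work as follows: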

#-------------------------------#
#   是否使用Cuda
#   沒有GPU可以設(shè)置成False
#-------------------------------#
Cuda            = True
#--------------------------------------------------------#
#   訓(xùn)練前一定要修改classes_path，使其對應(yīng)自己的數(shù)據(jù)集
#--------------------------------------------------------#
classes_path    = 'model_data/voc_classes.txt'
#---------------------------------------------------------------------#
#   anchors_path代表先驗(yàn)框?qū)?yīng)的txt文件，一般不修改。
#   anchors_mask用于幫助代碼找到對應(yīng)的先驗(yàn)框，一般不修改。
#---------------------------------------------------------------------#
anchors_path    = 'model_data/yolo_anchors.txt'
anchors_mask    = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
#------------------------------------------------------------------------------------------------------#
#   權(quán)值文件請看README，百度網(wǎng)盤下載。數(shù)據(jù)的預(yù)訓(xùn)練權(quán)重對不同數(shù)據(jù)集是通用的，因?yàn)樘卣魇峭ㄓ玫?
#   預(yù)訓(xùn)練權(quán)重對于99%的情況都必須要用，不用的話權(quán)值太過隨機(jī)，特征提取效果不明顯，網(wǎng)絡(luò)訓(xùn)練的結(jié)果也不會好。
#   如果想要斷點(diǎn)續(xù)練就將model_path設(shè)置成logs文件夾下已經(jīng)訓(xùn)練的權(quán)值文件。 
#------------------------------------------------------------------------------------------------------#
model_path      = 'model_data/yolo_weights.pth'
#------------------------------------------------------#
#   輸入的shape大小，一定要是32的倍數(shù)
#------------------------------------------------------#
input_shape     = [416, 416]
#----------------------------------------------------#
#   Training has two stages: a frozen stage and an unfrozen stage.
#   Running out of GPU memory is unrelated to dataset size;
#   if you see an out-of-memory error, reduce batch_size.
#   Because of the BatchNorm layers, the minimum batch_size is 1.
#----------------------------------------------------#
#----------------------------------------------------#
#   Frozen-stage training parameters.
#   The backbone is frozen, so the feature extraction network does not change.
#   Uses less GPU memory; the network is only fine-tuned.
#----------------------------------------------------#
Init_Epoch          = 0
Freeze_Epoch        = 50
Freeze_batch_size   = 8
Freeze_lr           = 1e-3
#----------------------------------------------------#
#   Unfrozen-stage training parameters.
#   The backbone is no longer frozen, so the feature extraction network changes.
#   Uses more GPU memory; all parameters of the network are updated.
#----------------------------------------------------#
UnFreeze_Epoch      = 100
Unfreeze_batch_size = 4
Unfreeze_lr         = 1e-4
#------------------------------------------------------#
#   Whether to use frozen training. By default the backbone
#   is trained frozen first and then unfrozen.
#------------------------------------------------------#
Freeze_Train        = True
#------------------------------------------------------#
#   Number of worker processes used to load data.
#   More workers speed up data loading but use more memory.
#   On machines with little memory, set this to 2 or 0.
#------------------------------------------------------#
num_workers         = 4
#----------------------------------------------------#
#   Paths to the files listing image paths and labels.
#----------------------------------------------------#
train_annotation_path   = '2007_train.txt'
val_annotation_path     = '2007_val.txt'
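The settings above can be read back with a short sketch. The first function pairs each detection head with its grid size and the three anchors that anchors_mask selects; the second lists the epoch range, batch size and learning rate of the two training stages. Both function names are hypothetical. The anchor values are the widely used YOLOv3 anchors and the strides 32, 16 and 8 are the usual YOLOv3 downsampling factors; it is an assumption that yolo_anchors.txt and the network in this project match them. How Freeze_Train = False affects the first stage is likewise an assumption.

```python
# Widely used YOLOv3 anchors as (width, height), sorted from small to large.
# Assumption: model_data/yolo_anchors.txt holds these same nine pairs.
ANCHORS = [(10, 13), (16, 30), (33, 23), (30, 61), (62, 45),
           (59, 119), (116, 90), (156, 198), (373, 326)]


def head_layout(input_shape, anchors, anchors_mask, strides=(32, 16, 8)):
    """Pair each detection head with its grid size and its selected anchors."""
    height, width = input_shape
    if height % 32 != 0 or width % 32 != 0:
        raise ValueError("input_shape must be a multiple of 32")
    layout = []
    for stride, mask in zip(strides, anchors_mask):
        grid = (height // stride, width // stride)
        layout.append((grid, [anchors[i] for i in mask]))
    return layout


def training_stages(init_epoch=0, freeze_epoch=50, unfreeze_epoch=100,
                    freeze_batch_size=8, unfreeze_batch_size=4,
                    freeze_lr=1e-3, unfreeze_lr=1e-4, freeze_train=True):
    """Describe the frozen and unfrozen stages as plain dictionaries."""
    return [
        {   # Stage 1: backbone frozen (if enabled), larger batch, larger lr.
            "name": "freeze",
            "epochs": range(init_epoch, freeze_epoch),
            "batch_size": freeze_batch_size,
            "lr": freeze_lr,
            "backbone_frozen": freeze_train,
        },
        {   # Stage 2: whole network trainable, smaller batch, smaller lr.
            "name": "unfreeze",
            "epochs": range(freeze_epoch, unfreeze_epoch),
            "batch_size": unfreeze_batch_size,
            "lr": unfreeze_lr,
            "backbone_frozen": False,
        },
    ]


if __name__ == "__main__":
    mask = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
    for grid, head_anchors in head_layout([416, 416], ANCHORS, mask):
        print(grid, head_anchors)
    for stage in training_stages():
        print(stage["name"], len(stage["epochs"]), "epochs,",
              "batch", stage["batch_size"], "lr", stage["lr"])
```

Under these assumptions the coarsest 13x13 grid receives the three largest anchors and the finest 52x52 grid the three smallest, and with the default values each stage covers 50 epochs.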

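IV. Predicting with the Trained Model

Prediction with the trained model uses two files: yolo.py and predict.py.

First, edit model_path and classes_path in yolo.py. Both parameters must be changed.

model_path points to the trained weight file, which is in the logs folder.

classes_path points to the txt file that lists the detection classes.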
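When the logs folder holds many weight files, one way to choose a value for model_path is to take the most recently written one. The sketch below does that with the standard library. The helper name latest_checkpoint is hypothetical, the .pth extension is assumed from the default model_path, and the newest file is not necessarily the one with the best validation loss.

```python
import os
import tempfile
from pathlib import Path


def latest_checkpoint(log_dir="logs", pattern="*.pth"):
    """Return the path of the most recently modified weight file in log_dir."""
    candidates = sorted(Path(log_dir).glob(pattern),
                        key=lambda p: p.stat().st_mtime)
    if not candidates:
        raise FileNotFoundError(f"no files matching {pattern} in {log_dir}")
    return str(candidates[-1])


if __name__ == "__main__":
    # Demo on a temporary folder so the sketch runs without real weights.
    with tempfile.TemporaryDirectory() as tmp:
        for i, name in enumerate(["ep001.pth", "ep002.pth"]):
            path = os.path.join(tmp, name)
            Path(path).touch()
            os.utime(path, (1000 + i, 1000 + i))  # force distinct mtimes
        print(latest_checkpoint(tmp))
```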

After making these changes, run predict.py to start detection. Once it is running, enter the path of an image to detect the objects in it.
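The interaction described here, entering a path and getting a detection result, can be sketched as a small prompt loop. This is an illustration only and not the project's predict.py: the function name run_prompt_loop, the detect callable, the prompt text and the empty-line exit are all assumptions made for the example.

```python
def run_prompt_loop(detect, read_path=input, report=print):
    """Ask for image paths until an empty line is entered.

    detect is a placeholder callable that takes a path and returns a result;
    in a real script it would wrap the detector built from yolo.py.
    """
    handled = 0
    while True:
        path = read_path("Input image filename: ").strip()
        if not path:
            break  # an empty line ends the session
        try:
            result = detect(path)
        except OSError:
            # A missing or unreadable image should not stop the loop.
            report("Open error, try again.")
            continue
        report(result)
        handled += 1
    return handled


if __name__ == "__main__":
    # Stand-in detector and scripted input so the sketch runs on its own.
    scripted = iter(["img/street.jpg", ""])
    run_prompt_loop(lambda p: f"detected objects in {p}",
                    read_path=lambda prompt: next(scripted))
```

Passing the reader and the detector in as arguments keeps the loop easy to test without a GPU or real images.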



