Building a YoloV5 Object Detection Platform in PyTorch: Implementation Walkthrough

Updated: 2022-04-29 17:05:39  Author: Bubbliiiing

This article walks through building a YoloV5 object detection platform in PyTorch, as a reference for readers who need one.

Preface

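I put off studying this one for a long time, but finally decided to take a look. What I reproduce here is version 5 of YoloV5. YoloV5 has many releases and its author keeps updating it; I picked the second-to-last release available at the time of writing.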

Source code download

https://github.com/bubbliiiing/yolov5-pytorch

YoloV5 improvements (not exhaustive)

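1. Backbone: uses the Focus structure. It takes every other pixel of the image, which yields four independent feature layers, and then stacks them. Width and height information is thereby folded into the channels, so the input channel count grows fourfold. Focus was used in YoloV5 up to version 5; the latest release no longer uses it.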

2. Data augmentation: Mosaic. Mosaic stitches four images together into one training sample. According to the paper, its big advantage is that it enriches the backgrounds of the detected objects, and the BN computation sees the data of four images at once.

3. Multiple positive sample matching: in earlier Yolo versions, each ground-truth box had exactly one positive sample during training, i.e. only one anchor box (prior box) was responsible for predicting it. To make training more efficient, YoloV5 increases the number of positive samples: during training, each ground-truth box can be predicted by several anchor boxes.

These are not all of the improvements. I have listed only the ones I find most interesting and most effective.

I. Overall structure

Before studying YoloV5 in detail, it helps to understand what it does overall; this makes the network details easier to follow later.

Like earlier Yolo versions, YoloV5 can be divided into three parts: Backbone, FPN and Yolo Head.

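The Backbone is YoloV5's main feature extraction network. Based on its structure and on how earlier Yolo backbones were named, I call it CSPDarknet. The input image first goes through CSPDarknet for feature extraction; the extracted features are called feature layers and form the feature set of the input image. From the backbone we take three feature layers to build the rest of the network. I call these the effective feature layers.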

The FPN is YoloV5's enhanced feature extraction network. The three effective feature layers from the backbone are fused here in order to combine feature information at different scales, and the FPN continues to extract features from them. YoloV5 still uses the PANet structure: features are upsampled for fusion and then downsampled again for a second round of fusion.

Yolo Head is YoloV5's classifier and regressor. After CSPDarknet and the FPN we have three enhanced effective feature layers. Each has a width, a height and a channel count, so a feature map can be viewed as a collection of feature points, each carrying one feature per channel. Yolo Head judges each feature point and decides whether an object corresponds to it. As in earlier Yolo versions, the head in YoloV5 is not decoupled: classification and regression are produced together by a single 1x1 convolution.

So the whole YoloV5 network does three things: feature extraction, feature enhancement, and prediction of the object corresponding to each feature point.

II. Network structure

1. The Backbone

The backbone used by YoloV5 is CSPDarknet. It has five important characteristics:

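1. It uses residual blocks (Residual). A residual block in CSPDarknet has two parts: the main branch is a 1x1 convolution followed by a 3x3 convolution, while the residual edge does no processing and simply adds the block's input to the main branch's output.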

The entire YoloV5 backbone is built from these residual blocks:

class Bottleneck(nn.Module):
    # Standard bottleneck
    def __init__(self, c1, c2, shortcut=True, g=1, e=0.5):  # ch_in, ch_out, shortcut, groups, expansion
        super(Bottleneck, self).__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c_, c2, 3, 1, g=g)
        self.add = shortcut and c1 == c2
    def forward(self, x):
        return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))

Residual networks are easy to optimize and can gain accuracy from considerably greater depth. The skip connections inside their residual blocks alleviate the vanishing-gradient problem that comes with deeper networks.

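2. It uses the CSPnet structure. CSPnet is not complicated: it splits the original stack of residual blocks into two branches:

the main branch continues to stack the original residual blocks;

the other branch acts like a residual edge and connects to the end after only light processing.

CSP can therefore be seen as containing one large residual edge.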

class C3(nn.Module):
    # CSP Bottleneck with 3 convolutions
    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):  # ch_in, ch_out, number, shortcut, groups, expansion
        super(C3, self).__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c1, c_, 1, 1)
        self.cv3 = Conv(2 * c_, c2, 1)  # act=FReLU(c2)
        self.m = nn.Sequential(*[Bottleneck(c_, c_, shortcut, g, e=1.0) for _ in range(n)])
        # self.m = nn.Sequential(*[CrossConv(c_, c_, 3, 1, g, 1.0, shortcut) for _ in range(n)])
    def forward(self, x):
        return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), dim=1))

3. It uses the Focus structure, one of the more interesting structures in YoloV5. It takes every other pixel of the image, which yields four independent feature layers, and stacks them. Width and height information is thereby folded into the channels, so the input channel count grows fourfold: the stacked feature layer has twelve channels instead of the original three. The accompanying figure illustrates the Focus structure clearly.

class Focus(nn.Module):
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True):  # ch_in, ch_out, kernel, stride, padding, groups
        super(Focus, self).__init__()
        self.conv = Conv(c1 * 4, c2, k, s, p, g, act)
    def forward(self, x):
        return self.conv(torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2], x[..., ::2, 1::2], x[..., 1::2, 1::2]], 1))
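
The slicing in Focus.forward can be hard to picture, so here is a minimal sketch of the same idea in plain Python. It uses nested lists instead of tensors (an assumption made so that it runs without PyTorch), and the helper name focus_slices and the 4x4 test image are hypothetical, not part of the repository.

```python
def focus_slices(img):
    """Split one H x W channel into four (H/2) x (W/2) slices, Focus-style.

    The slice order mirrors the torch.cat call in Focus.forward:
    (even rows, even cols), (odd rows, even cols),
    (even rows, odd cols), (odd rows, odd cols).
    """
    return [
        [row[0::2] for row in img[0::2]],
        [row[0::2] for row in img[1::2]],
        [row[1::2] for row in img[0::2]],
        [row[1::2] for row in img[1::2]],
    ]


# Hypothetical 4 x 4 single-channel "image" holding the values 0..15.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
slices = focus_slices(image)
for s in slices:
    print(s)
```

Each of the four slices has half the width and half the height of the input. With a three-channel image this gives 4 x 3 = 12 channels, which is why the Conv inside Focus takes c1 * 4 input channels.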

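4. It uses the SiLU activation function, an improvement built from Sigmoid and ReLU. SiLU is unbounded above, bounded below, smooth and non-monotonic. It works better than ReLU on deep models and can be viewed as a smooth ReLU.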

class SiLU(nn.Module):
    @staticmethod
    def forward(x):
        return x * torch.sigmoid(x)
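
To make the "bounded below" and "non-monotonic" properties concrete, the sketch below evaluates SiLU on a few scalars using only the standard library. The function name silu and the sample points are chosen for illustration; the formula is the same x * sigmoid(x) as in the class above.

```python
import math


def silu(x):
    """SiLU(x) = x * sigmoid(x) for a single float."""
    return x / (1.0 + math.exp(-x))


for x in (-5.0, -1.278, -0.5, 0.0, 1.0, 5.0):
    print(f"silu({x:+.3f}) = {silu(x):+.4f}")
```

The value near x = -1.278 is lower than the values at both x = -5 and x = -0.5, so the function dips and comes back up (non-monotonic), and it never goes far below zero (bounded below), while for large positive x it behaves almost like the identity.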

5. It uses the SPP structure, which extracts features with max pooling at several kernel sizes to enlarge the network's receptive field. In YoloV4, SPP was used in the FPN; in YoloV5 the SPP module sits in the backbone.

class SPP(nn.Module):
    # Spatial pyramid pooling layer used in YOLOv3-SPP
    def __init__(self, c1, c2, k=(5, 9, 13)):
        super(SPP, self).__init__()
        c_ = c1 // 2  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c_ * (len(k) + 1), c2, 1, 1)
        self.m = nn.ModuleList([nn.MaxPool2d(kernel_size=x, stride=1, padding=x // 2) for x in k])
    def forward(self, x):
        x = self.cv1(x)
        return self.cv2(torch.cat([x] + [m(x) for m in self.m], 1))
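
Two details of the SPP module are easy to verify by arithmetic: with stride 1 and padding k // 2 each pooling branch keeps the spatial size, and the concatenation feeds (len(k) + 1) times the hidden channels into cv2. The sketch below checks both in plain Python. The helper names are hypothetical, and the size formula is the usual floor-mode pooling formula with dilation 1.

```python
def pool_out_size(size, kernel, stride=1, padding=None):
    """Output length of a max-pool along one axis (floor mode, dilation 1)."""
    if padding is None:
        padding = kernel // 2  # same padding choice as in the SPP module
    return (size + 2 * padding - kernel) // stride + 1


def spp_concat_channels(c1, kernels=(5, 9, 13)):
    """Channels entering cv2: the cv1 output plus one pooled copy per kernel."""
    hidden = c1 // 2
    return hidden * (len(kernels) + 1)


for k in (5, 9, 13):
    print(k, pool_out_size(20, k))   # each branch keeps the 20 x 20 size
print(spp_concat_channels(1024))     # 2048 channels go into cv2
```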

The full backbone implementation is:

import torch
import torch.nn as nn
class SiLU(nn.Module):
    @staticmethod
    def forward(x):
        return x * torch.sigmoid(x)
def autopad(k, p=None):
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k] 
    return p
class Focus(nn.Module):
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True):  # ch_in, ch_out, kernel, stride, padding, groups
        super(Focus, self).__init__()
        self.conv = Conv(c1 * 4, c2, k, s, p, g, act)
    def forward(self, x):
        return self.conv(torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2], x[..., ::2, 1::2], x[..., 1::2, 1::2]], 1))
class Conv(nn.Module):
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True):
        super(Conv, self).__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p), groups=g, bias=False)
        self.bn = nn.BatchNorm2d(c2, eps=0.001, momentum=0.03)
        self.act = SiLU() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
    def forward(self, x):
        return self.act(self.bn(self.conv(x)))
    def fuseforward(self, x):
        return self.act(self.conv(x))
class Bottleneck(nn.Module):
    # Standard bottleneck
    def __init__(self, c1, c2, shortcut=True, g=1, e=0.5):  # ch_in, ch_out, shortcut, groups, expansion
        super(Bottleneck, self).__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c_, c2, 3, 1, g=g)
        self.add = shortcut and c1 == c2
    def forward(self, x):
        return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))
class C3(nn.Module):
    # CSP Bottleneck with 3 convolutions
    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):  # ch_in, ch_out, number, shortcut, groups, expansion
        super(C3, self).__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c1, c_, 1, 1)
        self.cv3 = Conv(2 * c_, c2, 1)  # act=FReLU(c2)
        self.m = nn.Sequential(*[Bottleneck(c_, c_, shortcut, g, e=1.0) for _ in range(n)])
        # self.m = nn.Sequential(*[CrossConv(c_, c_, 3, 1, g, 1.0, shortcut) for _ in range(n)])
    def forward(self, x):
        return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), dim=1))
class SPP(nn.Module):
    # Spatial pyramid pooling layer used in YOLOv3-SPP
    def __init__(self, c1, c2, k=(5, 9, 13)):
        super(SPP, self).__init__()
        c_ = c1 // 2  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c_ * (len(k) + 1), c2, 1, 1)
        self.m = nn.ModuleList([nn.MaxPool2d(kernel_size=x, stride=1, padding=x // 2) for x in k])
    def forward(self, x):
        x = self.cv1(x)
        return self.cv2(torch.cat([x] + [m(x) for m in self.m], 1))
class CSPDarknet(nn.Module):
    def __init__(self, base_channels, base_depth):
        super().__init__()
        #-----------------------------------------------#
        #   The input image is 640, 640, 3
        #   The initial base channel count is 64
        #-----------------------------------------------#
        #-----------------------------------------------#
        #   Extract features with the Focus structure
        #   640, 640, 3 -> 320, 320, 12 -> 320, 320, 64
        #-----------------------------------------------#
        self.stem       = Focus(3, base_channels, k=3)
        #-----------------------------------------------#
        #   After the convolution, 320, 320, 64 -> 160, 160, 128
        #   After the CSP layer, 160, 160, 128 -> 160, 160, 128
        #-----------------------------------------------#
        self.dark2 = nn.Sequential(
            Conv(base_channels, base_channels * 2, 3, 2),
            C3(base_channels * 2, base_channels * 2, base_depth),
        )
        #-----------------------------------------------#
        #   After the convolution, 160, 160, 128 -> 80, 80, 256
        #   After the CSP layer, 80, 80, 256 -> 80, 80, 256
        #-----------------------------------------------#
        self.dark3 = nn.Sequential(
            Conv(base_channels * 2, base_channels * 4, 3, 2),
            C3(base_channels * 4, base_channels * 4, base_depth * 3),
        )
        #-----------------------------------------------#
        #   After the convolution, 80, 80, 256 -> 40, 40, 512
        #   After the CSP layer, 40, 40, 512 -> 40, 40, 512
        #-----------------------------------------------#
        self.dark4 = nn.Sequential(
            Conv(base_channels * 4, base_channels * 8, 3, 2),
            C3(base_channels * 8, base_channels * 8, base_depth * 3),
        )
        #-----------------------------------------------#
        #   After the convolution, 40, 40, 512 -> 20, 20, 1024
        #   After SPP, 20, 20, 1024 -> 20, 20, 1024
        #   After the CSP layer, 20, 20, 1024 -> 20, 20, 1024
        #-----------------------------------------------#
        self.dark5 = nn.Sequential(
            Conv(base_channels * 8, base_channels * 16, 3, 2),
            SPP(base_channels * 16, base_channels * 16),
            C3(base_channels * 16, base_channels * 16, base_depth, shortcut=False),
        )
    def forward(self, x):
        x = self.stem(x)
        x = self.dark2(x)
        #-----------------------------------------------#
        #   The output of dark3 is 80, 80, 256, an effective feature layer
        #-----------------------------------------------#
        x = self.dark3(x)
        feat1 = x
        #-----------------------------------------------#
        #   The output of dark4 is 40, 40, 512, an effective feature layer
        #-----------------------------------------------#
        x = self.dark4(x)
        feat2 = x
        #-----------------------------------------------#
        #   The output of dark5 is 20, 20, 1024, an effective feature layer
        #-----------------------------------------------#
        x = self.dark5(x)
        feat3 = x
        return feat1, feat2, feat3
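
The shape comments in CSPDarknet can be reproduced with a few lines of arithmetic. The sketch below is a plain-Python trace, not part of the repository: the function name backbone_shapes is hypothetical, and it assumes a square input whose side is a multiple of 32, so that every stride-2 step halves the size exactly.

```python
def backbone_shapes(input_size=640, base_channels=64):
    """Return (height, width, channels) of feat1, feat2 and feat3."""
    size = input_size // 2          # Focus halves width and height
    channels = base_channels        # channels after the stem
    feats = []
    for stage in ("dark2", "dark3", "dark4", "dark5"):
        size //= 2                  # each stage starts with a stride-2 3x3 Conv
        channels *= 2               # which also doubles the channel count
        if stage != "dark2":        # dark3, dark4 and dark5 are returned
            feats.append((size, size, channels))
    return feats


print(backbone_shapes())  # [(80, 80, 256), (40, 40, 512), (20, 20, 1024)]
```

Calling backbone_shapes(640, 32) gives the corresponding shapes for a narrower model, where every channel count is halved.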

2. Building the FPN feature pyramid for enhanced feature extraction

For feature utilization, YoloV5 runs detection on multiple feature layers, three in total.

The three feature layers come from different depths of the CSPDarknet backbone: the middle, the lower-middle and the bottom. For an input of (640,640,3), their shapes are feat1=(80,80,256), feat2=(40,40,512) and feat3=(20,20,1024).

With the three effective feature layers in hand, we build the FPN as follows:

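The feat3=(20,20,1024) feature layer goes through one 1x1 convolution that adjusts the channels, giving P5. P5 is upsampled (nn.Upsample in the code) and combined with the feat2=(40,40,512) feature layer, then a CSPLayer extracts features to give P5_upsample, with shape (40,40,512).
The P5_upsample=(40,40,512) feature layer goes through one 1x1 convolution that adjusts the channels, giving P4. P4 is upsampled and combined with the feat1=(80,80,256) feature layer, then a CSPLayer extracts features to give P3_out, with shape (80,80,256).
The P3_out=(80,80,256) feature layer is downsampled by one 3x3 convolution and stacked with P4, then a CSPLayer extracts features to give P4_out, with shape (40,40,512).
The P4_out=(40,40,512) feature layer is downsampled by one 3x3 convolution and stacked with P5, then a CSPLayer extracts features to give P5_out, with shape (20,20,1024).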

The feature pyramid fuses feature layers of different shapes, which helps extract better features.

import torch
import torch.nn as nn
from nets.CSPdarknet import CSPDarknet, C3, Conv
#---------------------------------------------------#
#   yolo_body
#---------------------------------------------------#
class YoloBody(nn.Module):
    def __init__(self, anchors_mask, num_classes, phi):
        super(YoloBody, self).__init__()
        depth_dict          = {'s' : 0.33, 'm' : 0.67, 'l' : 1.00, 'x' : 1.33,}
        width_dict          = {'s' : 0.50, 'm' : 0.75, 'l' : 1.00, 'x' : 1.25,}
        dep_mul, wid_mul    = depth_dict[phi], width_dict[phi]
        base_channels       = int(wid_mul * 64)  # 64
        base_depth          = max(round(dep_mul * 3), 1)  # 3
        #-----------------------------------------------#
        #   The input image is 640, 640, 3
        #   The initial base channel count is 64
        #-----------------------------------------------#
        #---------------------------------------------------#
        #   Build the CSPdarknet53 backbone
        #   and obtain three effective feature layers with shapes:
        #   80,80,256
        #   40,40,512
        #   20,20,1024
        #---------------------------------------------------#
        self.backbone   = CSPDarknet(base_channels, base_depth)
        self.upsample   = nn.Upsample(scale_factor=2, mode="nearest")
        self.conv_for_feat3         = Conv(base_channels * 16, base_channels * 8, 1, 1)
        self.conv3_for_upsample1    = C3(base_channels * 16, base_channels * 8, base_depth, shortcut=False)
        self.conv_for_feat2         = Conv(base_channels * 8, base_channels * 4, 1, 1)
        self.conv3_for_upsample2    = C3(base_channels * 8, base_channels * 4, base_depth, shortcut=False)
        self.down_sample1           = Conv(base_channels * 4, base_channels * 4, 3, 2)
        self.conv3_for_downsample1  = C3(base_channels * 8, base_channels * 8, base_depth, shortcut=False)
        self.down_sample2           = Conv(base_channels * 8, base_channels * 8, 3, 2)
        self.conv3_for_downsample2  = C3(base_channels * 16, base_channels * 16, base_depth, shortcut=False)
        self.yolo_head_P3 = nn.Conv2d(base_channels * 4, len(anchors_mask[2]) * (5 + num_classes), 1)
        self.yolo_head_P4 = nn.Conv2d(base_channels * 8, len(anchors_mask[1]) * (5 + num_classes), 1)
        self.yolo_head_P5 = nn.Conv2d(base_channels * 16, len(anchors_mask[0]) * (5 + num_classes), 1)
    def forward(self, x):
        #  backbone
        feat1, feat2, feat3 = self.backbone(x)
        P5          = self.conv_for_feat3(feat3)
        P5_upsample = self.upsample(P5)
        P4          = torch.cat([P5_upsample, feat2], 1)
        P4          = self.conv3_for_upsample1(P4)
        P4          = self.conv_for_feat2(P4)
        P4_upsample = self.upsample(P4)
        P3          = torch.cat([P4_upsample, feat1], 1)
        P3          = self.conv3_for_upsample2(P3)
        P3_downsample = self.down_sample1(P3)
        P4 = torch.cat([P3_downsample, P4], 1)
        P4 = self.conv3_for_downsample1(P4)
        P4_downsample = self.down_sample2(P4)
        P5 = torch.cat([P4_downsample, P5], 1)
        P5 = self.conv3_for_downsample2(P5)
        #---------------------------------------------------#
        #   Third feature layer
        #   y3=(batch_size,75,80,80)
        #---------------------------------------------------#
        out2 = self.yolo_head_P3(P3)
        #---------------------------------------------------#
        #   Second feature layer
        #   y2=(batch_size,75,40,40)
        #---------------------------------------------------#
        out1 = self.yolo_head_P4(P4)
        #---------------------------------------------------#
        #   First feature layer
        #   y1=(batch_size,75,20,20)
        #---------------------------------------------------#
        out0 = self.yolo_head_P5(P5)
        return out0, out1, out2
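
The "# 64" and "# 3" comments in YoloBody.__init__ hold for the l model only. As a small illustration, the sketch below evaluates the same two expressions for all four values of phi. The dictionaries are copied from the code above; the function name scaled_config is hypothetical.

```python
DEPTH = {"s": 0.33, "m": 0.67, "l": 1.00, "x": 1.33}
WIDTH = {"s": 0.50, "m": 0.75, "l": 1.00, "x": 1.25}


def scaled_config(phi):
    """Return (base_channels, base_depth) exactly as YoloBody computes them."""
    base_channels = int(WIDTH[phi] * 64)
    base_depth = max(round(DEPTH[phi] * 3), 1)
    return base_channels, base_depth


for phi in ("s", "m", "l", "x"):
    print(phi, scaled_config(phi))
```

So the s model starts from 32 base channels with a base depth of 1, while the x model starts from 80 base channels with a base depth of 4. Every channel count and C3 repeat count in the backbone and FPN scales from these two numbers.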

3. Getting predictions with Yolo Head

From the FPN we obtain three enhanced feature layers with shapes (20,20,1024), (40,40,512) and (80,80,256). These three feature layers are passed to Yolo Head to obtain the predictions.

For each feature layer, a single convolution adjusts the channel count. The final number of channels depends on the number of classes to distinguish. In YoloV5, every feature point on every feature layer has 3 anchor boxes.

With the VOC training set there are 20 classes, so the last dimension is 75 = 3x25 and the three feature layers have shapes (20,20,75), (40,40,75) and (80,80,75).

The 75 splits into 3 groups of 25, one group per anchor box, and each 25 splits into 4+1+20.
The first 4 values are the regression parameters of the feature point; applying them to the anchor box gives the predicted box.
The 5th value indicates whether the feature point contains an object.
The last 20 values indicate the class of the object at the feature point.

With the COCO training set there are 80 classes, so the last dimension is 255 = 3x85 and the three feature layers have shapes (20,20,255), (40,40,255) and (80,80,255).

The 255 splits into 3 groups of 85, one group per anchor box, and each 85 splits into 4+1+80.
The first 4 values are the regression parameters of the feature point; applying them to the anchor box gives the predicted box.
The 5th value indicates whether the feature point contains an object.
The last 80 values indicate the class of the object at the feature point.
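
The channel arithmetic above can be written down directly. The sketch below is plain Python with hypothetical helper names; it only restates the 3 x (4 + 1 + num_classes) layout described in the text and shows how one anchor's vector is split.

```python
def head_channels(num_classes, anchors_per_point=3):
    """Output channels of one Yolo Head convolution."""
    return anchors_per_point * (5 + num_classes)


def split_prediction(vec, num_classes):
    """Split one anchor's raw vector into (box, objectness, class scores)."""
    if len(vec) != 5 + num_classes:
        raise ValueError("expected 4 box values, 1 objectness value and the class scores")
    return vec[:4], vec[4], vec[5:]


print(head_channels(20))  # 75 for VOC
print(head_channels(80))  # 255 for COCO

# Hypothetical raw vector for one anchor with 20 classes: the values 0..24.
box, obj, cls = split_prediction(list(range(25)), 20)
print(box, obj, len(cls))
```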

The implementation is the YoloBody class already listed in the FPN section above: its three 1x1 convolutions, yolo_head_P3, yolo_head_P4 and yolo_head_P5, are the Yolo Head and produce out2, out1 and out0 respectively.

III. Decoding the predictions

1. Obtaining predicted boxes and scores

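From the previous step we obtain predictions for three feature layers, with shapes (N,20,20,255), (N,40,40,255) and (N,80,80,255). In the PyTorch code the tensors are laid out channel-first, for example (N,255,20,20).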

These predictions do not yet correspond to box positions on the image; they have to be decoded first. In YoloV5, every feature point on every feature layer has 3 anchor boxes.

The 255 channels of each feature layer split into 3 groups of 85, one group per anchor box. We first reshape the predictions to (N,20,20,3,85), (N,40,40,3,85) and (N,80,80,3,85).

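Each 85 splits into 4+1+80.

The first 4 values are the regression parameters of the feature point; applying them to the anchor box gives the predicted box.
The 5th value indicates whether the feature point contains an object.
The last 80 values indicate the class of the object at the feature point.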

Take the (N,20,20,3,85) feature layer as an example. It effectively divides the image into 20x20 feature points; if a feature point falls inside the box of an object, it is used to predict that object.

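As shown in the figure, the blue points are the 20x20 feature points. The decoding of the three anchor boxes at the black point in the left image goes as follows:

1. Compute the predicted centers. The first two regression values shift the centers of the feature point's three anchor boxes; after the shift they are the three red points in the right image. In the code, each offset is 2*sigmoid(t) - 0.5, added to the grid coordinate.

2. Compute the predicted width and height. The last two regression values are passed through a sigmoid, doubled and squared, then multiplied by the anchor's width and height, i.e. (2*sigmoid(t))^2 * anchor size. Unlike earlier Yolo versions, the code does not use an exponential here.

3. The resulting predicted boxes can now be drawn on the image.

Besides this decoding, non-maximum suppression is still needed to keep boxes of the same class from piling up.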

def decode_box(self, inputs):
    outputs = []
    for i, input in enumerate(inputs):
        #-----------------------------------------------#
        #   There are three inputs, with shapes
        #   batch_size, 255, 20, 20
        #   batch_size, 255, 40, 40
        #   batch_size, 255, 80, 80
        #-----------------------------------------------#
        batch_size      = input.size(0)
        input_height    = input.size(2)
        input_width     = input.size(3)
        #-----------------------------------------------#
        #   For a 640x640 input
        #   stride_h = stride_w = 32, 16, 8
        #-----------------------------------------------#
        stride_h = self.input_shape[0] / input_height
        stride_w = self.input_shape[1] / input_width
        #-------------------------------------------------#
        #   scaled_anchors are expressed relative to the feature layer
        #-------------------------------------------------#
        scaled_anchors = [(anchor_width / stride_w, anchor_height / stride_h) for anchor_width, anchor_height in self.anchors[self.anchors_mask[i]]]
        #-----------------------------------------------#
        #   After the reshape, the three predictions have shapes
        #   batch_size, 3, 20, 20, 85
        #   batch_size, 3, 40, 40, 85
        #   batch_size, 3, 80, 80, 85
        #-----------------------------------------------#
        prediction = input.view(batch_size, len(self.anchors_mask[i]),
                                self.bbox_attrs, input_height, input_width).permute(0, 1, 3, 4, 2).contiguous()
        #-----------------------------------------------#
        #   Adjustment parameters for the anchor box center
        #-----------------------------------------------#
        x = torch.sigmoid(prediction[..., 0])  
        y = torch.sigmoid(prediction[..., 1])
        #-----------------------------------------------#
        #   Adjustment parameters for the anchor box width and height
        #-----------------------------------------------#
        w = torch.sigmoid(prediction[..., 2]) 
        h = torch.sigmoid(prediction[..., 3]) 
        #-----------------------------------------------#
        #   Objectness confidence: whether an object is present
        #-----------------------------------------------#
        conf        = torch.sigmoid(prediction[..., 4])
        #-----------------------------------------------#
        #   Class confidence
        #-----------------------------------------------#
        pred_cls    = torch.sigmoid(prediction[..., 5:])
        FloatTensor = torch.cuda.FloatTensor if x.is_cuda else torch.FloatTensor
        LongTensor  = torch.cuda.LongTensor if x.is_cuda else torch.LongTensor
        #----------------------------------------------------------#
        #   Build the grid: anchor centers start at the top-left corner of each cell
        #   batch_size,3,20,20
        #----------------------------------------------------------#
        grid_x = torch.linspace(0, input_width - 1, input_width).repeat(input_height, 1).repeat(
            batch_size * len(self.anchors_mask[i]), 1, 1).view(x.shape).type(FloatTensor)
        grid_y = torch.linspace(0, input_height - 1, input_height).repeat(input_width, 1).t().repeat(
            batch_size * len(self.anchors_mask[i]), 1, 1).view(y.shape).type(FloatTensor)
        #----------------------------------------------------------#
        #   Expand the anchor widths and heights to the grid layout
        #   batch_size,3,20,20
        #----------------------------------------------------------#
        anchor_w = FloatTensor(scaled_anchors).index_select(1, LongTensor([0]))
        anchor_h = FloatTensor(scaled_anchors).index_select(1, LongTensor([1]))
        anchor_w = anchor_w.repeat(batch_size, 1).repeat(1, 1, input_height * input_width).view(w.shape)
        anchor_h = anchor_h.repeat(batch_size, 1).repeat(1, 1, input_height * input_width).view(h.shape)
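        #----------------------------------------------------------#
        #   Adjust the anchor boxes with the predictions.
        #   First shift the center: the offset 2 * sigmoid - 0.5 lies in (-0.5, 1.5)
        #   relative to the top-left corner of the cell.
        #   Then scale the width and height by (2 * sigmoid) ** 2, which lies in (0, 4).
        #----------------------------------------------------------#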
        pred_boxes          = FloatTensor(prediction[..., :4].shape)
        pred_boxes[..., 0]  = x.data * 2. - 0.5 + grid_x
        pred_boxes[..., 1]  = y.data * 2. - 0.5 + grid_y
        pred_boxes[..., 2]  = (w.data * 2) ** 2 * anchor_w
        pred_boxes[..., 3]  = (h.data * 2) ** 2 * anchor_h
        #----------------------------------------------------------#
        #   Normalize the boxes to fractions of the feature-layer size
        #----------------------------------------------------------#
        _scale = torch.Tensor([input_width, input_height, input_width, input_height]).type(FloatTensor)
        output = torch.cat((pred_boxes.view(batch_size, -1, 4) / _scale,
                            conf.view(batch_size, -1, 1), pred_cls.view(batch_size, -1, self.num_classes)), -1)
        outputs.append(output.data)
    return outputs
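
For a single anchor, the box arithmetic in decode_box reduces to four scalar formulas. The sketch below restates them with the standard library only, so the ranges are easy to check by hand. The function names and the sample numbers are hypothetical; the formulas are the ones applied to pred_boxes above.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def decode_one(tx, ty, tw, th, grid_x, grid_y, anchor_w, anchor_h):
    """Decode one anchor's raw outputs into a box in feature-layer units.

    center = 2 * sigmoid(t) - 0.5 + grid cell index
    size   = (2 * sigmoid(t)) ** 2 * anchor size
    """
    cx = 2.0 * sigmoid(tx) - 0.5 + grid_x
    cy = 2.0 * sigmoid(ty) - 0.5 + grid_y
    w = (2.0 * sigmoid(tw)) ** 2 * anchor_w
    h = (2.0 * sigmoid(th)) ** 2 * anchor_h
    return cx, cy, w, h


# Raw outputs of zero leave the anchor size unchanged and put the
# center in the middle of cell (7, 3).
print(decode_one(0.0, 0.0, 0.0, 0.0, 7, 3, 4.0, 2.0))  # (7.5, 3.5, 4.0, 2.0)
```

Because the sigmoid is bounded, the center can move at most half a cell to the upper left or one and a half cells to the lower right of the cell corner, and the width and height can grow to at most four times the anchor size. Dividing the result by the feature-layer width and height gives the normalized form that decode_box returns.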

2. Score filtering and non-maximum suppression

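After obtaining the final predictions, we still need to filter them by score and by non-maximum suppression.

Score filtering keeps the predicted boxes whose score meets the confidence threshold.

Non-maximum suppression keeps, within a region, the highest-scoring box of each class.

The process can be summarized as follows: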

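1. Find the boxes in the image whose score exceeds the threshold. Filtering by score before checking overlaps greatly reduces the number of boxes.

2. Loop over the classes. Because non-maximum suppression keeps the highest-scoring box of a class within a region, looping over the classes lets us run it for each class separately.

3. Sort the boxes of that class by score in descending order.

4. Repeatedly take the highest-scoring box, compute its overlap with all the other predicted boxes, and discard those that overlap too much.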

The result after score filtering and non-maximum suppression can be used to draw the predicted boxes.
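
The steps above can be sketched for a single class in plain Python. This is a greedy reference version written for illustration, not the routine the repository calls: the function names, the sample boxes and the choice to drop boxes whose IoU is strictly greater than nms_thres are assumptions of this sketch.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(ix2 - ix1, 0.0) * max(iy2 - iy1, 0.0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms_single_class(boxes, scores, conf_thres=0.5, nms_thres=0.4):
    """Return the indices of the kept boxes, highest score first."""
    # Step 1: score filtering.
    order = [i for i, s in enumerate(scores) if s >= conf_thres]
    # Step 3: sort by score in descending order.
    order.sort(key=lambda i: scores[i], reverse=True)
    keep = []
    # Step 4: keep the best box, drop the boxes that overlap it too much.
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= nms_thres]
    return keep


# Hypothetical boxes of one class: two heavy overlaps, one separate box,
# and one low-score box that is removed by the score filter.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 10, 10)]
scores = [0.9, 0.8, 0.7, 0.3]
print(nms_single_class(boxes, scores))  # [0, 2]
```

Step 2, the loop over classes, amounts to calling nms_single_class once per class on that class's boxes.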

The figure below shows the result with non-maximum suppression.

The figure below shows the result without non-maximum suppression.

The implementation is:

def non_max_suppression(self, prediction, num_classes, input_shape, image_shape, letterbox_image, conf_thres=0.5, nms_thres=0.4):
    #----------------------------------------------------------#
    #   Convert the predictions to top-left / bottom-right corner format.
    #   prediction  [batch_size, num_anchors, 85]
    #----------------------------------------------------------#
    box_corner          = prediction.new(prediction.shape)
    box_corner[:, :, 0] = prediction[:, :, 0] - prediction[:, :, 2] / 2
    box_corner[:, :, 1] = prediction[:, :, 1] - prediction[:, :, 3] / 2
    box_corner[:, :, 2] = prediction[:, :, 0] + prediction[:, :, 2] / 2
    box_corner[:, :, 3] = prediction[:, :, 1] + prediction[:, :, 3] / 2
    prediction[:, :, :4] = box_corner[:, :, :4]
    output = [None for _ in range(len(prediction))]
    for i, image_pred in enumerate(prediction):
        #----------------------------------------------------------#
        #   Take the max over the class predictions.
        #   class_conf  [num_anchors, 1]    class confidence
        #   class_pred  [num_anchors, 1]    class index
        #----------------------------------------------------------#
        class_conf, class_pred = torch.max(image_pred[:, 5:5 + num_classes], 1, keepdim=True)
        #----------------------------------------------------------#
        #   First round of filtering, by confidence
        #----------------------------------------------------------#
        conf_mask = (image_pred[:, 4] * class_conf[:, 0] >= conf_thres).squeeze()
        #----------------------------------------------------------#
        #   Keep only the predictions that pass the confidence filter
        #----------------------------------------------------------#
        image_pred = image_pred[conf_mask]
        class_conf = class_conf[conf_mask]
        class_pred = class_pred[conf_mask]
        if not image_pred.size(0):
            continue
        #-------------------------------------------------------------------------#
        #   detections  [num_anchors, 7]
        #   The 7 values are: x1, y1, x2, y2, obj_conf, class_conf, class_pred
        #-------------------------------------------------------------------------#
        detections = torch.cat((image_pred[:, :5], class_conf.float(), class_pred.float()), 1)
        #------------------------------------------#
        #   Collect all classes present in the predictions
        #------------------------------------------#
        unique_labels = detections[:, -1].cpu().unique()
        if prediction.is_cuda:
            unique_labels = unique_labels.cuda()
            detections = detections.cuda()
        for c in unique_labels:
            #------------------------------------------#
            #   Get all predictions of one class that passed score filtering
            #------------------------------------------#
            detections_class = detections[detections[:, -1] == c]
            #------------------------------------------#
            #   The official nms (torchvision.ops.nms) is faster!
            #------------------------------------------#
            keep = nms(
                detections_class[:, :4],
                detections_class[:, 4] * detections_class[:, 5],
                nms_thres
            )
            max_detections = detections_class[keep]
            # # Sort by objectness confidence times class confidence
            # _, conf_sort_index = torch.sort(detections_class[:, 4]*detections_class[:, 5], descending=True)
            # detections_class = detections_class[conf_sort_index]
            # # Run non-maximum suppression
            # max_detections = []
            # while detections_class.size(0):
            #     # Take the highest-confidence box of this class, then walk through the rest and drop any whose overlap exceeds nms_thres
            #     max_detections.append(detections_class[0].unsqueeze(0))
            #     if len(detections_class) == 1:
            #         break
            #     ious = bbox_iou(max_detections[-1], detections_class[1:])
            #     detections_class = detections_class[1:][ious < nms_thres]
            # # Stack the kept boxes
            # max_detections = torch.cat(max_detections).data
            # Add max detections to outputs
            output[i] = max_detections if output[i] is None else torch.cat((output[i], max_detections))
        if output[i] is not None:
            output[i]           = output[i].cpu().numpy()
            box_xy, box_wh      = (output[i][:, 0:2] + output[i][:, 2:4])/2, output[i][:, 2:4] - output[i][:, 0:2]
            output[i][:, :4]    = self.yolo_correct_boxes(box_xy, box_wh, input_shape, image_shape, letterbox_image)
    return output
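
The commented-out loop in the listing above is the hand-written form of the same greedy procedure that the `nms` call performs. As a rough, self-contained sketch of that idea for a single class (pure Python, boxes given as (x1, y1, x2, y2); `iou_xyxy` and `greedy_nms` are illustrative names I made up here, not functions from the repository):

```python
def iou_xyxy(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, nms_thres):
    """Return the indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda k: scores[k], reverse=True)
    keep = []
    while order:
        best = order.pop(0)          # highest remaining score
        keep.append(best)
        # Drop every remaining box that overlaps the kept one too much.
        order = [k for k in order if iou_xyxy(boxes[best], boxes[k]) <= nms_thres]
    return keep


# Two heavily overlapping boxes and one box far away (made-up values).
boxes = [(10, 10, 110, 110), (12, 12, 112, 112), (200, 200, 260, 260)]
scores = [0.90, 0.80, 0.75]
kept = greedy_nms(boxes, scores, nms_thres=0.5)
print(kept)
```

In the listing, the score passed to NMS is the objectness confidence multiplied by the class confidence, and the procedure runs once per class so that boxes of different classes never suppress each other.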

IV. Training

1. What the loss needs

Computing the loss comes down to comparing the network's predictions with the ground truth.

Like the predictions, the loss has three parts: Reg, Obj and Cls. Reg covers the regression parameters of a feature point, Obj covers whether the feature point contains an object, and Cls covers the class of the object it contains.

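2. Positive sample matching

In YoloV5, positive sample matching during training has two steps.

a. Match anchors.

b. Match feature points.

Positive sample matching means finding which anchors (prior boxes) are considered to have a corresponding ground-truth box and are therefore responsible for predicting it.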

a. Matching anchors

YoloV5 defines 9 anchors of different sizes, 3 for each output feature layer.

For any ground-truth box gt, YoloV5 no longer matches positives by IoU. It matches on side ratios instead: the width and height of the ground-truth box are compared with the width and height of each of the 9 anchors.

If the width or height ratio between the ground-truth box and an anchor exceeds the threshold, the two are a poor match and that anchor is treated as a negative sample.

For example, take a square ground-truth box with width and height [200, 200]. The 9 default YoloV5 anchors are [10,13], [16,30], [33,23], [30,61], [62,45], [59,119], [116,90], [156,198], [373,326]. The threshold is set to 4.

We now compute the side ratios between this ground-truth box and the 9 anchors. Two cases arise: a side of the ground-truth box is larger than the anchor's, or the anchor's side is larger. So we compute both ground truth / anchor and anchor / ground truth, for width and for height, and take the maximum of the four values.

The matrix below holds the result. Its shape is [9, 4]: 9 rows for the 9 anchors, and 4 columns for gt_w/anchor_w, gt_h/anchor_h, anchor_w/gt_w and anchor_h/gt_h.

[[20.         15.38461538  0.05        0.065     ]
 [12.5         6.66666667  0.08        0.15      ]
 [ 6.06060606  8.69565217  0.165       0.115     ]
 [ 6.66666667  3.27868852  0.15        0.305     ]
 [ 3.22580645  4.44444444  0.31        0.225     ]
 [ 3.38983051  1.68067227  0.295       0.595     ]
 [ 1.72413793  2.22222222  0.58        0.45      ]
 [ 1.28205128  1.01010101  0.78        0.99      ]
 [ 0.53619303  0.61349693  1.865       1.63      ]]

Then take the maximum of each row, one value per anchor, which gives the following vector:

[20.         12.5         8.69565217  6.66666667  4.44444444  3.38983051
   2.22222222  1.28205128  1.865     ]

Next we check which anchors have a value below the threshold. Four anchors qualify: [59,119], [116,90], [156,198] and [373,326].

[116,90], [156,198] and [373,326] belong to the 20x20 feature layer.

[59,119] belongs to the 40x40 feature layer.

At this point we know which anchor sizes can be used to predict this ground-truth box.
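
The worked example above can be reproduced in a few lines. This is only a sketch in plain Python, using the [200, 200] box, the 9 default anchors and the threshold of 4 from the text; `max_side_ratio` is a helper name I introduce for illustration:

```python
# Default anchors and threshold as quoted in the text.
anchors = [(10, 13), (16, 30), (33, 23), (30, 61), (62, 45),
           (59, 119), (116, 90), (156, 198), (373, 326)]
threshold = 4.0


def max_side_ratio(gt_wh, anchor_wh):
    """Largest of gt/anchor and anchor/gt, taken over width and height."""
    gw, gh = gt_wh
    aw, ah = anchor_wh
    return max(gw / aw, gh / ah, aw / gw, ah / gh)


gt = (200, 200)
max_ratios = [max_side_ratio(gt, a) for a in anchors]

# An anchor matches when its worst side ratio stays below the threshold.
matched = [a for a, r in zip(anchors, max_ratios) if r < threshold]

print([round(r, 3) for r in max_ratios])
print(matched)
```

Note that the `get_target` method in the listing further down additionally forces the anchor with the smallest ratio to match, so a ground-truth box always gets at least one anchor even when every ratio exceeds the threshold.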

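b. Matching feature points

In earlier Yolo versions, each ground-truth box was predicted only by the feature point at the top-left corner of the grid cell containing the box center.

For each selected feature layer, first work out which grid cell the center of the ground-truth box falls in. The feature point at that cell's top-left corner is one of the points responsible for the prediction.

Then, using a rounding rule, pick the two nearest neighbouring cells. All three cells are treated as responsible for predicting this ground-truth box.

In the original illustration, the red point marks the center of the ground-truth box; besides the cell it sits in, its 2 nearest neighbouring cells are also selected. This also shows that the XY offset of a predicted box no longer ranges over 0 to 1 but over -0.5 to 1.5.

Once the feature points are found, the anchors selected in step a at those feature points are responsible for predicting the ground-truth box.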
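
As a minimal sketch of the rounding rule, assuming the box center is already expressed in grid units: the cell containing the center is always selected, plus the horizontal and the vertical neighbour on whichever side the center is closer to. The function name `responsible_cells` and the handling of a fractional part of exactly 0.5 are my own choices for illustration; the `get_near_points` method in the listing below treats that tie slightly differently.

```python
import math


def responsible_cells(cx, cy, grid_w, grid_h):
    """Cells (column, row) that should predict a box centered at (cx, cy)."""
    i, j = math.floor(cx), math.floor(cy)
    # Right/bottom neighbour if the center lies past the middle of the cell,
    # otherwise the left/top neighbour.
    dx = 1 if cx - i > 0.5 else -1
    dy = 1 if cy - j > 0.5 else -1
    candidates = [(i, j), (i + dx, j), (i, j + dy)]
    # Neighbours that fall outside the feature layer are discarded.
    return [(x, y) for x, y in candidates if 0 <= x < grid_w and 0 <= y < grid_h]


print(responsible_cells(7.3, 4.8, 20, 20))
print(responsible_cells(0.2, 0.2, 20, 20))
```

Because a neighbouring cell can be up to half a cell away from the center on either side, the predicted offset relative to a cell's top-left corner has to cover more than one cell width, which is where the -0.5 to 1.5 range comes from.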

3. Computing the loss

As noted in part 1, the YoloV5 loss has three components:

1. Reg: part 2 gives the anchors matched to each ground-truth box. For each matched anchor, take its predicted box and compute the CIoU loss between the ground-truth box and the predicted box; this forms the Reg loss. (The listing below implements the GIoU variant in box_giou.)

2. Obj: part 2 gives the anchors matched to each ground-truth box. Every matched anchor is a positive sample and all remaining anchors are negative samples. The cross-entropy loss between these labels and the predicted objectness of each feature point forms the Obj loss.

3. Cls: part 2 gives the anchors matched to each ground-truth box. For each matched anchor, take its class prediction and compute the cross-entropy loss between the ground-truth class and the predicted class; this forms the Cls loss.
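
To make the Reg term concrete, here is a scalar sketch of GIoU for two boxes in (center_x, center_y, width, height) form, following the same steps as `box_giou` in the listing below but in plain Python. The function name `giou_xywh` and the example boxes are made up for illustration.

```python
def giou_xywh(b1, b2):
    """GIoU of two boxes given as (center_x, center_y, width, height)."""
    b1_min = (b1[0] - b1[2] / 2, b1[1] - b1[3] / 2)
    b1_max = (b1[0] + b1[2] / 2, b1[1] + b1[3] / 2)
    b2_min = (b2[0] - b2[2] / 2, b2[1] - b2[3] / 2)
    b2_max = (b2[0] + b2[2] / 2, b2[1] + b2[3] / 2)

    # Plain IoU.
    inter_w = max(0.0, min(b1_max[0], b2_max[0]) - max(b1_min[0], b2_min[0]))
    inter_h = max(0.0, min(b1_max[1], b2_max[1]) - max(b1_min[1], b2_min[1]))
    inter = inter_w * inter_h
    union = b1[2] * b1[3] + b2[2] * b2[3] - inter
    iou = inter / union

    # Smallest box enclosing both; its unused area is the GIoU penalty.
    enclose_w = max(b1_max[0], b2_max[0]) - min(b1_min[0], b2_min[0])
    enclose_h = max(b1_max[1], b2_max[1]) - min(b1_min[1], b2_min[1])
    enclose = enclose_w * enclose_h
    return iou - (enclose - union) / enclose


pred = (2.0, 2.0, 2.0, 2.0)
gt = (3.0, 3.0, 2.0, 2.0)
giou = giou_xywh(pred, gt)
reg_loss = 1.0 - giou  # the per-box term before the box_loss_scale weighting
print(giou, reg_loss)
```

Unlike plain IoU, GIoU stays informative for boxes that do not overlap at all: it goes negative as the boxes move apart, so the regression loss still has a gradient to follow.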

import torch
import torch.nn as nn
import math
import numpy as np
class YOLOLoss(nn.Module):
    def __init__(self, anchors, num_classes, input_shape, cuda, anchors_mask = [[6,7,8], [3,4,5], [0,1,2]], label_smoothing = 0):
        super(YOLOLoss, self).__init__()
        #-----------------------------------------------------------#
        #   Anchors for the 13x13 feature layer: [142, 110],[192, 243],[459, 401]
        #   Anchors for the 26x26 feature layer: [36, 75],[76, 55],[72, 146]
        #   Anchors for the 52x52 feature layer: [12, 16],[19, 36],[40, 28]
        #-----------------------------------------------------------#
        self.anchors        = anchors
        self.num_classes    = num_classes
        self.bbox_attrs     = 5 + num_classes
        self.input_shape    = input_shape
        self.anchors_mask   = anchors_mask
        self.label_smoothing = label_smoothing
        self.threshold      = 4
        self.balance        = [0.4, 1.0, 4]
        self.box_ratio      = 5
        self.cls_ratio      = 0.5
        self.obj_ratio      = 1
        self.cuda = cuda
    def clip_by_tensor(self, t, t_min, t_max):
        t = t.float()
        result = (t >= t_min).float() * t + (t < t_min).float() * t_min
        result = (result <= t_max).float() * result + (result > t_max).float() * t_max
        return result
    def MSELoss(self, pred, target):
        return torch.pow(pred - target, 2)
    def BCELoss(self, pred, target):
        epsilon = 1e-7
        pred    = self.clip_by_tensor(pred, epsilon, 1.0 - epsilon)
        output  = - target * torch.log(pred) - (1.0 - target) * torch.log(1.0 - pred)
        return output
    def box_giou(self, b1, b2):
        """
        Inputs:
        ----------
        b1: tensor, shape=(batch, feat_w, feat_h, anchor_num, 4), xywh
        b2: tensor, shape=(batch, feat_w, feat_h, anchor_num, 4), xywh
        Returns:
        -------
        giou: tensor, shape=(batch, feat_w, feat_h, anchor_num, 1)
        """
        #----------------------------------------------------#
        #   Top-left and bottom-right corners of the predicted box
        #----------------------------------------------------#
        b1_xy       = b1[..., :2]
        b1_wh       = b1[..., 2:4]
        b1_wh_half  = b1_wh/2.
        b1_mins     = b1_xy - b1_wh_half
        b1_maxes    = b1_xy + b1_wh_half
        #----------------------------------------------------#
        #   Top-left and bottom-right corners of the ground-truth box
        #----------------------------------------------------#
        b2_xy       = b2[..., :2]
        b2_wh       = b2[..., 2:4]
        b2_wh_half  = b2_wh/2.
        b2_mins     = b2_xy - b2_wh_half
        b2_maxes    = b2_xy + b2_wh_half
        #----------------------------------------------------#
        #   IoU between the ground-truth boxes and the predicted boxes
        #----------------------------------------------------#
        intersect_mins  = torch.max(b1_mins, b2_mins)
        intersect_maxes = torch.min(b1_maxes, b2_maxes)
        intersect_wh    = torch.max(intersect_maxes - intersect_mins, torch.zeros_like(intersect_maxes))
        intersect_area  = intersect_wh[..., 0] * intersect_wh[..., 1]
        b1_area         = b1_wh[..., 0] * b1_wh[..., 1]
        b2_area         = b2_wh[..., 0] * b2_wh[..., 1]
        union_area      = b1_area + b2_area - intersect_area
        iou             = intersect_area / union_area
        #----------------------------------------------------#
        #   Top-left and bottom-right corners of the smallest box enclosing both boxes
        #----------------------------------------------------#
        enclose_mins    = torch.min(b1_mins, b2_mins)
        enclose_maxes   = torch.max(b1_maxes, b2_maxes)
        enclose_wh      = torch.max(enclose_maxes - enclose_mins, torch.zeros_like(intersect_maxes))
        #----------------------------------------------------#
        #   Area of the enclosing box, then the GIoU
        #----------------------------------------------------#
        enclose_area    = enclose_wh[..., 0] * enclose_wh[..., 1]
        giou            = iou - (enclose_area - union_area) / enclose_area
        return giou
    #---------------------------------------------------#
    #   Label smoothing
    #---------------------------------------------------#
    def smooth_labels(self, y_true, label_smoothing, num_classes):
        return y_true * (1.0 - label_smoothing) + label_smoothing / num_classes
    def forward(self, l, input, targets=None):
        #----------------------------------------------------#
        #   l is the index of the effective feature layer in use
        #   input has shape bs, 3*(5+num_classes), 13, 13
        #                   bs, 3*(5+num_classes), 26, 26
        #                   bs, 3*(5+num_classes), 52, 52
        #   targets holds the ground-truth labels, [batch_size, num_gt, 5]
        #----------------------------------------------------#
        #--------------------------------#
        #   Number of images, and the height and width of the feature layer
        #--------------------------------#
        bs      = input.size(0)
        in_h    = input.size(2)
        in_w    = input.size(3)
        #-----------------------------------------------------------------------#
        #   Compute the stride:
        #   how many pixels of the original image one feature point covers
        #   
        #   For a 13x13 feature layer, one feature point covers 32 pixels of the original image
        #   For a 26x26 feature layer, one feature point covers 16 pixels of the original image
        #   For a 52x52 feature layer, one feature point covers 8 pixels of the original image
        #   stride_h = stride_w = 32, 16, 8
        #-----------------------------------------------------------------------#
        stride_h = self.input_shape[0] / in_h
        stride_w = self.input_shape[1] / in_w
        #-------------------------------------------------#
        #   scaled_anchors are now sized relative to the feature layer
        #-------------------------------------------------#
        scaled_anchors  = [(a_w / stride_w, a_h / stride_h) for a_w, a_h in self.anchors]
        #-----------------------------------------------#
        #   There are three inputs in total; their shapes are
        #   bs, 3 * (5+num_classes), 13, 13 => bs, 3, 5 + num_classes, 13, 13 => batch_size, 3, 13, 13, 5 + num_classes
        #   batch_size, 3, 13, 13, 5 + num_classes
        #   batch_size, 3, 26, 26, 5 + num_classes
        #   batch_size, 3, 52, 52, 5 + num_classes
        #-----------------------------------------------#
        prediction = input.view(bs, len(self.anchors_mask[l]), self.bbox_attrs, in_h, in_w).permute(0, 1, 3, 4, 2).contiguous()
        #-----------------------------------------------#
        #   Adjustment parameters for the anchor center
        #-----------------------------------------------#
        x = torch.sigmoid(prediction[..., 0])
        y = torch.sigmoid(prediction[..., 1])
        #-----------------------------------------------#
        #   Adjustment parameters for the anchor width and height
        #-----------------------------------------------#
        w = torch.sigmoid(prediction[..., 2])
        h = torch.sigmoid(prediction[..., 3])
        #-----------------------------------------------#
        #   Objectness confidence: whether an object is present
        #-----------------------------------------------#
        conf = torch.sigmoid(prediction[..., 4])
        #-----------------------------------------------#
        #   Class confidence
        #-----------------------------------------------#
        pred_cls = torch.sigmoid(prediction[..., 5:])
        #-----------------------------------------------#
        #   Build the targets the network is expected to predict
        #-----------------------------------------------#
        y_true, noobj_mask, box_loss_scale = self.get_target(l, targets, scaled_anchors, in_h, in_w)
        #---------------------------------------------------------------#
        #   Decode the predictions to measure how well they overlap the ground truth
        #   Predictions that overlap heavily are ignored, because those feature points
        #   are already fairly accurate and unsuitable as negative samples
        #----------------------------------------------------------------#
        pred_boxes = self.get_pred_boxes(l, x, y, h, w, targets, scaled_anchors, in_h, in_w)
        if self.cuda:
            y_true          = y_true.cuda()
            noobj_mask      = noobj_mask.cuda()
            box_loss_scale  = box_loss_scale.cuda()
        #-----------------------------------------------------------#
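        #   reshape_y_true[...,2:3] and reshape_y_true[...,3:4]
        #   are the width and height of the ground-truth box, both between 0 and 1
        #   The larger the ground-truth box, the smaller its weight; small boxes weigh more.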
        #-----------------------------------------------------------#
        box_loss_scale = 2 - box_loss_scale
        #---------------------------------------------------------------#
        #   GIoU between the predictions and the ground truth
        #----------------------------------------------------------------#
        giou        = self.box_giou(pred_boxes[y_true[..., 4] == 1], y_true[..., :4][y_true[..., 4] == 1])
        loss_loc    = torch.sum((1 - giou) * box_loss_scale[y_true[..., 4] == 1])
        #-----------------------------------------------------------#
        #   Confidence (objectness) loss
        #-----------------------------------------------------------#
        loss_conf   = torch.sum(self.BCELoss(conf[y_true[..., 4] == 1], giou.detach().clamp(0))) + \
                      torch.sum(self.BCELoss(conf, y_true[..., 4]) * noobj_mask)
        loss_cls    = torch.sum(self.BCELoss(pred_cls[y_true[..., 4] == 1], self.smooth_labels(y_true[..., 5:][y_true[..., 4] == 1], self.label_smoothing, self.num_classes)))
        loss        = loss_loc * self.box_ratio + loss_conf * self.balance[l] * self.obj_ratio + loss_cls * self.cls_ratio
        num_pos = torch.sum(y_true[..., 4])
        num_pos = torch.max(num_pos, torch.ones_like(num_pos))
        return loss, num_pos
    def get_near_points(self, x, y, i, j):
        sub_x = x - i
        sub_y = y - j
        if sub_x > 0.5 and sub_y > 0.5:
            return [[0, 0], [1, 0], [0, 1]]
        elif sub_x < 0.5 and sub_y > 0.5:
            return [[0, 0], [-1, 0], [0, 1]]
        elif sub_x < 0.5 and sub_y < 0.5:
            return [[0, 0], [-1, 0], [0, -1]]
        else:
            return [[0, 0], [1, 0], [0, -1]]
    def get_target(self, l, targets, anchors, in_h, in_w):
        #-----------------------------------------------------#
        #   Number of images in the batch
        #-----------------------------------------------------#
        bs              = len(targets)
        #-----------------------------------------------------#
        #   Mask of the anchors that contain no object
        #-----------------------------------------------------#
        noobj_mask      = torch.ones(bs, len(self.anchors_mask[l]), in_h, in_w, requires_grad = False)
        #-----------------------------------------------------#
        #   Makes the network pay more attention to small objects
        #-----------------------------------------------------#
        box_loss_scale  = torch.zeros(bs, len(self.anchors_mask[l]), in_h, in_w, requires_grad = False)
        #-----------------------------------------------------#
        #   anchors_best_ratio
        #-----------------------------------------------------#
        box_best_ratio = torch.zeros(bs, len(self.anchors_mask[l]), in_h, in_w, requires_grad = False)
        #-----------------------------------------------------#
        #   batch_size, 3, 13, 13, 5 + num_classes
        #-----------------------------------------------------#
        y_true          = torch.zeros(bs, len(self.anchors_mask[l]), in_h, in_w, self.bbox_attrs, requires_grad = False)
        for b in range(bs):            
            if len(targets[b])==0:
                continue
            batch_target = torch.zeros_like(targets[b])
            #-------------------------------------------------------#
            #   Scale the ground-truth centers and sizes to feature-layer units
            #-------------------------------------------------------#
            batch_target[:, [0,2]] = targets[b][:, [0,2]] * in_w
            batch_target[:, [1,3]] = targets[b][:, [1,3]] * in_h
            batch_target[:, 4] = targets[b][:, 4]
            batch_target = batch_target.cpu()
            #-------------------------------------------------------#
            #   batch_target            : num_true_box, 4
            #   anchors                 : 9, 2
            #
            #   ratios_of_gt_anchors    : num_true_box, 9, 2
            #   ratios_of_anchors_gt    : num_true_box, 9, 2
            #
            #   ratios                  : num_true_box, 9, 4
            #   max_ratios              : num_true_box, 9
            #-------------------------------------------------------#
            ratios_of_gt_anchors = torch.unsqueeze(batch_target[:, 2:4], 1) / torch.unsqueeze(torch.FloatTensor(anchors), 0)
            ratios_of_anchors_gt = torch.unsqueeze(torch.FloatTensor(anchors), 0) /  torch.unsqueeze(batch_target[:, 2:4], 1)
            ratios               = torch.cat([ratios_of_gt_anchors, ratios_of_anchors_gt], dim = -1)
            max_ratios, _        = torch.max(ratios, dim = -1)
            for t, ratio in enumerate(max_ratios):
                #-------------------------------------------------------#
                #   ratio : 9
                #-------------------------------------------------------#
                over_threshold = ratio < self.threshold
                over_threshold[torch.argmin(ratio)] = True
                for k, mask in enumerate(self.anchors_mask[l]):
                    if not over_threshold[mask]:
                        continue
                    #----------------------------------------#
                    #   Find the grid cell the ground-truth box falls in
                    #----------------------------------------#
                    i = torch.floor(batch_target[t, 0]).long()
                    j = torch.floor(batch_target[t, 1]).long()
                    offsets = self.get_near_points(batch_target[t, 0], batch_target[t, 1], i, j)
                    for offset in offsets:
                        local_i = i + offset[0]
                        local_j = j + offset[1]
                        if local_i >= in_w or local_i < 0 or local_j >= in_h or local_j < 0:
                            continue
                        if box_best_ratio[b, k, local_j, local_i] != 0:
                            if box_best_ratio[b, k, local_j, local_i] > ratio[mask]:
                                y_true[b, k, local_j, local_i, :] = 0
                            else:
                                continue
                        #----------------------------------------#
                        #   Class of the ground-truth box
                        #----------------------------------------#
                        c = batch_target[t, 4].long()
                        #----------------------------------------#
                        #   noobj_mask marks feature points without an object
                        #----------------------------------------#
                        noobj_mask[b, k, local_j, local_i] = 0
                        #----------------------------------------#
                        #   Store the ground-truth center and size as the regression target
                        #----------------------------------------#
                        y_true[b, k, local_j, local_i, 0] = batch_target[t, 0]
                        y_true[b, k, local_j, local_i, 1] = batch_target[t, 1]
                        y_true[b, k, local_j, local_i, 2] = batch_target[t, 2]
                        y_true[b, k, local_j, local_i, 3] = batch_target[t, 3]
                        y_true[b, k, local_j, local_i, 4] = 1
                        y_true[b, k, local_j, local_i, c + 5] = 1
                        #----------------------------------------#
                        #   Relative area of the box (w * h over the feature-layer area)
                        #   Large objects get a small loss weight, small objects a large one
                        #----------------------------------------#
                        box_loss_scale[b, k, local_j, local_i] = batch_target[t, 2] * batch_target[t, 3] / in_w / in_h
                        #----------------------------------------#
                        #   Record the best ratio seen so far for this anchor
                        #----------------------------------------#
                        box_best_ratio[b, k, local_j, local_i] = ratio[mask]
        return y_true, noobj_mask, box_loss_scale
    def get_pred_boxes(self, l, x, y, h, w, targets, scaled_anchors, in_h, in_w):
        #-----------------------------------------------------#
        #   Number of images in the batch
        #-----------------------------------------------------#
        bs = len(targets)
        FloatTensor = torch.cuda.FloatTensor if x.is_cuda else torch.FloatTensor
        LongTensor  = torch.cuda.LongTensor if x.is_cuda else torch.LongTensor
        #-----------------------------------------------------#
        #   Build the grid; anchor centers sit at the top-left corner of each cell
        #-----------------------------------------------------#
        grid_x = torch.linspace(0, in_w - 1, in_w).repeat(in_h, 1).repeat(
            int(bs * len(self.anchors_mask[l])), 1, 1).view(x.shape).type(FloatTensor)
        grid_y = torch.linspace(0, in_h - 1, in_h).repeat(in_w, 1).t().repeat(
            int(bs * len(self.anchors_mask[l])), 1, 1).view(y.shape).type(FloatTensor)
        # Widths and heights of the anchors
        scaled_anchors_l = np.array(scaled_anchors)[self.anchors_mask[l]]
        anchor_w = FloatTensor(scaled_anchors_l).index_select(1, LongTensor([0]))
        anchor_h = FloatTensor(scaled_anchors_l).index_select(1, LongTensor([1]))
        anchor_w = anchor_w.repeat(bs, 1).repeat(1, 1, in_h * in_w).view(w.shape)
        anchor_h = anchor_h.repeat(bs, 1).repeat(1, 1, in_h * in_w).view(h.shape)
        #-------------------------------------------------------#
        #   Compute the adjusted anchor centers and sizes
        #-------------------------------------------------------#
        pred_boxes_x    = torch.unsqueeze(x * 2. - 0.5 + grid_x, -1)
        pred_boxes_y    = torch.unsqueeze(y * 2. - 0.5 + grid_y, -1)
        pred_boxes_w    = torch.unsqueeze((w * 2) ** 2 * anchor_w, -1)
        pred_boxes_h    = torch.unsqueeze((h * 2) ** 2 * anchor_h, -1)
        pred_boxes      = torch.cat([pred_boxes_x, pred_boxes_y, pred_boxes_w, pred_boxes_h], dim = -1)
        return pred_boxes
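
The decoding at the end of `get_pred_boxes` is easier to see on a single feature point. The sketch below applies the same formulas to scalars in plain Python; `sigmoid` and `decode` are illustrative helpers, and the anchor size 3.625 x 2.8125 is simply the [116, 90] anchor divided by a stride of 32, chosen as an example.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def decode(tx, ty, tw, th, grid_x, grid_y, anchor_w, anchor_h):
    """Turn raw outputs of one feature point into a box in grid units."""
    x = sigmoid(tx) * 2.0 - 0.5 + grid_x        # offset lies in (-0.5, 1.5)
    y = sigmoid(ty) * 2.0 - 0.5 + grid_y
    w = (sigmoid(tw) * 2.0) ** 2 * anchor_w     # scale lies in (0, 4)
    h = (sigmoid(th) * 2.0) ** 2 * anchor_h
    return x, y, w, h


# Raw outputs of zero give the cell center and the unchanged anchor size.
box = decode(0.0, 0.0, 0.0, 0.0, 7, 4, 3.625, 2.8125)
print(box)

# Extreme raw outputs approach, but never reach, the bounds.
lo = decode(-20.0, -20.0, -20.0, -20.0, 7, 4, 3.625, 2.8125)
hi = decode(20.0, 20.0, 20.0, 20.0, 7, 4, 3.625, 2.8125)
print(lo[0] - 7, hi[0] - 7, hi[2] / 3.625)
```

The factor of 4 on width and height lines up with the matching threshold of 4 used in `get_target`: an anchor is only matched to boxes whose sides are within a factor of 4 of its own, so the decoder never needs to scale an anchor further than that.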

Training your own YoloV5 model

First download the repository from GitHub, unzip it, and open the folder in your editor.

Make sure you open the correct root directory; if the relative paths are wrong, the code will not run.

The root directory you open must be the directory that holds the files.

I. Preparing the dataset

This article trains on the VOC format. You need to prepare your own dataset before training; if you do not have one, you can download the VOC12+07 dataset through the GitHub link and try it out.

Before training, put the label files in the Annotations folder under VOCdevkit/VOC2007.

Before training, put the image files in the JPEGImages folder under VOCdevkit/VOC2007.

The dataset is now in place.

II. Processing the dataset

With the dataset in place, the next step is to process it to obtain the 2007_train.txt and 2007_val.txt files used for training. This is done with voc_annotation.py in the root directory.

voc_annotation.py has a few parameters to set.

They are annotation_mode, classes_path, trainval_percent, train_percent and VOCdevkit_path. For a first run you only need to change classes_path.

'''
annotation_mode selects what this script computes when it runs
annotation_mode = 0 runs the whole label pipeline: the txt files in VOCdevkit/VOC2007/ImageSets plus the 2007_train.txt and 2007_val.txt used for training
annotation_mode = 1 only produces the txt files in VOCdevkit/VOC2007/ImageSets
annotation_mode = 2 only produces the 2007_train.txt and 2007_val.txt used for training
'''
annotation_mode     = 0
'''
Must be changed; supplies the object classes written into 2007_train.txt and 2007_val.txt
Keep it identical to the classes_path used for training and prediction
If the generated 2007_train.txt contains no object information,
the classes were not set correctly
Only takes effect when annotation_mode is 0 or 2
'''
classes_path        = 'model_data/voc_classes.txt'
'''
trainval_percent sets the ratio of (training set + validation set) to the test set; by default (training set + validation set):test set = 9:1
train_percent sets the ratio of training set to validation set within (training set + validation set); by default training set:validation set = 9:1
Only takes effect when annotation_mode is 0 or 1
'''
trainval_percent    = 0.9
train_percent       = 0.9
'''
Path to the folder that holds the VOC dataset
Defaults to the VOC dataset in the root directory
'''
VOCdevkit_path  = 'VOCdevkit'
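
To see what the two percentages mean in practice, here is a small sketch of a two-level split. It is not the code of voc_annotation.py; `split_names`, the fixed seed and the 1000 made-up image ids are assumptions for illustration only.

```python
import random


def split_names(names, trainval_percent=0.9, train_percent=0.9, seed=0):
    """Split ids into train / val / test using the two percentages."""
    rng = random.Random(seed)
    shuffled = list(names)
    rng.shuffle(shuffled)
    # First cut: (train + val) versus test.
    num_trainval = int(len(shuffled) * trainval_percent)
    # Second cut: train versus val inside (train + val).
    num_train = int(num_trainval * train_percent)
    train = shuffled[:num_train]
    val = shuffled[num_train:num_trainval]
    test = shuffled[num_trainval:]
    return train, val, test


names = [f"{k:06d}" for k in range(1000)]
train, val, test = split_names(names)
print(len(train), len(val), len(test))
```

With the default 0.9 / 0.9 settings, 1000 images end up as 810 for training, 90 for validation and 100 for testing, so train_percent applies to the 900 train-plus-validation images rather than to the whole dataset.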

classes_path points to the txt file that lists the detection classes; for the VOC dataset this is model_data/voc_classes.txt.

When training on your own dataset, you can create a cls_classes.txt that lists the classes you need to distinguish.
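
A classes file like this can be read with a few lines of Python. The sketch below assumes one class name per line, which the text does not spell out; `load_class_names` and the three class names are made up for the demonstration.

```python
import os
import tempfile


def load_class_names(path):
    """Read class names, assuming one name per line; blank lines are skipped."""
    with open(path, encoding="utf-8") as f:
        names = [line.strip() for line in f if line.strip()]
    return names, len(names)


# Demo with a throwaway file standing in for cls_classes.txt.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "cls_classes.txt")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write("cat\ndog\n\nbicycle\n")
    class_names, num_classes = load_class_names(demo_path)

print(class_names, num_classes)
```

The order of the names matters: the class index stored in the annotation files is the position of the name in this list, so the same file has to be used for annotation, training and prediction.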

III. Starting training

voc_annotation.py has now generated 2007_train.txt and 2007_val.txt, so training can begin.

There are many training parameters; read the comments carefully after downloading the repository. The most important one is still classes_path in train.py.

classes_path points to the txt file that lists the detection classes, the same file used in voc_annotation.py. It must be changed when training on your own dataset!

After changing classes_path, run train.py to start training. After several epochs, the weights are saved in the logs folder.

The other parameters work as follows:

#-------------------------------#
#   Whether to use CUDA
#   Set to False if no GPU is available
#-------------------------------#
Cuda = True
#--------------------------------------------------------#
#   Be sure to change classes_path before training so that it matches your own dataset
#--------------------------------------------------------#
classes_path    = 'model_data/voc_classes.txt'
#---------------------------------------------------------------------#
#   anchors_path is the txt file that holds the anchors; usually left unchanged.
#   anchors_mask helps the code find the matching anchors; usually left unchanged.
#---------------------------------------------------------------------#
anchors_path    = 'model_data/yolo_anchors.txt'
anchors_mask    = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
#----------------------------------------------------------------------------------------------------------------------------#
#   See the README for how to download the weights (via a network drive). Pretrained weights work across datasets because the features are generic.
#   The most important part of the pretrained weights is the backbone feature-extraction network.
#   Pretrained weights are needed in 99% of cases; without them the backbone weights are too random, feature extraction is poor and the training results suffer.
#
#   If training was interrupted, set model_path to a weight file in the logs folder to reload the partially trained weights.
#   Also adjust the freeze-stage or unfreeze-stage parameters below to keep the epoch count continuous.
#   
#   When model_path = '', no weights are loaded for the whole model.
#
#   These are weights for the whole model, so they are loaded in train.py.
#   To train from scratch, set model_path = '' and Freeze_Train = False below; training then starts from scratch with no backbone-freezing stage.
#   Training from scratch generally performs poorly because the weights are too random and feature extraction is weak.
#
#   Networks are rarely trained from scratch; at least the backbone weights are used. Some papers report that pretraining can be skipped, mainly because their datasets are large and their tuning is excellent.
#   If you must train the backbone yourself, look at the ImageNet dataset: train a classification model first, whose backbone is shared with this model, and continue training from there.
#----------------------------------------------------------------------------------------------------------------------------#
model_path      = 'model_data/yolov5_s.pth'
#------------------------------------------------------#
#   Input shape; must be a multiple of 32
#------------------------------------------------------#
input_shape     = [640, 640]
#------------------------------------------------------#
#   YoloV5 variant to use: s, m, l, x
#------------------------------------------------------#
phi             = 's'
#------------------------------------------------------#
#   Tricks carried over from YoloV4
#   mosaic: mosaic data augmentation, True or False
#   In practice mosaic augmentation was not stable, so it defaults to False
#   Cosine_lr: cosine annealing learning rate, True or False
#   label_smoothing: label smoothing, usually 0.01 or lower, e.g. 0.01, 0.005
#------------------------------------------------------#
mosaic              = False
Cosine_lr           = False
label_smoothing     = 0
#----------------------------------------------------#
#   Training has two stages: a freeze stage and an unfreeze stage.
#   Running out of GPU memory has nothing to do with dataset size; if it happens, reduce batch_size.
#   Because of the BatchNorm layers, batch_size must be at least 2 and cannot be 1.
#----------------------------------------------------#
#----------------------------------------------------#
#   Freeze-stage training parameters
#   The backbone is frozen, so the feature-extraction network does not change
#   Uses less GPU memory; the network is only fine-tuned
#----------------------------------------------------#
Init_Epoch          = 0
Freeze_Epoch        = 50
Freeze_batch_size   = 16
Freeze_lr           = 1e-3
#----------------------------------------------------#
#   Unfreeze-stage training parameters
#   The backbone is no longer frozen, so the feature-extraction network changes
#   Uses more GPU memory; all network parameters are updated
#----------------------------------------------------#
UnFreeze_Epoch      = 100
Unfreeze_batch_size = 8
Unfreeze_lr         = 1e-4
#------------------------------------------------------#
#   Whether to use freeze training; by default the backbone is frozen first and unfrozen afterwards.
#------------------------------------------------------#
Freeze_Train        = True
#------------------------------------------------------#
#   Controls multi-threaded data loading
#   More workers read data faster but use more memory
#   On machines with little memory, set this to 2 or 0
#------------------------------------------------------#
num_workers         = 4
#----------------------------------------------------#
#   Paths to the image and label lists
#----------------------------------------------------#
train_annotation_path   = '2007_train.txt'
val_annotation_path     = '2007_val.txt'
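
The two-stage schedule described by these parameters can be summarised as a small lookup. This is a sketch of my reading of the settings, not code from train.py: it assumes zero-based epochs, that epochs from Init_Epoch up to Freeze_Epoch use the freeze-stage batch size and learning rate, and that the remaining epochs up to UnFreeze_Epoch use the unfreeze-stage values. `stage_for_epoch` and `check_input_shape` are hypothetical helper names.

```python
def stage_for_epoch(epoch, freeze_train=True, init_epoch=0, freeze_epoch=50,
                    unfreeze_epoch=100, freeze_batch_size=16, freeze_lr=1e-3,
                    unfreeze_batch_size=8, unfreeze_lr=1e-4):
    """Settings assumed to apply at a given zero-based epoch."""
    if not init_epoch <= epoch < unfreeze_epoch:
        raise ValueError("epoch outside the configured training range")
    if epoch < freeze_epoch:
        # First stage: the backbone is frozen only if freeze training is on.
        return {"backbone_frozen": freeze_train,
                "batch_size": freeze_batch_size, "lr": freeze_lr}
    # Second stage: every parameter is trained.
    return {"backbone_frozen": False,
            "batch_size": unfreeze_batch_size, "lr": unfreeze_lr}


def check_input_shape(input_shape):
    """True when both sides are multiples of 32, as the comments require."""
    return all(side % 32 == 0 for side in input_shape)


print(stage_for_epoch(0))
print(stage_for_epoch(50))
print(check_input_shape([640, 640]), check_input_shape([600, 640]))
```

When resuming an interrupted run, the same lookup shows why Init_Epoch has to be moved forward: starting again from epoch 0 with weights that were already unfrozen would put the run back into the freeze stage.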

IV. Predicting with the trained model

Prediction uses two files: yolo.py and predict.py.

First edit model_path and classes_path in yolo.py; both parameters must be changed.

model_path points to the trained weight file in the logs folder.

classes_path points to the txt file that lists the detection classes.

Once these are set, run predict.py to start detection. After it starts, enter an image path to run detection on that image.
