Building a YoloV4 Object Detection Platform in PyTorch: Implementation Source Code

Updated: 2022-05-09 14:07:31  Author: Bubbliiiing

This article walks through the source code for building a YoloV4 object detection platform in PyTorch. It is offered as a reference for anyone who needs it.

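What is YOLOV4

YOLOV4 is an improved version of YOLOV3 that combines a large number of small tricks on top of YOLOV3. It brings no revolutionary change to object detection, but it still balances speed and accuracy well. As the comparison figure shows, YOLOV4 reaches an mAP of 44 with no drop in FPS relative to YOLOV3, which is a clear improvement.

The overall detection approach of YOLOV4 differs little from YOLOV3: both use three feature layers for classification and regression prediction.

Please note!

It is strongly recommended to study YOLOV3 before YOLOV4, because YOLOV4 really can be viewed as YOLOV3 plus a series of improvements!

(Important things are worth saying three times!)

For YOLOV3, see this blog post: http://www.dbjr.com.cn/article/247364.htm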

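Code download

Improvements in YOLOV4 (incomplete list)

1. Backbone feature extraction network: DarkNet53 => CSPDarkNet53

2. Feature pyramid: SPP, PAN

3. Classification and regression layer: YOLOv3 (unchanged)

4. Training tricks: Mosaic data augmentation, label smoothing, CIOU, cosine annealing learning rate decay

5. Activation function: Mish

These are not all of the improvements. YOLOV4 uses so many that it is hard to implement and list every one, so only the ones I found most interesting and most effective are listed here.

One more important point:

SAM, which is mentioned in the paper, is not used in the paper author's own source code either.

There are many other tricks as well. Not all of them bring an improvement, and I cannot implement all of them.

Throughout this post, YOLOV4 is analysed by contrasting it with YOLOV3.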

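YOLOV4 structure analysis

For ease of understanding, this post writes all channel counts in the last dimension.

1. Backbone feature extraction network (Backbone)

When the input is 416x416, the feature structure is as follows:

When the input is 608x608, the feature structure is as follows:

The backbone has two improvements:

a) Backbone feature extraction network: DarkNet53 => CSPDarkNet53

b) Activation function: Mish

If you are familiar with YOLOV3, you will know the structure of Darknet53: it is built from a series of residual structures. Darknet53 contains a resblock_body module, made up of one downsampling step followed by a stack of residual blocks, and Darknet53 is assembled from these resblock_body modules.

YOLOV4 modifies this part in two ways.

1. First, the activation function of DarknetConv2D is changed from LeakyReLU to Mish, so the convolution block DarknetConv2D_BN_Leaky becomes DarknetConv2D_BN_Mish.

The formula of the Mish function is Mish(x) = x * tanh(ln(1 + e^x)). Its curve is smooth and non-monotonic, dipping slightly below zero for negative inputs.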
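To make the formula concrete, here is a small pure-Python sketch that evaluates Mish at a few points. The helper names `softplus` and `mish` are my own and are not part of the project; the project's version is the `Mish` module shown in the full listing.

```python
import math

def softplus(x: float) -> float:
    """Numerically stable ln(1 + e^x)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def mish(x: float) -> float:
    """Mish(x) = x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))

# Print a few values to see the shape of the curve.
for v in (-5.0, -1.0, 0.0, 1.0, 5.0):
    print(f"mish({v:+.1f}) = {mish(v):+.4f}")
```

For large positive inputs the function behaves like the identity, and for large negative inputs it approaches zero from below, which is the main difference from LeakyReLU.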

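2. Second, the structure of resblock_body is changed to use the CSPnet structure. With this change, the Darknet53 in YOLOV4 becomes CSPDarknet53.

The CSPnet structure is not complicated. It splits the original stack of residual blocks into two parts:

The main branch continues to stack the original residual blocks;

The other branch acts like a residual edge: after a small amount of processing it is connected directly to the end.

So CSP can be regarded as containing one large residual edge.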

#---------------------------------------------------#
#   Structural block of CSPdarknet
#   It contains one large residual edge
#   This large residual edge bypasses many residual structures
#---------------------------------------------------#
class Resblock_body(nn.Module):
    def __init__(self, in_channels, out_channels, num_blocks, first):
        super(Resblock_body, self).__init__()
        self.downsample_conv = BasicConv(in_channels, out_channels, 3, stride=2)
        if first:
            self.split_conv0 = BasicConv(out_channels, out_channels, 1)
            self.split_conv1 = BasicConv(out_channels, out_channels, 1)  
            self.blocks_conv = nn.Sequential(
                Resblock(channels=out_channels, hidden_channels=out_channels//2),
                BasicConv(out_channels, out_channels, 1)
            )
            self.concat_conv = BasicConv(out_channels*2, out_channels, 1)
        else:
            self.split_conv0 = BasicConv(out_channels, out_channels//2, 1)
            self.split_conv1 = BasicConv(out_channels, out_channels//2, 1)
            self.blocks_conv = nn.Sequential(
                *[Resblock(out_channels//2) for _ in range(num_blocks)],
                BasicConv(out_channels//2, out_channels//2, 1)
            )
            self.concat_conv = BasicConv(out_channels, out_channels, 1)
    def forward(self, x):
        x = self.downsample_conv(x)
        x0 = self.split_conv0(x)
        x1 = self.split_conv1(x)
        x1 = self.blocks_conv(x1)
        x = torch.cat([x1, x0], dim=1)
        x = self.concat_conv(x)
        return x
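As a reading aid, the channel counts inside one Resblock_body can be written out by hand from the constructor arguments above. The function `csp_block_channels` below is a hypothetical helper of mine, not part of the project; it only does arithmetic and builds no layers.

```python
def csp_block_channels(out_channels: int, first: bool) -> dict:
    """Channel counts inside one Resblock_body, read off its constructor."""
    # The first block keeps full width on both branches; later blocks halve it.
    branch = out_channels if first else out_channels // 2
    return {
        "after_downsample": out_channels,  # 3x3 conv with stride 2
        "shortcut_branch": branch,         # split_conv0, the large residual edge
        "residual_branch": branch,         # split_conv1 followed by blocks_conv
        "concat_input": 2 * branch,        # torch.cat([x1, x0], dim=1)
        "concat_output": out_channels,     # concat_conv
    }

for out_ch, first in [(64, True), (128, False), (256, False), (512, False), (1024, False)]:
    print(out_ch, first, csp_block_channels(out_ch, first))
```

The point to notice is that both branches are concatenated on the channel axis and then fused back to out_channels by a 1x1 convolution, so the block's output width does not depend on which variant is used.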

The full implementation is:

import torch
import torch.nn.functional as F
import torch.nn as nn
import math
from collections import OrderedDict
#-------------------------------------------------#
#   Mish activation function
#-------------------------------------------------#
class Mish(nn.Module):
    def __init__(self):
        super(Mish, self).__init__()
    def forward(self, x):
        return x * torch.tanh(F.softplus(x))
#-------------------------------------------------#
#   Convolution block
#   CONV + BATCHNORM + MISH
#-------------------------------------------------#
class BasicConv(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, stride=1):
        super(BasicConv, self).__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size, stride, kernel_size//2, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)
        self.activation = Mish()
    def forward(self, x):
        x = self.conv(x)
        x = self.bn(x)
        x = self.activation(x)
        return x
#---------------------------------------------------#
#   Component of the CSPdarknet structural block
#   The residual block that is stacked inside it
#---------------------------------------------------#
class Resblock(nn.Module):
    def __init__(self, channels, hidden_channels=None, residual_activation=nn.Identity()):
        super(Resblock, self).__init__()
        if hidden_channels is None:
            hidden_channels = channels
        self.block = nn.Sequential(
            BasicConv(channels, hidden_channels, 1),
            BasicConv(hidden_channels, channels, 3)
        )
    def forward(self, x):
        return x+self.block(x)
#---------------------------------------------------#
#   Structural block of CSPdarknet
#   It contains one large residual edge
#   This large residual edge bypasses many residual structures
#---------------------------------------------------#
class Resblock_body(nn.Module):
    def __init__(self, in_channels, out_channels, num_blocks, first):
        super(Resblock_body, self).__init__()
        self.downsample_conv = BasicConv(in_channels, out_channels, 3, stride=2)
        if first:
            self.split_conv0 = BasicConv(out_channels, out_channels, 1)
            self.split_conv1 = BasicConv(out_channels, out_channels, 1)  
            self.blocks_conv = nn.Sequential(
                Resblock(channels=out_channels, hidden_channels=out_channels//2),
                BasicConv(out_channels, out_channels, 1)
            )
            self.concat_conv = BasicConv(out_channels*2, out_channels, 1)
        else:
            self.split_conv0 = BasicConv(out_channels, out_channels//2, 1)
            self.split_conv1 = BasicConv(out_channels, out_channels//2, 1)
            self.blocks_conv = nn.Sequential(
                *[Resblock(out_channels//2) for _ in range(num_blocks)],
                BasicConv(out_channels//2, out_channels//2, 1)
            )
            self.concat_conv = BasicConv(out_channels, out_channels, 1)
    def forward(self, x):
        x = self.downsample_conv(x)
        x0 = self.split_conv0(x)
        x1 = self.split_conv1(x)
        x1 = self.blocks_conv(x1)
        x = torch.cat([x1, x0], dim=1)
        x = self.concat_conv(x)
        return x
class CSPDarkNet(nn.Module):
    def __init__(self, layers):
        super(CSPDarkNet, self).__init__()
        self.inplanes = 32
        self.conv1 = BasicConv(3, self.inplanes, kernel_size=3, stride=1)
        self.feature_channels = [64, 128, 256, 512, 1024]
        self.stages = nn.ModuleList([
            Resblock_body(self.inplanes, self.feature_channels[0], layers[0], first=True),
            Resblock_body(self.feature_channels[0], self.feature_channels[1], layers[1], first=False),
            Resblock_body(self.feature_channels[1], self.feature_channels[2], layers[2], first=False),
            Resblock_body(self.feature_channels[2], self.feature_channels[3], layers[3], first=False),
            Resblock_body(self.feature_channels[3], self.feature_channels[4], layers[4], first=False)
        ])
        self.num_features = 1
        # Initialise the weights
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()
    def forward(self, x):
        x = self.conv1(x)
        x = self.stages[0](x)
        x = self.stages[1](x)
        out3 = self.stages[2](x)
        out4 = self.stages[3](out3)
        out5 = self.stages[4](out4)
        return out3, out4, out5
def darknet53(pretrained, **kwargs):
    model = CSPDarkNet([1, 2, 8, 8, 4])
    if pretrained:
        if isinstance(pretrained, str):
            model.load_state_dict(torch.load(pretrained))
        else:
            raise Exception("darknet request a pretrained path. got [{}]".format(pretrained))
    return model
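The shapes of the three backbone outputs can be checked with a little arithmetic. The sketch below assumes a square input and uses only the kernel, stride and padding values visible in BasicConv and Resblock_body; `conv_out` and `backbone_output_shapes` are illustrative names of mine.

```python
def conv_out(size: int, kernel: int = 3, stride: int = 2) -> int:
    """Output size of a conv with padding = kernel // 2, as in BasicConv."""
    pad = kernel // 2
    return (size + 2 * pad - kernel) // stride + 1

def backbone_output_shapes(input_size: int):
    """Return (H, W, C) of out3, out4 and out5 for a square input."""
    channels = [64, 128, 256, 512, 1024]  # feature_channels in CSPDarkNet
    size = input_size                     # conv1 has stride 1, so the size is unchanged
    shapes = []
    for c in channels:
        size = conv_out(size)             # each Resblock_body halves H and W
        shapes.append((size, size, c))
    return shapes[2:]                     # forward() returns the last three stages

print(backbone_output_shapes(416))
print(backbone_output_shapes(608))
```

This reproduces the two cases discussed in this post: 52, 26 and 13 for a 416x416 input, and 76, 38 and 19 for a 608x608 input.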

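2. Feature pyramid

When the input is 416x416, the feature structure is as follows:

When the input is 608x608, the feature structure is as follows:

In the feature pyramid, YOLOV4 combines two improvements:

a) It uses the SPP structure.

b) It uses the PANet structure.

As the figure above shows, everything other than CSPDarknet53 and the Yolo Head belongs to the feature pyramid.

1. The SPP structure is embedded in the convolutions applied to the last feature layer of CSPdarknet53. After three DarknetConv2D_BN_Leaky convolutions on that last feature layer, the result is processed by max pooling at four different scales, with kernel sizes 13x13, 9x9, 5x5 and 1x1 (1x1 means no processing).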

#---------------------------------------------------#
#   SPP structure: pool with kernels of different sizes,
#   then concatenate the pooled results
#---------------------------------------------------#
class SpatialPyramidPooling(nn.Module):
    def __init__(self, pool_sizes=[5, 9, 13]):
        super(SpatialPyramidPooling, self).__init__()
        self.maxpools = nn.ModuleList([nn.MaxPool2d(pool_size, 1, pool_size//2) for pool_size in pool_sizes])
    def forward(self, x):
        features = [maxpool(x) for maxpool in self.maxpools[::-1]]
        features = torch.cat(features + [x], dim=1)
        return features

It greatly enlarges the receptive field and separates out the most salient context features.
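A quick check of why the SPP output can be concatenated at all: with stride 1 and padding of kernel // 2, every pooling branch keeps the spatial size, so only the channel count grows. The sketch below is plain arithmetic with hypothetical helper names, and it assumes the tensor entering SPP has 512 channels, which is what the 2048 input channels of the following make_three_conv call suggest.

```python
def maxpool_out(size: int, kernel: int) -> int:
    """Output size of MaxPool2d(kernel, stride=1, padding=kernel // 2)."""
    return (size + 2 * (kernel // 2) - kernel) // 1 + 1

def spp_output_shape(height: int, width: int, channels: int, pool_sizes=(5, 9, 13)):
    """Shape after concatenating the pooled maps with the untouched input."""
    for k in pool_sizes:
        # Every branch must keep H and W, otherwise torch.cat would fail.
        assert maxpool_out(height, k) == height
        assert maxpool_out(width, k) == width
    return height, width, channels * (len(pool_sizes) + 1)

print(spp_output_shape(13, 13, 512))  # 416x416 input
print(spp_output_shape(19, 19, 512))  # 608x608 input
```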

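2. PANet is an instance segmentation algorithm from 2018 whose structure is built around repeatedly refining features.

The figure above shows the original PANet structure. Its most important characteristic is the repeated extraction of features.

Part (a) is the traditional feature pyramid structure. After the bottom-to-top feature extraction of the feature pyramid is complete, the top-to-bottom feature extraction in (b) still has to be performed.

In YOLOV4, the PANet structure is applied mainly on the three effective feature layers.

The implementation is as follows: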

#---------------------------------------------------#
#   yolo_body
#---------------------------------------------------#
class YoloBody(nn.Module):
    def __init__(self, config):
        super(YoloBody, self).__init__()
        self.config = config
        #  backbone
        self.backbone = darknet53(None)
        self.conv1 = make_three_conv([512,1024],1024)
        self.SPP = SpatialPyramidPooling()
        self.conv2 = make_three_conv([512,1024],2048)
        self.upsample1 = Upsample(512,256)
        self.conv_for_P4 = conv2d(512,256,1)
        self.make_five_conv1 = make_five_conv([256, 512],512)
        self.upsample2 = Upsample(256,128)
        self.conv_for_P3 = conv2d(256,128,1)
        self.make_five_conv2 = make_five_conv([128, 256],256)
        # 3*(5+num_classes)=3*(5+20)=3*(4+1+20)=75
        final_out_filter2 = len(config["yolo"]["anchors"][2]) * (5 + config["yolo"]["classes"])
        self.yolo_head3 = yolo_head([256, final_out_filter2],128)
        self.down_sample1 = conv2d(128,256,3,stride=2)
        self.make_five_conv3 = make_five_conv([256, 512],512)
        # 3*(5+num_classes)=3*(5+20)=3*(4+1+20)=75
        final_out_filter1 = len(config["yolo"]["anchors"][1]) * (5 + config["yolo"]["classes"])
        self.yolo_head2 = yolo_head([512, final_out_filter1],256)
        self.down_sample2 = conv2d(256,512,3,stride=2)
        self.make_five_conv4 = make_five_conv([512, 1024],1024)
        # 3*(5+num_classes)=3*(5+20)=3*(4+1+20)=75
        final_out_filter0 = len(config["yolo"]["anchors"][0]) * (5 + config["yolo"]["classes"])
        self.yolo_head1 = yolo_head([1024, final_out_filter0],512)
    def forward(self, x):
        #  backbone
        x2, x1, x0 = self.backbone(x)
        P5 = self.conv1(x0)
        P5 = self.SPP(P5)
        P5 = self.conv2(P5)
        P5_upsample = self.upsample1(P5)
        P4 = self.conv_for_P4(x1)
        P4 = torch.cat([P4,P5_upsample],axis=1)
        P4 = self.make_five_conv1(P4)
        P4_upsample = self.upsample2(P4)
        P3 = self.conv_for_P3(x2)
        P3 = torch.cat([P3,P4_upsample],axis=1)
        P3 = self.make_five_conv2(P3)
        P3_downsample = self.down_sample1(P3)
        P4 = torch.cat([P3_downsample,P4],axis=1)
        P4 = self.make_five_conv3(P4)
        P4_downsample = self.down_sample2(P4)
        P5 = torch.cat([P4_downsample,P5],axis=1)
        P5 = self.make_five_conv4(P5)
        out2 = self.yolo_head3(P3)
        out1 = self.yolo_head2(P4)
        out0 = self.yolo_head1(P5)
        return out0, out1, out2
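The helpers make_three_conv, make_five_conv, Upsample and conv2d are not shown in this post, so the sketch below does not try to reproduce them. It only follows the spatial size and channel count through YoloBody.forward, under three assumptions of mine: make_three_conv and make_five_conv end with filters_list[0] channels, Upsample doubles the spatial size, and a stride-2 conv2d halves it. The name `panet_trace` is hypothetical.

```python
def panet_trace(input_size: int = 416):
    """Follow (size, channels) through YoloBody.forward for a square input."""
    s3, s4, s5 = input_size // 8, input_size // 16, input_size // 32
    x2, x1, x0 = (s3, 256), (s4, 512), (s5, 1024)  # backbone outputs

    def cat(a, b):
        assert a[0] == b[0], "spatial sizes must match before concatenation"
        return (a[0], a[1] + b[1])

    p5 = (x0[0], 512)                          # conv1 -> SPP (2048 channels) -> conv2
    p4 = cat((x1[0], 256), (p5[0] * 2, 256))   # conv_for_P4 + upsample1
    assert p4[1] == 512                        # in_filters of make_five_conv1
    p4 = (p4[0], 256)
    p3 = cat((x2[0], 128), (p4[0] * 2, 128))   # conv_for_P3 + upsample2
    assert p3[1] == 256                        # in_filters of make_five_conv2
    p3 = (p3[0], 128)
    p4 = cat((p3[0] // 2, 256), p4)            # down_sample1 + previous P4
    assert p4[1] == 512                        # in_filters of make_five_conv3
    p4 = (p4[0], 256)
    p5 = cat((p4[0] // 2, 512), p5)            # down_sample2 + previous P5
    assert p5[1] == 1024                       # in_filters of make_five_conv4
    p5 = (p5[0], 512)
    return p3, p4, p5

print(panet_trace(416))
```

Under those assumptions the three tensors handed to the heads have 128, 256 and 512 channels, which matches the in_filters arguments passed to yolo_head in the constructor.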

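3. YoloHead: making predictions from the extracted features

When the input is 416x416, the feature structure is as follows:

When the input is 608x608, the feature structure is as follows:

1. In the feature utilisation part, YoloV4 extracts multiple feature layers for object detection. Three feature layers are taken, from the middle, lower-middle and bottom of the backbone. For a 608x608 input their shapes are (76,76,256), (38,38,512) and (19,19,1024).

2. The output layers have shapes (19,19,75), (38,38,75) and (76,76,75). The last dimension is 75 because the figure is based on the VOC dataset, which has 20 classes, and YoloV4 has 3 anchor boxes for each feature layer, so the last dimension is 3x25;

With the COCO training set there are 80 classes, so the last dimension is 255 = 3x85 and the three feature layers have shapes (19,19,255), (38,38,255) and (76,76,255).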
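The arithmetic behind 75 and 255 is short enough to write down. This is a sketch with a hypothetical helper name, assuming 3 anchors per layer and strides of 32, 16 and 8 as described in this post.

```python
def head_output_shapes(input_size: int, num_classes: int, anchors_per_layer: int = 3):
    """Shapes (H, W, C) of the three head outputs, channels last as in this post."""
    # Each anchor predicts 4 box values, 1 objectness score and one score per class.
    channels = anchors_per_layer * (4 + 1 + num_classes)
    return [(input_size // s, input_size // s, channels) for s in (32, 16, 8)]

print(head_output_shapes(608, 20))  # VOC
print(head_output_shapes(608, 80))  # COCO
```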

The implementation is as follows:

#---------------------------------------------------#
#   Produce the final yolov4 outputs
#---------------------------------------------------#
def yolo_head(filters_list, in_filters):
    m = nn.Sequential(
        conv2d(in_filters, filters_list[0], 3),
        nn.Conv2d(filters_list[0], filters_list[1], 1),
    )
    return m
#---------------------------------------------------#
#   yolo_body
#---------------------------------------------------#
class YoloBody(nn.Module):
    def __init__(self, config):
        super(YoloBody, self).__init__()
        self.config = config
        #  backbone
        self.backbone = darknet53(None)
        self.conv1 = make_three_conv([512,1024],1024)
        self.SPP = SpatialPyramidPooling()
        self.conv2 = make_three_conv([512,1024],2048)
        self.upsample1 = Upsample(512,256)
        self.conv_for_P4 = conv2d(512,256,1)
        self.make_five_conv1 = make_five_conv([256, 512],512)
        self.upsample2 = Upsample(256,128)
        self.conv_for_P3 = conv2d(256,128,1)
        self.make_five_conv2 = make_five_conv([128, 256],256)
        # 3*(5+num_classes)=3*(5+20)=3*(4+1+20)=75
        final_out_filter2 = len(config["yolo"]["anchors"][2]) * (5 + config["yolo"]["classes"])
        self.yolo_head3 = yolo_head([256, final_out_filter2],128)
        self.down_sample1 = conv2d(128,256,3,stride=2)
        self.make_five_conv3 = make_five_conv([256, 512],512)
        # 3*(5+num_classes)=3*(5+20)=3*(4+1+20)=75
        final_out_filter1 = len(config["yolo"]["anchors"][1]) * (5 + config["yolo"]["classes"])
        self.yolo_head2 = yolo_head([512, final_out_filter1],256)
        self.down_sample2 = conv2d(256,512,3,stride=2)
        self.make_five_conv4 = make_five_conv([512, 1024],1024)
        # 3*(5+num_classes)=3*(5+20)=3*(4+1+20)=75
        final_out_filter0 = len(config["yolo"]["anchors"][0]) * (5 + config["yolo"]["classes"])
        self.yolo_head1 = yolo_head([1024, final_out_filter0],512)
    def forward(self, x):
        #  backbone
        x2, x1, x0 = self.backbone(x)
        P5 = self.conv1(x0)
        P5 = self.SPP(P5)
        P5 = self.conv2(P5)
        P5_upsample = self.upsample1(P5)
        P4 = self.conv_for_P4(x1)
        P4 = torch.cat([P4,P5_upsample],axis=1)
        P4 = self.make_five_conv1(P4)
        P4_upsample = self.upsample2(P4)
        P3 = self.conv_for_P3(x2)
        P3 = torch.cat([P3,P4_upsample],axis=1)
        P3 = self.make_five_conv2(P3)
        P3_downsample = self.down_sample1(P3)
        P4 = torch.cat([P3_downsample,P4],axis=1)
        P4 = self.make_five_conv3(P4)
        P4_downsample = self.down_sample2(P4)
        P5 = torch.cat([P4_downsample,P5],axis=1)
        P5 = self.make_five_conv4(P5)
        out2 = self.yolo_head3(P3)
        out1 = self.yolo_head2(P4)
        out0 = self.yolo_head1(P5)
        return out0, out1, out2

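4. Decoding the prediction results

From step 2 we obtain the prediction results of three feature layers, with shapes (N,19,19,255), (N,38,38,255) and (N,76,76,255). They correspond to the positions of 3 prediction boxes on each cell of the 19x19, 38x38 and 76x76 grids that the image is divided into.

However, these prediction results do not directly give the positions of the final prediction boxes on the image. They have to be decoded first.

Here is how yolo3 prediction works. The 3 feature layers of yolo3 divide the whole image into 19x19, 38x38 and 76x76 grids, and each grid point is responsible for detecting one region.

We know that the prediction of a feature layer corresponds to the positions of three prediction boxes, so we first reshape it, which gives (N,19,19,3,85), (N,38,38,3,85) and (N,76,76,3,85).

The 85 in the last dimension is 4+1+80: x_offset, y_offset, h and w, then the confidence, then the classification results.

The yolo3 decoding process adds each grid point to its corresponding x_offset and y_offset, which gives the centre of the prediction box, and then combines the anchor box with h and w to compute the width and height of the prediction box. This gives the position of the whole prediction box.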
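For a single grid cell and a single anchor, the decoding described above comes down to four lines. The sketch below is my own minimal version in input-image pixels; `decode_cell` and its parameter names are illustrative, while the project's vectorised version is DecodeBox.decode_box.

```python
import math

def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))

def decode_cell(tx, ty, tw, th, col, row, anchor_w, anchor_h, stride):
    """Decode one raw prediction into (cx, cy, w, h) in input-image pixels."""
    cx = (sigmoid(tx) + col) * stride  # offset inside the cell plus the cell's column
    cy = (sigmoid(ty) + row) * stride  # offset inside the cell plus the cell's row
    w = anchor_w * math.exp(tw)        # anchor width scaled by the prediction
    h = anchor_h * math.exp(th)        # anchor height scaled by the prediction
    return cx, cy, w, h

# Cell (6, 6) of the 13x13 layer (stride 32) with anchor [142, 110] and all-zero outputs.
print(decode_cell(0.0, 0.0, 0.0, 0.0, 6, 6, 142, 110, 32))
```

With all raw outputs at zero the box sits at the centre of its cell and has exactly the anchor's size, which is a convenient sanity check when debugging a decoder.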

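Of course, once the final predictions are obtained, score sorting and non-maximum suppression still have to be applied. This part is common to essentially all object detectors. However, this project handles it differently from other projects: it evaluates each class separately.

1. Take the boxes and scores of each class whose score is greater than self.obj_threshold.

2. Apply non-maximum suppression using the box positions and scores.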
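The two steps above can be sketched in pure Python as a greedy, per-class non-maximum suppression. This is an illustration only, with function names and a tuple layout of my own choosing; the project itself calls torchvision.ops.nms, as the listing shows.

```python
def iou(a, b) -> float:
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms_per_class(dets, score_thres=0.5, iou_thres=0.4):
    """dets: list of (x1, y1, x2, y2, score, class_id). Returns the kept detections."""
    kept = []
    for c in sorted({d[5] for d in dets}):
        # Step 1: keep this class's boxes above the score threshold, best first.
        cand = sorted((d for d in dets if d[5] == c and d[4] >= score_thres),
                      key=lambda d: d[4], reverse=True)
        # Step 2: repeatedly keep the best box and drop boxes overlapping it too much.
        while cand:
            best = cand.pop(0)
            kept.append(best)
            cand = [d for d in cand if iou(best, d) <= iou_thres]
    return kept

dets = [(0, 0, 10, 10, 0.9, 0), (1, 1, 10, 10, 0.8, 0), (20, 20, 30, 30, 0.7, 0),
        (0, 0, 10, 10, 0.6, 1), (0, 0, 10, 10, 0.3, 1)]
print(nms_per_class(dets))
```

Because suppression runs per class, the class 1 box at the same location as the best class 0 box survives, which would not happen with class-agnostic NMS.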

The implementation is as follows. When yolo_eval is called, each feature layer is decoded:

import torch
import torch.nn as nn
from torchvision.ops import nms
import numpy as np
class DecodeBox():
    def __init__(self, anchors, num_classes, input_shape, anchors_mask = [[6,7,8], [3,4,5], [0,1,2]]):
        super(DecodeBox, self).__init__()
        self.anchors        = anchors
        self.num_classes    = num_classes
        self.bbox_attrs     = 5 + num_classes
        self.input_shape    = input_shape
        #-----------------------------------------------------------#
        #   anchors of the 13x13 feature layer: [142, 110],[192, 243],[459, 401]
        #   anchors of the 26x26 feature layer: [36, 75],[76, 55],[72, 146]
        #   anchors of the 52x52 feature layer: [12, 16],[19, 36],[40, 28]
        #-----------------------------------------------------------#
        self.anchors_mask   = anchors_mask
    def decode_box(self, inputs):
        outputs = []
        for i, input in enumerate(inputs):
            #-----------------------------------------------#
            #   There are three inputs in total, with shapes
            #   batch_size, 255, 13, 13
            #   batch_size, 255, 26, 26
            #   batch_size, 255, 52, 52
            #-----------------------------------------------#
            batch_size      = input.size(0)
            input_height    = input.size(2)
            input_width     = input.size(3)
            #-----------------------------------------------#
            #   For a 416x416 input
            #   stride_h = stride_w = 32, 16, 8
            #-----------------------------------------------#
            stride_h = self.input_shape[0] / input_height
            stride_w = self.input_shape[1] / input_width
            #-------------------------------------------------#
            #   scaled_anchors are now expressed relative to the feature layer
            #-------------------------------------------------#
            scaled_anchors = [(anchor_width / stride_w, anchor_height / stride_h) for anchor_width, anchor_height in self.anchors[self.anchors_mask[i]]]
            #-----------------------------------------------#
            #   After the reshape, the three predictions have shapes
            #   batch_size, 3, 13, 13, 85
            #   batch_size, 3, 26, 26, 85
            #   batch_size, 3, 52, 52, 85
            #-----------------------------------------------#
            prediction = input.view(batch_size, len(self.anchors_mask[i]),
                                    self.bbox_attrs, input_height, input_width).permute(0, 1, 3, 4, 2).contiguous()
            #-----------------------------------------------#
            #   Adjustment parameters for the anchor box centre
            #-----------------------------------------------#
            x = torch.sigmoid(prediction[..., 0])
            y = torch.sigmoid(prediction[..., 1])
            #-----------------------------------------------#
            #   Adjustment parameters for the anchor box width and height
            #-----------------------------------------------#
            w = prediction[..., 2]
            h = prediction[..., 3]
            #-----------------------------------------------#
            #   Objectness confidence: whether an object is present
            #-----------------------------------------------#
            conf        = torch.sigmoid(prediction[..., 4])
            #-----------------------------------------------#
            #   Class confidences
            #-----------------------------------------------#
            pred_cls    = torch.sigmoid(prediction[..., 5:])
            FloatTensor = torch.cuda.FloatTensor if x.is_cuda else torch.FloatTensor
            LongTensor  = torch.cuda.LongTensor if x.is_cuda else torch.LongTensor
            #----------------------------------------------------------#
            #   Build the grid: the top-left corner of each cell is the anchor box centre
            #   batch_size,3,13,13
            #----------------------------------------------------------#
            grid_x = torch.linspace(0, input_width - 1, input_width).repeat(input_height, 1).repeat(
                batch_size * len(self.anchors_mask[i]), 1, 1).view(x.shape).type(FloatTensor)
            grid_y = torch.linspace(0, input_height - 1, input_height).repeat(input_width, 1).t().repeat(
                batch_size * len(self.anchors_mask[i]), 1, 1).view(y.shape).type(FloatTensor)
            #----------------------------------------------------------#
            #   Build the anchor widths and heights in the same grid layout
            #   batch_size,3,13,13
            #----------------------------------------------------------#
            anchor_w = FloatTensor(scaled_anchors).index_select(1, LongTensor([0]))
            anchor_h = FloatTensor(scaled_anchors).index_select(1, LongTensor([1]))
            anchor_w = anchor_w.repeat(batch_size, 1).repeat(1, 1, input_height * input_width).view(w.shape)
            anchor_h = anchor_h.repeat(batch_size, 1).repeat(1, 1, input_height * input_width).view(h.shape)
            #----------------------------------------------------------#
            #   Adjust the anchor boxes using the predictions
            #   First shift the anchor centre from the cell corner toward the bottom-right
            #   Then adjust the anchor width and height
            #----------------------------------------------------------#
            pred_boxes          = FloatTensor(prediction[..., :4].shape)
            pred_boxes[..., 0]  = x.data + grid_x
            pred_boxes[..., 1]  = y.data + grid_y
            pred_boxes[..., 2]  = torch.exp(w.data) * anchor_w
            pred_boxes[..., 3]  = torch.exp(h.data) * anchor_h
            #----------------------------------------------------------#
            #   Normalise the outputs to fractions of the feature-layer size
            #----------------------------------------------------------#
            _scale = torch.Tensor([input_width, input_height, input_width, input_height]).type(FloatTensor)
            output = torch.cat((pred_boxes.view(batch_size, -1, 4) / _scale,
                                conf.view(batch_size, -1, 1), pred_cls.view(batch_size, -1, self.num_classes)), -1)
            outputs.append(output.data)
        return outputs
    def yolo_correct_boxes(self, box_xy, box_wh, input_shape, image_shape, letterbox_image):
        #-----------------------------------------------------------------#
        #   The y axis is put first so the boxes can be multiplied directly by the image height and width
        #-----------------------------------------------------------------#
        box_yx = box_xy[..., ::-1]
        box_hw = box_wh[..., ::-1]
        input_shape = np.array(input_shape)
        image_shape = np.array(image_shape)
        if letterbox_image:
            #-----------------------------------------------------------------#
            #   The offset computed here is the offset of the valid image region from the top-left corner of the image
            #   new_shape is the scaled height and width
            #-----------------------------------------------------------------#
            new_shape = np.round(image_shape * np.min(input_shape/image_shape))
            offset  = (input_shape - new_shape)/2./input_shape
            scale   = input_shape/new_shape
            box_yx  = (box_yx - offset) * scale
            box_hw *= scale
        box_mins    = box_yx - (box_hw / 2.)
        box_maxes   = box_yx + (box_hw / 2.)
        boxes  = np.concatenate([box_mins[..., 0:1], box_mins[..., 1:2], box_maxes[..., 0:1], box_maxes[..., 1:2]], axis=-1)
        boxes *= np.concatenate([image_shape, image_shape], axis=-1)
        return boxes
    def non_max_suppression(self, prediction, num_classes, input_shape, image_shape, letterbox_image, conf_thres=0.5, nms_thres=0.4):
        #----------------------------------------------------------#
        #   Convert the predictions to top-left / bottom-right corner format.
        #   prediction  [batch_size, num_anchors, 85]
        #----------------------------------------------------------#
        box_corner          = prediction.new(prediction.shape)
        box_corner[:, :, 0] = prediction[:, :, 0] - prediction[:, :, 2] / 2
        box_corner[:, :, 1] = prediction[:, :, 1] - prediction[:, :, 3] / 2
        box_corner[:, :, 2] = prediction[:, :, 0] + prediction[:, :, 2] / 2
        box_corner[:, :, 3] = prediction[:, :, 1] + prediction[:, :, 3] / 2
        prediction[:, :, :4] = box_corner[:, :, :4]
        output = [None for _ in range(len(prediction))]
        for i, image_pred in enumerate(prediction):
            #----------------------------------------------------------#
            #   Take the max over the class predictions.
            #   class_conf  [num_anchors, 1]    class confidence
            #   class_pred  [num_anchors, 1]    class index
            #----------------------------------------------------------#
            class_conf, class_pred = torch.max(image_pred[:, 5:5 + num_classes], 1, keepdim=True)
            #----------------------------------------------------------#
            #   First round of filtering using the confidence
            #----------------------------------------------------------#
            conf_mask = (image_pred[:, 4] * class_conf[:, 0] >= conf_thres).squeeze()
            #----------------------------------------------------------#
            #   Filter the predictions by confidence
            #----------------------------------------------------------#
            image_pred = image_pred[conf_mask]
            class_conf = class_conf[conf_mask]
            class_pred = class_pred[conf_mask]
            if not image_pred.size(0):
                continue
            #-------------------------------------------------------------------------#
            #   detections  [num_anchors, 7]
            #   The 7 values are: x1, y1, x2, y2, obj_conf, class_conf, class_pred
            #-------------------------------------------------------------------------#
            detections = torch.cat((image_pred[:, :5], class_conf.float(), class_pred.float()), 1)
            #------------------------------------------#
            #   Get all classes present in the predictions
            #------------------------------------------#
            unique_labels = detections[:, -1].cpu().unique()
            if prediction.is_cuda:
                unique_labels = unique_labels.cuda()
                detections = detections.cuda()
            for c in unique_labels:
                #------------------------------------------#
                #   All predictions of one class that passed the score filter
                #------------------------------------------#
                detections_class = detections[detections[:, -1] == c]
                #------------------------------------------#
                #   The NMS shipped with torchvision is faster!
                #------------------------------------------#
                keep = nms(
                    detections_class[:, :4],
                    detections_class[:, 4] * detections_class[:, 5],
                    nms_thres
                )
                max_detections = detections_class[keep]
                # # Sort by objectness confidence times class confidence
                # _, conf_sort_index = torch.sort(detections_class[:, 4]*detections_class[:, 5], descending=True)
                # detections_class = detections_class[conf_sort_index]
                # # Apply non-maximum suppression
                # max_detections = []
                # while detections_class.size(0):
                #     # Take the highest-confidence box of this class, then walk down the list and drop every box whose overlap exceeds nms_thres
                #     max_detections.append(detections_class[0].unsqueeze(0))
                #     if len(detections_class) == 1:
                #         break
                #     ious = bbox_iou(max_detections[-1], detections_class[1:])
                #     detections_class = detections_class[1:][ious < nms_thres]
                # # Stack
                # max_detections = torch.cat(max_detections).data
                # Add max detections to outputs
                output[i] = max_detections if output[i] is None else torch.cat((output[i], max_detections))
            if output[i] is not None:
                output[i]           = output[i].cpu().numpy()
                box_xy, box_wh      = (output[i][:, 0:2] + output[i][:, 2:4])/2, output[i][:, 2:4] - output[i][:, 0:2]
                output[i][:, :4]    = self.yolo_correct_boxes(box_xy, box_wh, input_shape, image_shape, letterbox_image)
        return output
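The letterbox correction in yolo_correct_boxes is easier to follow for a single box. The sketch below is a scalar re-statement under my own naming (`undo_letterbox` is hypothetical), with shapes given as (height, width). Unlike the method above, which works on arrays in y, x order, it returns corners in x1, y1, x2, y2 order for readability.

```python
def undo_letterbox(cx, cy, w, h, input_shape, image_shape):
    """Map a normalised (cx, cy, w, h) box in the letterboxed input
    to pixel corners (x1, y1, x2, y2) in the original image."""
    in_h, in_w = input_shape
    img_h, img_w = image_shape
    # Size of the resized image inside the grey letterbox canvas.
    scale = min(in_h / img_h, in_w / img_w)
    new_h, new_w = round(img_h * scale), round(img_w * scale)
    # Normalised width of the grey bars on the top and on the left.
    off_y = (in_h - new_h) / 2 / in_h
    off_x = (in_w - new_w) / 2 / in_w
    # Remove the bars and stretch back to the full [0, 1] range.
    cy = (cy - off_y) * in_h / new_h
    cx = (cx - off_x) * in_w / new_w
    h = h * in_h / new_h
    w = w * in_w / new_w
    return ((cx - w / 2) * img_w, (cy - h / 2) * img_h,
            (cx + w / 2) * img_w, (cy + h / 2) * img_h)

# A 640x480 photo letterboxed into 416x416: the valid region is 416 wide and 312 high.
print(undo_letterbox(0.5, 0.5, 1.0, 0.75, (416, 416), (480, 640)))
```

A box covering exactly the valid region of the letterboxed input maps back to the full original image, which is the behaviour the offset and scale terms are meant to produce.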

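5. Drawing on the original image

Through step 4 we obtain the positions of the prediction boxes on the original image, and these boxes have already been filtered. The filtered boxes can be drawn directly on the image to get the result.

Training YOLOV4

1. Improved training tricks in YOLOV4

a) Mosaic data augmentation

Yolov4's mosaic data augmentation is based on the CutMix data augmentation method, and the two are similar in principle!

CutMix stitches two images together.

Mosaic, however, uses four images. According to the paper, its great advantage is that it enriches the backgrounds of the detected objects! In addition, batch normalisation computes statistics over four images' worth of data at once!

It looks like the figure below:

The implementation approach is:

1. Read four images at a time.

2. Flip, scale and colour-jitter each of the four images, and place them in the four corner positions.

3. Combine the images and combine the boxes.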
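Before the full code, step 3 can be pictured with a small sketch: a cut point splits the canvas into four regions, one per image, and each box is clipped to the region of its own image. The helper names are mine, and `clip_box` is a simplified stand-in for the case analysis in merge_bboxes rather than a copy of it.

```python
def mosaic_regions(w: int, h: int, offset_x: float, offset_y: float):
    """Regions (x1, y1, x2, y2) that the four images fill in the final mosaic."""
    cutx, cuty = int(w * offset_x), int(h * offset_y)
    return [
        (0, 0, cutx, cuty),  # image 0: top-left
        (0, cuty, cutx, h),  # image 1: bottom-left
        (cutx, cuty, w, h),  # image 2: bottom-right
        (cutx, 0, w, cuty),  # image 3: top-right
    ]

def clip_box(box, region):
    """Clip an (x1, y1, x2, y2) box to a region; None if nothing is left."""
    x1, y1 = max(box[0], region[0]), max(box[1], region[1])
    x2, y2 = min(box[2], region[2]), min(box[3], region[3])
    if x2 <= x1 or y2 <= y1:
        return None
    return (x1, y1, x2, y2)

regions = mosaic_regions(416, 416, 0.5, 0.5)
print(regions)
print(clip_box((100, 100, 300, 300), regions[0]))
```

The project's own merge_bboxes and get_random_data_with_Mosaic follow.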

def merge_bboxes(self, bboxes, cutx, cuty):
    merge_bbox = []
    for i in range(len(bboxes)):
        for box in bboxes[i]:
            tmp_box = []
            x1, y1, x2, y2 = box[0], box[1], box[2], box[3]
            if i == 0:
                if y1 > cuty or x1 > cutx:
                    continue
                if y2 >= cuty and y1 <= cuty:
                    y2 = cuty
                if x2 >= cutx and x1 <= cutx:
                    x2 = cutx
            if i == 1:
                if y2 < cuty or x1 > cutx:
                    continue
                if y2 >= cuty and y1 <= cuty:
                    y1 = cuty
                if x2 >= cutx and x1 <= cutx:
                    x2 = cutx
            if i == 2:
                if y2 < cuty or x2 < cutx:
                    continue
                if y2 >= cuty and y1 <= cuty:
                    y1 = cuty
                if x2 >= cutx and x1 <= cutx:
                    x1 = cutx
            if i == 3:
                if y1 > cuty or x2 < cutx:
                    continue
                if y2 >= cuty and y1 <= cuty:
                    y2 = cuty
                if x2 >= cutx and x1 <= cutx:
                    x1 = cutx
            tmp_box.append(x1)
            tmp_box.append(y1)
            tmp_box.append(x2)
            tmp_box.append(y2)
            tmp_box.append(box[-1])
            merge_bbox.append(tmp_box)
    return merge_bbox
def get_random_data_with_Mosaic(self, annotation_line, input_shape, max_boxes=100, hue=.1, sat=1.5, val=1.5):
    h, w = input_shape
    min_offset_x = self.rand(0.25, 0.75)
    min_offset_y = self.rand(0.25, 0.75)
    nws     = [ int(w * self.rand(0.4, 1)), int(w * self.rand(0.4, 1)), int(w * self.rand(0.4, 1)), int(w * self.rand(0.4, 1))]
    nhs     = [ int(h * self.rand(0.4, 1)), int(h * self.rand(0.4, 1)), int(h * self.rand(0.4, 1)), int(h * self.rand(0.4, 1))]
    place_x = [int(w*min_offset_x) - nws[0], int(w*min_offset_x) - nws[1], int(w*min_offset_x), int(w*min_offset_x)]
    place_y = [int(h*min_offset_y) - nhs[0], int(h*min_offset_y), int(h*min_offset_y), int(h*min_offset_y) - nhs[3]]
    image_datas = [] 
    box_datas   = []
    index       = 0
    for line in annotation_line:
        # Split each annotation line
        line_content = line.split()
        # Open the image
        image = Image.open(line_content[0])
        image = cvtColor(image)
        # Image size
        iw, ih = image.size
        # Read the box positions
        box = np.array([np.array(list(map(int,box.split(',')))) for box in line_content[1:]])
        # Decide whether to flip the image
        flip = self.rand()<.5
        if flip and len(box)>0:
            image = image.transpose(Image.FLIP_LEFT_RIGHT)
            box[:, [0,2]] = iw - box[:, [2,0]]
        nw = nws[index] 
        nh = nhs[index] 
        image = image.resize((nw,nh), Image.BICUBIC)
        # Place the image at the offset of its quadrant in the mosaic
        dx = place_x[index]
        dy = place_y[index]
        new_image = Image.new('RGB', (w,h), (128,128,128))
        new_image.paste(image, (dx, dy))
        image_data = np.array(new_image)
        index = index + 1
        box_data = []
        # Rescale the boxes to the resized, shifted image and clip them to the canvas
        if len(box)>0:
            np.random.shuffle(box)
            box[:, [0,2]] = box[:, [0,2]]*nw/iw + dx
            box[:, [1,3]] = box[:, [1,3]]*nh/ih + dy
            box[:, 0:2][box[:, 0:2]<0] = 0
            box[:, 2][box[:, 2]>w] = w
            box[:, 3][box[:, 3]>h] = h
            box_w = box[:, 2] - box[:, 0]
            box_h = box[:, 3] - box[:, 1]
            box = box[np.logical_and(box_w>1, box_h>1)]
            box_data = np.zeros((len(box),5))
            box_data[:len(box)] = box
        image_datas.append(image_data)
        box_datas.append(box_data)
    # Cut the four canvases at (cutx, cuty) and stitch the quadrants together
    cutx = int(w * min_offset_x)
    cuty = int(h * min_offset_y)
    new_image = np.zeros([h, w, 3])
    new_image[:cuty, :cutx, :] = image_datas[0][:cuty, :cutx, :]
    new_image[cuty:, :cutx, :] = image_datas[1][cuty:, :cutx, :]
    new_image[cuty:, cutx:, :] = image_datas[2][cuty:, cutx:, :]
    new_image[:cuty, cutx:, :] = image_datas[3][:cuty, cutx:, :]
    # Color-space (HSV) jitter
    hue = self.rand(-hue, hue)
    sat = self.rand(1, sat) if self.rand()<.5 else 1/self.rand(1, sat)
    val = self.rand(1, val) if self.rand()<.5 else 1/self.rand(1, val)
    x = cv2.cvtColor(np.array(new_image/255,np.float32), cv2.COLOR_RGB2HSV)
    # For float32 input OpenCV returns H in [0, 360], so the hue is wrapped at 360
    x[..., 0] += hue*360
    x[..., 0][x[..., 0]>360] -= 360
    x[..., 0][x[..., 0]<0] += 360
    x[..., 1] *= sat
    x[..., 2] *= val
    x[x[:, :, 0]>360, 0] = 360
    x[:, :, 1:][x[:, :, 1:]>1] = 1
    x[x<0] = 0
    new_image = cv2.cvtColor(x, cv2.COLOR_HSV2RGB)*255
    # Clip the boxes of each quadrant to the cut lines and merge them
    new_boxes = self.merge_bboxes(box_datas, cutx, cuty)
    return new_image, new_boxes
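
To make the placement logic above easier to follow, here is a minimal sketch of how the four paste offsets relate to the cut point. The function name mosaic_offsets is hypothetical and not part of the repository; it only mirrors the place_x / place_y arithmetic from the listing above, in plain Python.

```python
def mosaic_offsets(w, h, off_x, off_y, nws, nhs):
    """Return the cut point and the paste offset (dx, dy) of each of the four images.

    Hypothetical helper: it only reproduces the place_x / place_y arithmetic.
    """
    cutx, cuty = int(w * off_x), int(h * off_y)
    # Image 0 -> top-left, 1 -> bottom-left, 2 -> bottom-right, 3 -> top-right.
    place_x = [cutx - nws[0], cutx - nws[1], cutx, cutx]
    place_y = [cuty - nhs[0], cuty, cuty, cuty - nhs[3]]
    return cutx, cuty, list(zip(place_x, place_y))


cutx, cuty, offsets = mosaic_offsets(
    416, 416, 0.5, 0.5,
    nws=[200, 300, 250, 180],
    nhs=[220, 260, 240, 200],
)
print(cutx, cuty, offsets)
```

Each image is pasted so that one of its corners touches (cutx, cuty). A negative offset simply means that part of the resized image falls outside the canvas and is cropped.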

b) Label Smoothing

The idea behind label smoothing is simple. The formula is:

new_onehot_labels = onehot_labels * (1 - label_smoothing) + label_smoothing / num_classes

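When label_smoothing is 0.01, the formula becomes:

new_onehot_labels = onehot_labels * (1 - 0.01) + 0.01 / num_classes

Label smoothing simply softens the labels. The original labels are 0 and 1; after smoothing they become 0.005 and 0.995 (for binary classification, num_classes = 2). In other words, it slightly penalizes over-confident classification: the model is kept from fitting the labels too exactly, because fitting them too exactly tends to cause overfitting.

The implementation is as follows: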

#---------------------------------------------------#
#   Smooth the labels
#---------------------------------------------------#
def smooth_labels(y_true, label_smoothing,num_classes):
    return y_true * (1.0 - label_smoothing) + label_smoothing / num_classes
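
As a quick check of the numbers quoted above, here is a small worked example. It is a plain-Python variant written for illustration (the version above works on tensors), so the list-based signature is an assumption.

```python
def smooth_labels_list(y_true, label_smoothing, num_classes):
    """List-based variant of smooth_labels, for illustration only."""
    return [y * (1.0 - label_smoothing) + label_smoothing / num_classes for y in y_true]


# Binary case from the text: 0 -> 0.005, 1 -> 0.995
binary = smooth_labels_list([0.0, 1.0], 0.01, 2)

# With 80 classes the "off" value is much smaller: 0.01 / 80
coco_like = smooth_labels_list([0.0, 1.0], 0.01, 80)
print(binary, coco_like)
```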

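c) CIOU

IoU is a ratio, so it is insensitive to the scale of the target object. However, the commonly used bounding-box regression losses are not fully equivalent to optimizing IoU, and plain IoU cannot directly optimize the case where the boxes do not overlap.

So it was proposed to use IOU directly as the regression loss, and CIOU is one of the best ideas of this kind.

CIOU takes the distance between the target and the anchor, the overlap ratio, the scale and a penalty term into account. This makes box regression more stable, without the divergence during training that can occur with IoU and GIoU. The penalty factor accounts for how well the aspect ratio of the predicted box fits the aspect ratio of the target box.

The CIOU formula is as follows: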
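
For reference, the formula can be written out as below. This is reconstructed from the code that follows rather than copied from the original figure: b and b^{gt} are the centers of the predicted and ground-truth boxes, \rho is the Euclidean distance between them, and c is the diagonal of the smallest box enclosing both.

```latex
\mathrm{CIoU} = \mathrm{IoU} - \frac{\rho^{2}(b,\, b^{gt})}{c^{2}} - \alpha v

v = \frac{4}{\pi^{2}} \left( \arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h} \right)^{2},
\qquad
\alpha = \frac{v}{(1 - \mathrm{IoU}) + v}

L_{\mathrm{CIoU}} = 1 - \mathrm{CIoU}
```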

def box_ciou(self, b1, b2):
    """
    Inputs:
    ----------
    b1: tensor, shape=(batch, feat_w, feat_h, anchor_num, 4), xywh
    b2: tensor, shape=(batch, feat_w, feat_h, anchor_num, 4), xywh
    Returns:
    -------
    ciou: tensor, shape=(batch, feat_w, feat_h, anchor_num), i.e. the input shape without the last axis
    """
    #----------------------------------------------------#
    #   Top-left and bottom-right corners of the predicted boxes
    #----------------------------------------------------#
    b1_xy       = b1[..., :2]
    b1_wh       = b1[..., 2:4]
    b1_wh_half  = b1_wh/2.
    b1_mins     = b1_xy - b1_wh_half
    b1_maxes    = b1_xy + b1_wh_half
    #----------------------------------------------------#
    #   Top-left and bottom-right corners of the ground-truth boxes
    #----------------------------------------------------#
    b2_xy       = b2[..., :2]
    b2_wh       = b2[..., 2:4]
    b2_wh_half  = b2_wh/2.
    b2_mins     = b2_xy - b2_wh_half
    b2_maxes    = b2_xy + b2_wh_half
    #----------------------------------------------------#
    #   IoU between the ground-truth boxes and the predicted boxes
    #----------------------------------------------------#
    intersect_mins  = torch.max(b1_mins, b2_mins)
    intersect_maxes = torch.min(b1_maxes, b2_maxes)
    intersect_wh    = torch.max(intersect_maxes - intersect_mins, torch.zeros_like(intersect_maxes))
    intersect_area  = intersect_wh[..., 0] * intersect_wh[..., 1]
    b1_area         = b1_wh[..., 0] * b1_wh[..., 1]
    b2_area         = b2_wh[..., 0] * b2_wh[..., 1]
    union_area      = b1_area + b2_area - intersect_area
    iou             = intersect_area / torch.clamp(union_area,min = 1e-6)
    #----------------------------------------------------#
    #   Squared distance between the box centers
    #----------------------------------------------------#
    center_distance = torch.sum(torch.pow((b1_xy - b2_xy), 2), axis=-1)
    #----------------------------------------------------#
    #   Top-left and bottom-right corners of the smallest box enclosing both boxes
    #----------------------------------------------------#
    enclose_mins    = torch.min(b1_mins, b2_mins)
    enclose_maxes   = torch.max(b1_maxes, b2_maxes)
    enclose_wh      = torch.max(enclose_maxes - enclose_mins, torch.zeros_like(intersect_maxes))
    #----------------------------------------------------#
    #   Squared diagonal length of the enclosing box
    #----------------------------------------------------#
    enclose_diagonal = torch.sum(torch.pow(enclose_wh,2), axis=-1)
    ciou            = iou - 1.0 * (center_distance) / torch.clamp(enclose_diagonal,min = 1e-6)
    v       = (4 / (math.pi ** 2)) * torch.pow((torch.atan(b1_wh[..., 0] / torch.clamp(b1_wh[..., 1],min = 1e-6)) - torch.atan(b2_wh[..., 0] / torch.clamp(b2_wh[..., 1], min = 1e-6))), 2)
    alpha   = v / torch.clamp((1.0 - iou + v), min=1e-6)
    ciou    = ciou - alpha * v
    return ciou
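
To see what the function above returns on concrete boxes, here is a scalar re-implementation in plain Python. It is only a sketch for checking values by hand: the name ciou_xywh is hypothetical, and it follows the same steps as box_ciou but for a single pair of (cx, cy, w, h) boxes.

```python
import math


def ciou_xywh(b1, b2, eps=1e-6):
    """CIoU of two boxes given as (cx, cy, w, h); scalar sketch of box_ciou."""
    x1, y1, w1, h1 = b1
    x2, y2, w2, h2 = b2
    # Intersection and union
    iw = max(min(x1 + w1 / 2, x2 + w2 / 2) - max(x1 - w1 / 2, x2 - w2 / 2), 0.0)
    ih = max(min(y1 + h1 / 2, y2 + h2 / 2) - max(y1 - h1 / 2, y2 - h2 / 2), 0.0)
    inter = iw * ih
    union = w1 * h1 + w2 * h2 - inter
    iou = inter / max(union, eps)
    # Squared center distance and squared diagonal of the enclosing box
    center_dist = (x1 - x2) ** 2 + (y1 - y2) ** 2
    ew = max(x1 + w1 / 2, x2 + w2 / 2) - min(x1 - w1 / 2, x2 - w2 / 2)
    eh = max(y1 + h1 / 2, y2 + h2 / 2) - min(y1 - h1 / 2, y2 - h2 / 2)
    diag = ew ** 2 + eh ** 2
    # Aspect-ratio penalty
    v = (4 / math.pi ** 2) * (math.atan(w1 / max(h1, eps)) - math.atan(w2 / max(h2, eps))) ** 2
    alpha = v / max(1.0 - iou + v, eps)
    return iou - center_dist / max(diag, eps) - alpha * v


same = ciou_xywh((10, 10, 4, 4), (10, 10, 4, 4))   # identical boxes
apart = ciou_xywh((2, 2, 2, 2), (10, 10, 2, 2))    # no overlap at all
print(same, apart)
```

Identical boxes give a CIoU of 1. The two non-overlapping boxes have an IoU of 0 but still get a CIoU of -0.64 from the center-distance term, which is why CIOU can still guide regression when the boxes do not overlap.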

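d) Cosine annealing learning-rate decay

With cosine annealing, the learning rate first rises and then falls, following the idea of simulated annealing. (Look up "simulated annealing" for background on the algorithm.)

The rise is linear and the fall follows a cosine curve. The cycle is repeated several times.

The effect is shown in the figure:

PyTorch provides a built-in scheduler for the cosine decay that can be called directly (it does not include the linear warm-up).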

lr_scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=5, eta_min=1e-5)
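
The shape of one "rise linearly, fall along a cosine" cycle can be sketched in plain Python as below. This is an illustration only: warmup_cosine_lr and its default values are assumptions made for this example, not the scheduler used in the repository.

```python
import math


def warmup_cosine_lr(step, base_lr=1e-3, eta_min=1e-5, warmup_steps=3, total_steps=13):
    """One cycle: linear warm-up to base_lr, then cosine decay down to eta_min."""
    if step < warmup_steps:
        # Linear rise from eta_min to base_lr
        return eta_min + (base_lr - eta_min) * step / warmup_steps
    # Cosine fall from base_lr to eta_min
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * progress))


schedule = [warmup_cosine_lr(s) for s in range(14)]
for s, lr in enumerate(schedule):
    print(s, f"{lr:.6f}")
```

To repeat the cycle, as described above, the step could be taken modulo the cycle length before calling the function.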

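2. Composition of the loss

a) Parameters needed to compute the loss

Computing the loss is really a comparison between y_pre and y_true:

y_pre is the output of the network for an image and contains the content of three feature layers. It must be decoded before boxes can be drawn on the image.

y_true holds, for a real image, the offsets, width and height, and class of each ground-truth box on the (19,19), (38,38) and (76,76) grids (for a 608x608 input). It must be encoded so that its structure matches that of y_pre.

In fact, y_pre and y_true both have the shapes

(batch_size,19,19,3,85)
(batch_size,38,38,3,85)
(batch_size,76,76,3,85)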
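
These shapes follow directly from the input size, the three strides (32, 16, 8) and the number of classes. A minimal sketch, where yolo_output_shapes is a hypothetical helper written for this post:

```python
def yolo_output_shapes(batch_size, input_size, num_classes,
                       strides=(32, 16, 8), anchors_per_layer=3):
    """Shapes of the three feature layers, with the channel dimension last."""
    return [
        (batch_size, input_size // s, input_size // s, anchors_per_layer, 5 + num_classes)
        for s in strides
    ]


shapes_608 = yolo_output_shapes(1, 608, 80)
shapes_416 = yolo_output_shapes(1, 416, 80)
print(shapes_608)
print(shapes_416)
```

The last dimension is 85 = 4 box parameters + 1 confidence + 80 classes.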

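b) What y_pre is

The final output of the network is, for each of the three feature layers, the predicted boxes and classes at every grid point. That is, each feature layer corresponds to the image divided into a grid of a different size, and for every grid point it holds the position, confidence and class of the three prior boxes (anchors).

For the outputs y1, y2 and y3, [..., :2] is the offset relative to each grid point, [..., 2:4] is the width and height, [..., 4:5] is the confidence of the box, and [..., 5:] is the predicted probability of each class.

At this point y_pre is not yet decoded. Only after decoding does it describe boxes on the real image.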
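
Decoding one grid cell can be sketched as follows. This is a plain-Python illustration under the same convention as the loss code later in this post (sigmoid for the center offsets, exp for width and height, everything in feature-layer units); decode_cell is a hypothetical name.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def decode_cell(raw, grid_x, grid_y, anchor_w, anchor_h):
    """Decode one raw prediction vector (4 box values, 1 confidence, N classes)."""
    tx, ty, tw, th = raw[:4]
    return {
        # Center = grid cell corner + sigmoid offset; size = anchor * exp(raw)
        "box": (sigmoid(tx) + grid_x, sigmoid(ty) + grid_y,
                math.exp(tw) * anchor_w, math.exp(th) * anchor_h),
        "conf": sigmoid(raw[4]),
        "cls": [sigmoid(c) for c in raw[5:]],
    }


# All-zero raw output at cell (7, 9) with the [142, 110] anchor scaled by stride 32
out = decode_cell([0.0] * 85, grid_x=7, grid_y=9, anchor_w=142 / 32, anchor_h=110 / 32)
print(out["box"], out["conf"], len(out["cls"]))
```

Multiplying the decoded box by the stride would bring it back to pixel coordinates of the input image.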

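c) What y_true is

d) How the loss is computed

Once y_pre and y_true are available, how are they compared? Not by a simple subtraction!

The loss has to be computed for each of the three feature layers. The smallest feature layer is used as the example here.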

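1. Use y_true to pick out the positions in this feature layer where a target really exists, (m,19,19,3,1), and the corresponding classes, (m,19,19,3,80).

2. Process the raw prediction output to get the reshaped prediction y_pre with shape (m,19,19,3,85), together with the decoded xy and wh.

3. For each image, compute the IOU between all ground-truth boxes and the predicted boxes. A predicted box whose overlap with a ground-truth box exceeds the ignore threshold (0.5 in this description; the code below sets ignore_threshold = 0.7) is ignored, that is, it is not used as a negative sample.

4. Compute the CIOU as the regression loss. Only positive samples contribute to it.

5. Compute the confidence loss, which has two parts. For positions that really contain a target, the predicted confidence is compared with 1. For positions that do not, the predicted confidence is compared with 0, leaving out the boxes that were ignored in step 3 because their maximum IOU was too high.

6. Compute the class loss: for positions that really contain a target, the difference between the predicted class and the true class.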

The total loss is the sum of three losses:

Boxes that really exist: the CIOU loss.
Boxes that really exist: predicted confidence compared with 1. Boxes that do not exist: predicted confidence compared with 0, excluding the ignored boxes that contain no target.
Boxes that really exist: predicted class compared with the true class.
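
The role of the two masks in the confidence loss can be sketched on three toy cells. This plain-Python example is an illustration only (the function names and the toy values are made up); it mirrors the "BCE times obj mask plus BCE times noobj mask" structure described above.

```python
import math


def bce(pred, target, eps=1e-7):
    """Binary cross-entropy for one value, with clipping for numerical safety."""
    pred = min(max(pred, eps), 1.0 - eps)
    return -target * math.log(pred) - (1.0 - target) * math.log(1.0 - pred)


def confidence_loss(conf, obj_mask, noobj_mask):
    loss = 0.0
    for p, obj, noobj in zip(conf, obj_mask, noobj_mask):
        loss += bce(p, obj) * obj      # positives: confidence compared with 1
        loss += bce(p, obj) * noobj    # kept negatives: confidence compared with 0
    return loss


conf       = [0.9, 0.8, 0.1]
obj_mask   = [1.0, 0.0, 0.0]   # only the first cell contains a target
noobj_mask = [0.0, 0.0, 1.0]   # the second cell is ignored (high IOU, not a negative)
loss = confidence_loss(conf, obj_mask, noobj_mask)
print(loss)
```

The second cell predicts a confident 0.8 without being a positive, yet it adds nothing to the loss because it was ignored.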

The actual code is as follows:

import torch
import torch.nn as nn
import math
import numpy as np
class YOLOLoss(nn.Module):
    def __init__(self, anchors, num_classes, input_shape, cuda, anchors_mask = [[6,7,8], [3,4,5], [0,1,2]], label_smoothing = 0):
        super(YOLOLoss, self).__init__()
        #-----------------------------------------------------------#
        #   Anchors of the 13x13 feature layer: [142, 110],[192, 243],[459, 401]
        #   Anchors of the 26x26 feature layer: [36, 75],[76, 55],[72, 146]
        #   Anchors of the 52x52 feature layer: [12, 16],[19, 36],[40, 28]
        #-----------------------------------------------------------#
        self.anchors        = anchors
        self.num_classes    = num_classes
        self.bbox_attrs     = 5 + num_classes
        self.input_shape    = input_shape
        self.anchors_mask   = anchors_mask
        self.label_smoothing = label_smoothing
        self.ignore_threshold = 0.7
        self.cuda = cuda
    def clip_by_tensor(self, t, t_min, t_max):
        t = t.float()
        result = (t >= t_min).float() * t + (t < t_min).float() * t_min
        result = (result <= t_max).float() * result + (result > t_max).float() * t_max
        return result
    def MSELoss(self, pred, target):
        return torch.pow(pred - target, 2)
    def BCELoss(self, pred, target):
        epsilon = 1e-7
        pred    = self.clip_by_tensor(pred, epsilon, 1.0 - epsilon)
        output  = - target * torch.log(pred) - (1.0 - target) * torch.log(1.0 - pred)
        return output
    def box_ciou(self, b1, b2):
        """
        Inputs:
        ----------
        b1: tensor, shape=(batch, feat_w, feat_h, anchor_num, 4), xywh
        b2: tensor, shape=(batch, feat_w, feat_h, anchor_num, 4), xywh
        Returns:
        -------
        ciou: tensor, shape=(batch, feat_w, feat_h, anchor_num), i.e. the input shape without the last axis
        """
        #----------------------------------------------------#
        #   Top-left and bottom-right corners of the predicted boxes
        #----------------------------------------------------#
        b1_xy       = b1[..., :2]
        b1_wh       = b1[..., 2:4]
        b1_wh_half  = b1_wh/2.
        b1_mins     = b1_xy - b1_wh_half
        b1_maxes    = b1_xy + b1_wh_half
        #----------------------------------------------------#
        #   Top-left and bottom-right corners of the ground-truth boxes
        #----------------------------------------------------#
        b2_xy       = b2[..., :2]
        b2_wh       = b2[..., 2:4]
        b2_wh_half  = b2_wh/2.
        b2_mins     = b2_xy - b2_wh_half
        b2_maxes    = b2_xy + b2_wh_half
        #----------------------------------------------------#
        #   IoU between the ground-truth boxes and the predicted boxes
        #----------------------------------------------------#
        intersect_mins  = torch.max(b1_mins, b2_mins)
        intersect_maxes = torch.min(b1_maxes, b2_maxes)
        intersect_wh    = torch.max(intersect_maxes - intersect_mins, torch.zeros_like(intersect_maxes))
        intersect_area  = intersect_wh[..., 0] * intersect_wh[..., 1]
        b1_area         = b1_wh[..., 0] * b1_wh[..., 1]
        b2_area         = b2_wh[..., 0] * b2_wh[..., 1]
        union_area      = b1_area + b2_area - intersect_area
        iou             = intersect_area / torch.clamp(union_area,min = 1e-6)
        #----------------------------------------------------#
        #   Squared distance between the box centers
        #----------------------------------------------------#
        center_distance = torch.sum(torch.pow((b1_xy - b2_xy), 2), axis=-1)
        #----------------------------------------------------#
        #   Top-left and bottom-right corners of the smallest box enclosing both boxes
        #----------------------------------------------------#
        enclose_mins    = torch.min(b1_mins, b2_mins)
        enclose_maxes   = torch.max(b1_maxes, b2_maxes)
        enclose_wh      = torch.max(enclose_maxes - enclose_mins, torch.zeros_like(intersect_maxes))
        #----------------------------------------------------#
        #   Squared diagonal length of the enclosing box
        #----------------------------------------------------#
        enclose_diagonal = torch.sum(torch.pow(enclose_wh,2), axis=-1)
        ciou            = iou - 1.0 * (center_distance) / torch.clamp(enclose_diagonal,min = 1e-6)
        v       = (4 / (math.pi ** 2)) * torch.pow((torch.atan(b1_wh[..., 0] / torch.clamp(b1_wh[..., 1],min = 1e-6)) - torch.atan(b2_wh[..., 0] / torch.clamp(b2_wh[..., 1], min = 1e-6))), 2)
        alpha   = v / torch.clamp((1.0 - iou + v), min=1e-6)
        ciou    = ciou - alpha * v
        return ciou
    #---------------------------------------------------#
    #   Smooth the labels
    #---------------------------------------------------#
    def smooth_labels(self, y_true, label_smoothing, num_classes):
        return y_true * (1.0 - label_smoothing) + label_smoothing / num_classes
    def forward(self, l, input, targets=None):
        #----------------------------------------------------#
        #   l is the index of the feature layer being used
        #   input has shape bs, 3*(5+num_classes), 13, 13
        #                   bs, 3*(5+num_classes), 26, 26
        #                   bs, 3*(5+num_classes), 52, 52
        #   targets holds the ground-truth boxes, [batch_size, num_gt, 5]
        #----------------------------------------------------#
        #--------------------------------#
        #   Number of images, and height and width of the feature layer
        #--------------------------------#
        bs      = input.size(0)
        in_h    = input.size(2)
        in_w    = input.size(3)
        #-----------------------------------------------------------------------#
        #   Compute the stride:
        #   how many pixels of the original image one feature point covers
        #   
        #   13x13 feature layer: one feature point covers 32 pixels
        #   26x26 feature layer: one feature point covers 16 pixels
        #   52x52 feature layer: one feature point covers 8 pixels
        #   stride_h = stride_w = 32, 16, 8
        #-----------------------------------------------------------------------#
        stride_h = self.input_shape[0] / in_h
        stride_w = self.input_shape[1] / in_w
        #-------------------------------------------------#
        #   scaled_anchors are expressed in feature-layer units
        #-------------------------------------------------#
        scaled_anchors  = [(a_w / stride_w, a_h / stride_h) for a_w, a_h in self.anchors]
        #-----------------------------------------------#
        #   There are three inputs in total; their shapes are
        #   bs, 3 * (5+num_classes), 13, 13 => bs, 3, 5 + num_classes, 13, 13 => batch_size, 3, 13, 13, 5 + num_classes
        #   batch_size, 3, 13, 13, 5 + num_classes
        #   batch_size, 3, 26, 26, 5 + num_classes
        #   batch_size, 3, 52, 52, 5 + num_classes
        #-----------------------------------------------#
        prediction = input.view(bs, len(self.anchors_mask[l]), self.bbox_attrs, in_h, in_w).permute(0, 1, 3, 4, 2).contiguous()
        #-----------------------------------------------#
        #   Adjustment parameters for the anchor center
        #-----------------------------------------------#
        x = torch.sigmoid(prediction[..., 0])
        y = torch.sigmoid(prediction[..., 1])
        #-----------------------------------------------#
        #   Adjustment parameters for the anchor width and height
        #-----------------------------------------------#
        w = prediction[..., 2]
        h = prediction[..., 3]
        #-----------------------------------------------#
        #   Objectness confidence: whether there is an object
        #-----------------------------------------------#
        conf = torch.sigmoid(prediction[..., 4])
        #-----------------------------------------------#
        #   Class confidence
        #-----------------------------------------------#
        pred_cls = torch.sigmoid(prediction[..., 5:])
        #-----------------------------------------------#
        #   Build the prediction the network should produce
        #-----------------------------------------------#
        y_true, noobj_mask, box_loss_scale = self.get_target(l, targets, scaled_anchors, in_h, in_w)
        #---------------------------------------------------------------#
        #   Decode the predictions and measure their overlap with the ground truth.
        #   Feature points with a large overlap are ignored: their predictions are
        #   already fairly accurate, so they are not suitable as negative samples.
        #----------------------------------------------------------------#
        noobj_mask, pred_boxes = self.get_ignore(l, x, y, h, w, targets, scaled_anchors, in_h, in_w, noobj_mask)
        if self.cuda:
            y_true          = y_true.cuda()
            noobj_mask      = noobj_mask.cuda()
            box_loss_scale  = box_loss_scale.cuda()
        #-----------------------------------------------------------#
        #   box_loss_scale is built from the width and height of the
        #   ground-truth box, both normalized to 0-1.
        #   The larger the ground-truth box, the smaller its weight;
        #   small boxes get a larger weight.
        #-----------------------------------------------------------#
        box_loss_scale = 2 - box_loss_scale
        #---------------------------------------------------------------#
        #   CIOU between the predictions and the ground truth
        #----------------------------------------------------------------#
        ciou        = (1 - self.box_ciou(pred_boxes[y_true[..., 4] == 1], y_true[..., :4][y_true[..., 4] == 1])) * box_loss_scale[y_true[..., 4] == 1]
        loss_loc    = torch.sum(ciou)
        #-----------------------------------------------------------#
        #   Confidence loss
        #-----------------------------------------------------------#
        loss_conf   = torch.sum(self.BCELoss(conf, y_true[..., 4]) * y_true[..., 4]) + \
                      torch.sum(self.BCELoss(conf, y_true[..., 4]) * noobj_mask)
        loss_cls    = torch.sum(self.BCELoss(pred_cls[y_true[..., 4] == 1], self.smooth_labels(y_true[..., 5:][y_true[..., 4] == 1], self.label_smoothing, self.num_classes)))
        loss        = loss_loc + loss_conf + loss_cls
        num_pos = torch.sum(y_true[..., 4])
        num_pos = torch.max(num_pos, torch.ones_like(num_pos))
        return loss, num_pos
    def calculate_iou(self, _box_a, _box_b):
        #-----------------------------------------------------------#
        #   Top-left and bottom-right corners of the ground-truth boxes
        #-----------------------------------------------------------#
        b1_x1, b1_x2 = _box_a[:, 0] - _box_a[:, 2] / 2, _box_a[:, 0] + _box_a[:, 2] / 2
        b1_y1, b1_y2 = _box_a[:, 1] - _box_a[:, 3] / 2, _box_a[:, 1] + _box_a[:, 3] / 2
        #-----------------------------------------------------------#
        #   Top-left and bottom-right corners of the predicted boxes derived from the anchors
        #-----------------------------------------------------------#
        b2_x1, b2_x2 = _box_b[:, 0] - _box_b[:, 2] / 2, _box_b[:, 0] + _box_b[:, 2] / 2
        b2_y1, b2_y2 = _box_b[:, 1] - _box_b[:, 3] / 2, _box_b[:, 1] + _box_b[:, 3] / 2
        #-----------------------------------------------------------#
        #   Convert both sets of boxes to (x1, y1, x2, y2) corner form
        #-----------------------------------------------------------#
        box_a = torch.zeros_like(_box_a)
        box_b = torch.zeros_like(_box_b)
        box_a[:, 0], box_a[:, 1], box_a[:, 2], box_a[:, 3] = b1_x1, b1_y1, b1_x2, b1_y2
        box_b[:, 0], box_b[:, 1], box_b[:, 2], box_b[:, 3] = b2_x1, b2_y1, b2_x2, b2_y2
        #-----------------------------------------------------------#
        #   A is the number of ground-truth boxes, B the number of anchors
        #-----------------------------------------------------------#
        A = box_a.size(0)
        B = box_b.size(0)
        #-----------------------------------------------------------#
        #   Intersection area
        #-----------------------------------------------------------#
        max_xy  = torch.min(box_a[:, 2:].unsqueeze(1).expand(A, B, 2), box_b[:, 2:].unsqueeze(0).expand(A, B, 2))
        min_xy  = torch.max(box_a[:, :2].unsqueeze(1).expand(A, B, 2), box_b[:, :2].unsqueeze(0).expand(A, B, 2))
        inter   = torch.clamp((max_xy - min_xy), min=0)
        inter   = inter[:, :, 0] * inter[:, :, 1]
        #-----------------------------------------------------------#
        #   Areas of the predicted boxes and the ground-truth boxes
        #-----------------------------------------------------------#
        area_a = ((box_a[:, 2]-box_a[:, 0]) * (box_a[:, 3]-box_a[:, 1])).unsqueeze(1).expand_as(inter)  # [A,B]
        area_b = ((box_b[:, 2]-box_b[:, 0]) * (box_b[:, 3]-box_b[:, 1])).unsqueeze(0).expand_as(inter)  # [A,B]
        #-----------------------------------------------------------#
        #   IOU
        #-----------------------------------------------------------#
        union = area_a + area_b - inter
        return inter / union  # [A,B]
    def get_target(self, l, targets, anchors, in_h, in_w):
        #-----------------------------------------------------#
        #   Number of images in the batch
        #-----------------------------------------------------#
        bs              = len(targets)
        #-----------------------------------------------------#
        #   Marks which anchors do not contain an object
        #-----------------------------------------------------#
        noobj_mask      = torch.ones(bs, len(self.anchors_mask[l]), in_h, in_w, requires_grad = False)
        #-----------------------------------------------------#
        #   Makes the network pay more attention to small targets
        #-----------------------------------------------------#
        box_loss_scale  = torch.zeros(bs, len(self.anchors_mask[l]), in_h, in_w, requires_grad = False)
        #-----------------------------------------------------#
        #   batch_size, 3, 13, 13, 5 + num_classes
        #-----------------------------------------------------#
        y_true          = torch.zeros(bs, len(self.anchors_mask[l]), in_h, in_w, self.bbox_attrs, requires_grad = False)
        for b in range(bs):            
            if len(targets[b])==0:
                continue
            batch_target = torch.zeros_like(targets[b])
            #-------------------------------------------------------#
            #   Map the centers of the positive samples onto the feature layer
            #-------------------------------------------------------#
            batch_target[:, [0,2]] = targets[b][:, [0,2]] * in_w
            batch_target[:, [1,3]] = targets[b][:, [1,3]] * in_h
            batch_target[:, 4] = targets[b][:, 4]
            batch_target = batch_target.cpu()
            #-------------------------------------------------------#
            #   Put the ground-truth boxes in (0, 0, w, h) form
            #   num_true_box, 4
            #-------------------------------------------------------#
            gt_box          = torch.FloatTensor(torch.cat((torch.zeros((batch_target.size(0), 2)), batch_target[:, 2:4]), 1))
            #-------------------------------------------------------#
            #   Put the anchors in (0, 0, w, h) form
            #   9, 4
            #-------------------------------------------------------#
            anchor_shapes   = torch.FloatTensor(torch.cat((torch.zeros((len(anchors), 2)), torch.FloatTensor(anchors)), 1))
            #-------------------------------------------------------#
            #   Compute the IOU
            #   self.calculate_iou(gt_box, anchor_shapes) = [num_true_box, 9], the overlap of each ground-truth box with the 9 anchors
            #   best_ns:
            #   index of the best-overlapping anchor for each ground-truth box
            #-------------------------------------------------------#
            best_ns = torch.argmax(self.calculate_iou(gt_box, anchor_shapes), dim=-1)
            for t, best_n in enumerate(best_ns):
                if best_n not in self.anchors_mask[l]:
                    continue
                #----------------------------------------#
                #   Which of the current feature point's anchors this is
                #----------------------------------------#
                k = self.anchors_mask[l].index(best_n)
                #----------------------------------------#
                #   Grid cell the ground-truth box belongs to
                #----------------------------------------#
                i = torch.floor(batch_target[t, 0]).long()
                j = torch.floor(batch_target[t, 1]).long()
                #----------------------------------------#
                #   Class of the ground-truth box
                #----------------------------------------#
                c = batch_target[t, 4].long()
                #----------------------------------------#
                #   noobj_mask marks the feature points without a target
                #----------------------------------------#
                noobj_mask[b, k, j, i] = 0
                #----------------------------------------#
                #   Store the ground-truth box, the objectness flag and the one-hot class
                #----------------------------------------#
                y_true[b, k, j, i, 0] = batch_target[t, 0]
                y_true[b, k, j, i, 1] = batch_target[t, 1]
                y_true[b, k, j, i, 2] = batch_target[t, 2]
                y_true[b, k, j, i, 3] = batch_target[t, 3]
                y_true[b, k, j, i, 4] = 1
                y_true[b, k, j, i, c + 5] = 1
                #----------------------------------------#
                #   Relative area of the box (w*h over the feature-layer area)
                #   Large targets get a small loss weight, small targets a large one
                #----------------------------------------#
                box_loss_scale[b, k, j, i] = batch_target[t, 2] * batch_target[t, 3] / in_w / in_h
        return y_true, noobj_mask, box_loss_scale
    def get_ignore(self, l, x, y, h, w, targets, scaled_anchors, in_h, in_w, noobj_mask):
        #-----------------------------------------------------#
        #   Number of images in the batch
        #-----------------------------------------------------#
        bs = len(targets)
        FloatTensor = torch.cuda.FloatTensor if x.is_cuda else torch.FloatTensor
        LongTensor  = torch.cuda.LongTensor if x.is_cuda else torch.LongTensor
        #-----------------------------------------------------#
        #   Build the grid: anchor centers at the top-left corner of each cell
        #-----------------------------------------------------#
        grid_x = torch.linspace(0, in_w - 1, in_w).repeat(in_h, 1).repeat(
            int(bs * len(self.anchors_mask[l])), 1, 1).view(x.shape).type(FloatTensor)
        grid_y = torch.linspace(0, in_h - 1, in_h).repeat(in_w, 1).t().repeat(
            int(bs * len(self.anchors_mask[l])), 1, 1).view(y.shape).type(FloatTensor)
        # Widths and heights of the anchors
        scaled_anchors_l = np.array(scaled_anchors)[self.anchors_mask[l]]
        anchor_w = FloatTensor(scaled_anchors_l).index_select(1, LongTensor([0]))
        anchor_h = FloatTensor(scaled_anchors_l).index_select(1, LongTensor([1]))
        anchor_w = anchor_w.repeat(bs, 1).repeat(1, 1, in_h * in_w).view(w.shape)
        anchor_h = anchor_h.repeat(bs, 1).repeat(1, 1, in_h * in_w).view(h.shape)
        #-------------------------------------------------------#
        #   Anchor centers and sizes after adjustment by the predictions
        #-------------------------------------------------------#
        pred_boxes_x    = torch.unsqueeze(x + grid_x, -1)
        pred_boxes_y    = torch.unsqueeze(y + grid_y, -1)
        pred_boxes_w    = torch.unsqueeze(torch.exp(w) * anchor_w, -1)
        pred_boxes_h    = torch.unsqueeze(torch.exp(h) * anchor_h, -1)
        pred_boxes      = torch.cat([pred_boxes_x, pred_boxes_y, pred_boxes_w, pred_boxes_h], dim = -1)
        for b in range(bs):           
            #-------------------------------------------------------#
            #   Flatten the predictions
            #   pred_boxes_for_ignore      num_anchors, 4
            #-------------------------------------------------------#
            pred_boxes_for_ignore = pred_boxes[b].view(-1, 4)
            #-------------------------------------------------------#
            #   Ground-truth boxes, converted to feature-layer units
            #   gt_box      num_true_box, 4
            #-------------------------------------------------------#
            if len(targets[b]) > 0:
                batch_target = torch.zeros_like(targets[b])
                #-------------------------------------------------------#
                #   Map the centers of the positive samples onto the feature layer
                #-------------------------------------------------------#
                batch_target[:, [0,2]] = targets[b][:, [0,2]] * in_w
                batch_target[:, [1,3]] = targets[b][:, [1,3]] * in_h
                batch_target = batch_target[:, :4]
                #-------------------------------------------------------#
                #   Compute the IOU
                #   anch_ious       num_true_box, num_anchors
                #-------------------------------------------------------#
                anch_ious = self.calculate_iou(batch_target, pred_boxes_for_ignore)
                #-------------------------------------------------------#
                #   Maximum overlap of each anchor with any ground-truth box
                #   anch_ious_max   num_anchors
                #-------------------------------------------------------#
                anch_ious_max, _    = torch.max(anch_ious, dim = 0)
                anch_ious_max       = anch_ious_max.view(pred_boxes[b].size()[:3])
                noobj_mask[b][anch_ious_max > self.ignore_threshold] = 0
        return noobj_mask, pred_boxes
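
The anchor assignment done in get_target can be tried out on its own. Below is a minimal plain-Python sketch, assuming the nine anchors listed in the comments of __init__ and the default anchors_mask; wh_iou and assign_anchor are hypothetical names. It compares shapes only (both boxes placed at the origin), which is what the (0, 0, w, h) trick in get_target does.

```python
ANCHORS = [(12, 16), (19, 36), (40, 28), (36, 75), (76, 55),
           (72, 146), (142, 110), (192, 243), (459, 401)]
ANCHORS_MASK = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]


def wh_iou(wh_a, wh_b):
    """IOU of two boxes that share the same top-left corner (shape-only IOU)."""
    inter = min(wh_a[0], wh_b[0]) * min(wh_a[1], wh_b[1])
    return inter / (wh_a[0] * wh_a[1] + wh_b[0] * wh_b[1] - inter)


def assign_anchor(gt_w, gt_h):
    """Return (feature layer index l, anchor slot k, global anchor index best_n)."""
    ious = [wh_iou((gt_w, gt_h), a) for a in ANCHORS]
    best_n = max(range(len(ANCHORS)), key=ious.__getitem__)
    for layer, mask in enumerate(ANCHORS_MASK):
        if best_n in mask:
            return layer, mask.index(best_n), best_n


big = assign_anchor(150, 120)    # large object, sizes in pixels
small = assign_anchor(15, 18)    # small object
print(big, small)
```

The large box lands on layer 0 (the coarsest grid) and the small one on layer 2 (the finest grid), so each ground-truth box becomes a positive sample in only one feature layer.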

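Training your own YoloV4 model

First go to Github and download the corresponding repository. After downloading, extract it with an archive tool and open the folder in your editor or IDE.

Make sure the folder you open is the correct root directory. If it is not, relative paths will be wrong and the code will not run.

The root directory after opening must be the directory where the files are stored.

I. Preparing the dataset

This article trains with the VOC format, so you need to prepare your own dataset before training. If you don't have one, you can download the VOC12+07 dataset through the Github link and try it out.

Before training, put the label files in the Annotations folder under VOC2007 under the VOCdevkit folder.

Before training, put the image files in the JPEGImages folder under VOC2007 under the VOCdevkit folder.

The dataset is now in place.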

II. Processing the dataset

Once the dataset is in place, it needs one more processing step to produce the 2007_train.txt and 2007_val.txt files used for training. This is done with voc_annotation.py in the root directory.

voc_annotation.py has a few parameters to set: annotation_mode, classes_path, trainval_percent, train_percent and VOCdevkit_path. For a first training run it is enough to change only classes_path.

'''
annotation_mode specifies what this file computes when it is run
annotation_mode = 0: the whole label-processing pipeline, i.e. the txt files in VOCdevkit/VOC2007/ImageSets plus the 2007_train.txt and 2007_val.txt used for training
annotation_mode = 1: only the txt files in VOCdevkit/VOC2007/ImageSets
annotation_mode = 2: only the 2007_train.txt and 2007_val.txt used for training
'''
annotation_mode     = 0
'''
Must be modified; used to generate the target information in 2007_train.txt and 2007_val.txt
It only needs to match the classes_path used for training and prediction
If the generated 2007_train.txt contains no target information,
the classes are not set correctly
Only effective when annotation_mode is 0 or 2
'''
classes_path        = 'model_data/voc_classes.txt'
'''
trainval_percent sets the ratio of (training set + validation set) to test set; by default (training set + validation set):test set = 9:1
train_percent sets the ratio of training set to validation set within (training set + validation set); by default training set:validation set = 9:1
Only effective when annotation_mode is 0 or 1
'''
trainval_percent    = 0.9
train_percent       = 0.9
'''
Points to the folder containing the VOC dataset
By default it points to the VOC dataset in the root directory
'''
VOCdevkit_path  = 'VOCdevkit'
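
To get a feel for what the two percentages mean, here is a small sketch of the resulting split sizes. The function split_counts is hypothetical, and truncating with int() is an assumption about how the counts are rounded, not something taken from voc_annotation.py.

```python
def split_counts(num_images, trainval_percent=0.9, train_percent=0.9):
    """Sizes of the train / val / test splits for the given percentages."""
    num_trainval = int(num_images * trainval_percent)
    num_train = int(num_trainval * train_percent)
    return {
        "train": num_train,
        "val": num_trainval - num_train,
        "test": num_images - num_trainval,
    }


counts = split_counts(1000)
print(counts)
```

With the defaults, 1000 images are split into 810 for training, 90 for validation and 100 for testing.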

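classes_path points to the txt file listing the detection classes. Taking the VOC dataset as an example, the txt we use is:

When training on your own dataset, you can create your own cls_classes.txt and write the classes you need to distinguish in it.

III. Starting network training

voc_annotation.py has already generated 2007_train.txt and 2007_val.txt, so training can now begin.

There are many training parameters; read the comments carefully after downloading the repository. The most important one is still classes_path in train.py.

classes_path points to the txt file listing the detection classes, and it is the same txt as in voc_annotation.py! It must be modified when training on your own dataset!

After modifying classes_path you can run train.py to start training. After several epochs of training, the weights are saved in the logs folder.

The other parameters work as follows: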

#--------------------------------------------------------#
#   Always modify classes_path before training so that it
#   matches your own dataset
#--------------------------------------------------------#
classes_path    = 'model_data/voc_classes.txt'
#---------------------------------------------------------------------#
#   anchors_path is the txt file holding the anchors; usually left unchanged.
#   anchors_mask helps the code find the matching anchors; usually left unchanged.
#---------------------------------------------------------------------#
anchors_path    = 'model_data/yolo_anchors.txt'
anchors_mask    = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
#-------------------------------------------------------------------------------------#
#   See the README for the weight file (Baidu Netdisk download)
#   A dimension-mismatch message when training on your own dataset is normal:
#   the things being predicted differ, so the dimensions naturally differ
#   Pretrained weights are needed in 99% of cases; without them the weights are too
#   random, feature extraction is poor and the training result will not be good.
#   Pretrained weights carry over between datasets because the features are general.
#   Note: .h5 is normally a Keras weight format; the PyTorch version is expected to
#   load a .pth file, so check the README for the actual filename.
#------------------------------------------------------------------------------------#
model_path      = 'model_data/yolo4_weight.h5'
#------------------------------------------------------#
#   Input shape; must be a multiple of 32
#------------------------------------------------------#
input_shape     = [416, 416]
#------------------------------------------------------#
#   Yolov4 tricks
#   mosaic: mosaic data augmentation, True or False
#   In practice mosaic augmentation is not stable, so the default is False
#   Cosine_scheduler: cosine annealing learning rate, True or False
#   label_smoothing: label smoothing, usually 0.01 or below, e.g. 0.01, 0.005
#------------------------------------------------------#
mosaic              = False
Cosine_scheduler    = False
label_smoothing     = 0
#----------------------------------------------------#
#   Training has two stages: a frozen stage and an unfrozen stage
#   Parameters of the frozen stage
#   The backbone is frozen here, so the feature extraction network does not change
#   Less GPU memory is used; the network is only fine-tuned
#----------------------------------------------------#
Init_Epoch          = 0
Freeze_Epoch        = 50
Freeze_batch_size   = 4
Freeze_lr           = 1e-3
#----------------------------------------------------#
#   Parameters of the unfrozen stage
#   The backbone is no longer frozen, so the feature extraction network changes
#   More GPU memory is used; all parameters of the network change
#   The batch size must not be 1
#----------------------------------------------------#
UnFreeze_Epoch      = 100
Unfreeze_batch_size = 4
Unfreeze_lr         = 1e-4
#------------------------------------------------------#
#   是否進(jìn)行凍結(jié)訓(xùn)練，默認(rèn)先凍結(jié)主干訓(xùn)練后解凍訓(xùn)練。
#------------------------------------------------------#
Freeze_Train        = True
#------------------------------------------------------#
#   用于設(shè)置是否使用多線程讀取數(shù)據(jù)，0代表關(guān)閉多線程
#   開(kāi)啟后會(huì)加快數(shù)據(jù)讀取速度，但是會(huì)占用更多內(nèi)存
#   keras里開(kāi)啟多線程有些時(shí)候速度反而慢了許多
#   在IO為瓶頸的時(shí)候再開(kāi)啟多線程，即GPU運(yùn)算速度遠(yuǎn)大于讀取圖片的速度。
#------------------------------------------------------#
num_workers         = 0
#----------------------------------------------------#
#   獲得圖片路徑和標(biāo)簽
#----------------------------------------------------#
train_annotation_path   = '2007_train.txt'
val_annotation_path     = '2007_val.txt'
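A few of these settings are linked to each other, and a small sketch may make the links concrete. It is not taken from train.py. It assumes the three prediction layers use strides of 32, 16, and 8, that anchors_mask indexes into an anchor list sorted from small to large, and that the frozen phase runs from Init_Epoch to Freeze_Epoch while the unfrozen phase runs from Freeze_Epoch to UnFreeze_Epoch. The helper names and the anchor values are illustrative.

```python
# Values copied from the configuration above.
input_shape = [416, 416]
anchors_mask = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
Init_Epoch, Freeze_Epoch, UnFreeze_Epoch = 0, 50, 100

# Illustrative (width, height) anchors sorted small to large; the real values
# are read from anchors_path and may differ.
anchors = [(12, 16), (19, 36), (40, 28), (36, 75), (76, 55),
           (72, 146), (142, 110), (192, 243), (459, 401)]


def feature_sizes(input_shape, strides=(32, 16, 8)):
    """Grid size of each prediction layer, assuming strides of 32, 16 and 8."""
    height, width = input_shape
    if height % 32 != 0 or width % 32 != 0:
        raise ValueError("input_shape must be a multiple of 32")
    return [(height // s, width // s) for s in strides]


def group_anchors(anchors, anchors_mask):
    """Pick the anchors used by each prediction layer."""
    return [[anchors[i] for i in mask] for mask in anchors_mask]


def training_phases(init_epoch, freeze_epoch, unfreeze_epoch, freeze_train=True):
    """Epoch ranges (start inclusive, end exclusive) of each phase (assumed)."""
    if not freeze_train:
        return [("unfrozen", init_epoch, unfreeze_epoch)]
    return [("frozen", init_epoch, freeze_epoch),
            ("unfrozen", freeze_epoch, unfreeze_epoch)]


for size, group in zip(feature_sizes(input_shape),
                       group_anchors(anchors, anchors_mask)):
    print(size, group)
print(training_phases(Init_Epoch, Freeze_Epoch, UnFreeze_Epoch))
```

Under these assumptions the coarsest 13x13 grid is paired with the three largest anchors, and an input_shape such as [400, 400] is rejected because 400 is not divisible by 32.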

IV. Predicting with the trained model

Prediction uses two files: yolo.py and predict.py.

First, modify model_path and classes_path in yolo.py. Both parameters must be changed.

model_path points to the trained weights file, which is in the logs folder.

classes_path points to the txt file that lists the detection classes.

Once these are set, run predict.py to perform detection. When prompted, enter the path of an image to detect objects in it.
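Since model_path has to point at a file in the logs folder, a small helper can save some manual lookup. This is a sketch under assumptions, not part of yolo.py: the helper name latest_weights, the ".pth" suffix, and the demo file names are hypothetical, and the newest file is not necessarily the one with the lowest validation loss.

```python
import os
import tempfile
from pathlib import Path


def latest_weights(log_dir, suffix):
    """Return the most recently modified weights file in log_dir."""
    files = [p for p in Path(log_dir).glob(f"*{suffix}") if p.is_file()]
    if not files:
        raise FileNotFoundError(f"no {suffix} files found in {log_dir}")
    return max(files, key=lambda p: p.stat().st_mtime)


# Demo on a temporary folder with two empty, hypothetical checkpoint files.
demo_dir = Path(tempfile.mkdtemp())
for offset, name in enumerate(["ep001.pth", "ep002.pth"]):
    path = demo_dir / name
    path.write_bytes(b"")
    # Set explicit modification times so the ordering is deterministic.
    os.utime(path, (1_000_000 + offset, 1_000_000 + offset))

model_path = str(latest_weights(demo_dir, suffix=".pth"))
print(model_path)
```

In a real project, log_dir would be the logs folder and the returned path would be copied into model_path in yolo.py.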

