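Python Object Detection: A Detailed Walkthrough of the SSD Training Source Code

Updated: 2022-05-06 10:28:43 Author: Bubbliiiing

This article walks through the training part of the SSD object detection source code in Python. It is meant as a reference for readers working through the code, and I hope it helps.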

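Preface

I have spent a long time on the SSD algorithm again, so today I will walk through the training part of the code.

For the prediction part of the code, see http://www.dbjr.com.cn/article/246905.htm

Outline

This tutorial focuses on the training code, mainly the execution flow of the training function and the reasoning behind it.

The training function runs roughly in these steps:

1. Set the training parameters.

2. Read the dataset.

3. Build the SSD network.

4. Preprocess the dataset.

5. Encode the ground-truth boxes so that their format matches the network's predictions and the two can be compared.

6. Compute the loss.

7. Use the optimizer to run gradient descent and save the model.

Before reading on, I recommend downloading my simplified source code and following along. How to run it is described in the section on starting training.

Download link: https://pan.baidu.com/s/1K4RAJvLj11blywuX2CrLSA

Extraction code: 4wbi

Model training workflow

This article uses the ssd_vgg_300 source code, which I have simplified: it keeps the prediction part selected last time and adds the training part, which makes the overall SSD framework easier to follow.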

1. Setting the parameters

Before loading the dataset, a series of parameters must be set. They fall into several groups. The first group holds flags for the SSD network:

# =========================================================================== #
# SSD Network flags.
# =========================================================================== #
# Weight (alpha) of the localization loss term
tf.app.flags.DEFINE_float(
    'loss_alpha', 1., 'Alpha parameter in the loss function.')
# Ratio of negative to positive samples
tf.app.flags.DEFINE_float(
    'negative_ratio', 3., 'Negative ratio in the loss function.')
# After ground-truth encoding, anchors whose matching score exceeds match_threshold are positive samples
tf.app.flags.DEFINE_float(
    'match_threshold', 0.5, 'Matching threshold in the loss function.')

The second group holds general training parameters (logging of training progress, checkpoint saving, and so on):

# =========================================================================== #
# General Flags.
# =========================================================================== #
# train_dir is where the trained checkpoints and event logs are written
tf.app.flags.DEFINE_string(
    'train_dir', '/tmp/tfmodel/',
    'Directory where checkpoints and event logs are written to.')
# num_readers is the number of parallel readers used to read the dataset
tf.app.flags.DEFINE_integer(
    'num_readers', 4,
    'The number of parallel readers that read data from the dataset.')
# Number of threads used to build the training batches
tf.app.flags.DEFINE_integer(
    'num_preprocessing_threads', 4,
    'The number of threads used to create the batches.')
# Print a log line to the console every 10 steps
tf.app.flags.DEFINE_integer(
    'log_every_n_steps', 10,
    'The frequency with which logs are print.')
# Save summaries every 600 seconds
tf.app.flags.DEFINE_integer(
    'save_summaries_secs', 600,
    'The frequency with which summaries are saved, in seconds.')
# Save the model every 600 seconds
tf.app.flags.DEFINE_integer(
    'save_interval_secs', 600,
    'The frequency with which the model is saved, in seconds.')
# Fraction of GPU memory that may be used
tf.app.flags.DEFINE_float(
    'gpu_memory_fraction', 0.7, 'GPU memory fraction to use.')

The third group holds the optimizer parameters:

# =========================================================================== #
# Optimization Flags.
# =========================================================================== #
# Optimizer parameters
# weight_decay parameter
tf.app.flags.DEFINE_float(
    'weight_decay', 0.00004, 'The weight decay on the model weights.')
# Which optimizer to use
tf.app.flags.DEFINE_string(
    'optimizer', 'rmsprop',
    'The name of the optimizer, one of "adadelta", "adagrad", "adam",'
    '"ftrl", "momentum", "sgd" or "rmsprop".')
tf.app.flags.DEFINE_float(
    'adadelta_rho', 0.95,
    'The decay rate for adadelta.')
tf.app.flags.DEFINE_float(
    'adagrad_initial_accumulator_value', 0.1,
    'Starting value for the AdaGrad accumulators.')
tf.app.flags.DEFINE_float(
    'adam_beta1', 0.9,
    'The exponential decay rate for the 1st moment estimates.')
tf.app.flags.DEFINE_float(
    'adam_beta2', 0.999,
    'The exponential decay rate for the 2nd moment estimates.')
tf.app.flags.DEFINE_float('opt_epsilon', 1.0, 'Epsilon term for the optimizer.')
tf.app.flags.DEFINE_float('ftrl_learning_rate_power', -0.5,
                          'The learning rate power.')
tf.app.flags.DEFINE_float(
    'ftrl_initial_accumulator_value', 0.1,
    'Starting value for the FTRL accumulators.')
tf.app.flags.DEFINE_float(
    'ftrl_l1', 0.0, 'The FTRL l1 regularization strength.')
tf.app.flags.DEFINE_float(
    'ftrl_l2', 0.0, 'The FTRL l2 regularization strength.')
tf.app.flags.DEFINE_float(
    'momentum', 0.9,
    'The momentum for the MomentumOptimizer and RMSPropOptimizer.')
tf.app.flags.DEFINE_float('rmsprop_momentum', 0.9, 'Momentum.')
tf.app.flags.DEFINE_float('rmsprop_decay', 0.9, 'Decay term for RMSProp.')

The fourth group holds the learning-rate parameters:

# =========================================================================== #
# Learning Rate Flags.
# =========================================================================== #
# How the learning rate decays: fixed, exponential, or polynomial
tf.app.flags.DEFINE_string(
    'learning_rate_decay_type',
    'exponential',
    'Specifies how the learning rate is decayed. One of "fixed", "exponential",'
    ' or "polynomial"')
# Initial learning rate
tf.app.flags.DEFINE_float('learning_rate', 0.01, 'Initial learning rate.')
# Final learning rate
tf.app.flags.DEFINE_float(
    'end_learning_rate', 0.0001,
    'The minimal end learning rate used by a polynomial decay learning rate.')
tf.app.flags.DEFINE_float(
    'label_smoothing', 0.0, 'The amount of label smoothing.')
# Learning-rate decay factor
tf.app.flags.DEFINE_float(
    'learning_rate_decay_factor', 0.94, 'Learning rate decay factor.')
tf.app.flags.DEFINE_float(
    'num_epochs_per_decay', 2.0,
    'Number of epochs after which learning rate decays.')
tf.app.flags.DEFINE_float(
    'moving_average_decay', None,
    'The decay to use for the moving average.'
    'If left as None, then moving averages are not used.')
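To make the learning-rate flags concrete, here is a minimal pure-Python sketch of exponential decay. It assumes that the decay interval is derived from the dataset size, batch_size and num_epochs_per_decay, and that the rate drops in discrete steps (staircase); both are assumptions about what tf_utils.configure_learning_rate does, not something this article confirms. The function name exponential_decay is hypothetical, and 17125 is the sample count quoted in the dataset code.

```python
def exponential_decay(step, initial_lr=0.01, decay_factor=0.94,
                      num_samples=17125, batch_size=32,
                      num_epochs_per_decay=2.0, staircase=True):
    """Learning rate after `step` training steps (illustrative sketch)."""
    # Number of steps between two decays: steps per epoch times epochs per decay.
    decay_steps = int(num_samples / batch_size * num_epochs_per_decay)
    if staircase:
        # Assumption: the rate only changes at multiples of decay_steps.
        exponent = step // decay_steps
    else:
        exponent = step / decay_steps
    return initial_lr * decay_factor ** exponent


decay_steps = int(17125 / 32 * 2.0)
for step in (0, decay_steps, 2 * decay_steps):
    print(step, exponential_decay(step))
```

With the default flags the rate would be multiplied by 0.94 roughly every two epochs, so it shrinks slowly over the 50000 training steps.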

The fifth group holds the dataset parameters:

# =========================================================================== #
# Dataset Flags.
# =========================================================================== #
# Dataset name
tf.app.flags.DEFINE_string(
    'dataset_name', 'imagenet', 'The name of the dataset to load.')
# Number of classes in the dataset
tf.app.flags.DEFINE_integer(
    'num_classes', 21, 'Number of classes to use in the dataset.')
# Training or test split
tf.app.flags.DEFINE_string(
    'dataset_split_name', 'train', 'The name of the train/test split.')
# Dataset directory
tf.app.flags.DEFINE_string(
    'dataset_dir', None, 'The directory where the dataset files are stored.')
tf.app.flags.DEFINE_integer(
    'labels_offset', 0,
    'An offset for the labels in the dataset. This flag is primarily used to '
    'evaluate the VGG and ResNet architectures which do not use a background '
    'class for the ImageNet dataset.')
tf.app.flags.DEFINE_string(
    'model_name', 'ssd_300_vgg', 'The name of the architecture to train.')
tf.app.flags.DEFINE_string(
    'preprocessing_name', None, 'The name of the preprocessing to use. If left '
    'as `None`, then the model_name flag is used.')
# Size of each training batch
tf.app.flags.DEFINE_integer(
    'batch_size', 32, 'The number of samples in each batch.')
# Size of the training images
tf.app.flags.DEFINE_integer(
    'train_image_size', None, 'Train image size')
# Maximum number of training steps
tf.app.flags.DEFINE_integer('max_number_of_steps', 50000,
                            'The maximum number of training steps.')

The sixth group holds the parameters needed to fine-tune an existing model:

# =========================================================================== #
# Fine-Tuning Flags.
# =========================================================================== #
# These parameters are used to fine-tune an existing model
# Location of the original model
tf.app.flags.DEFINE_string(
    'checkpoint_path', None,
    'The path to a checkpoint from which to fine-tune.')
tf.app.flags.DEFINE_string(
    'checkpoint_model_scope', None,
    'Model scope in the checkpoint. None if the same as the trained model.')
# Which variable scopes to exclude when restoring from the checkpoint
tf.app.flags.DEFINE_string(
    'checkpoint_exclude_scopes', None,
    'Comma-separated list of scopes of variables to exclude when restoring '
    'from a checkpoint.')
# Which scopes to train (None trains all variables)
tf.app.flags.DEFINE_string(
    'trainable_scopes', None,
    'Comma-separated list of scopes to filter the set of variables to train.'
    'By default, None would train all the variables.')
# Ignore variables that are missing from the checkpoint
tf.app.flags.DEFINE_boolean(
    'ignore_missing_vars', False,
    'When restoring a checkpoint would ignore missing variables.')
FLAGS = tf.app.flags.FLAGS

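I have annotated the meaning of every parameter. In practice you need to change some of them for your own training run. The list looks long, but it simply covers everything a training run requires:

Network parameters;
General training parameters (logging of training progress, checkpoint saving, and so on);
Optimizer parameters;
Learning-rate parameters;
Dataset parameters;
Parameters for fine-tuning an existing model.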

2. Reading the dataset

In the training flow, the dataset is read with the following call:

##########################Read the dataset#############################
# Select the dataset
dataset = dataset_factory.get_dataset(
    FLAGS.dataset_name, FLAGS.dataset_split_name, FLAGS.dataset_dir)

dataset_factory holds the functions that fetch and process datasets. It covers four datasets, and datasets_map maps each dataset name to the module that handles it.

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from datasets import cifar10
from datasets import imagenet
from datasets import pascalvoc_2007
from datasets import pascalvoc_2012
datasets_map = {
    'cifar10': cifar10,
    'imagenet': imagenet,
    'pascalvoc_2007': pascalvoc_2007,
    'pascalvoc_2012': pascalvoc_2012,
}
def get_dataset(name, split_name, dataset_dir, file_pattern=None, reader=None):
    """
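    Return a dataset given a dataset name and a split name.
    Args:
        name: String, the name of the dataset.
        split_name: the train/test split name.
        dataset_dir: the directory where the dataset files are stored.
        file_pattern: the file pattern used to match the dataset source files.
        reader: a subclass of tf.ReaderBase. If left as `None`, the default
            reader defined by each dataset is used.
    Returns:
        The dataset.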
    """
    if name not in datasets_map:
        raise ValueError('Name of dataset unknown %s' % name)
    return datasets_map[name].get_split(split_name,
                                        dataset_dir,
                                        file_pattern,
                                        reader)

We use the pascalvoc_2012 data here, so the call datasets_map[name].get_split actually resolves to:

pascalvoc_2012.get_split(split_name,
                         dataset_dir,
                         file_pattern,
                         reader)
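The dispatch through datasets_map can be tried in isolation. The sketch below replaces the real dataset modules with hypothetical stand-ins built by _make_fake_dataset (an invented helper), so only the lookup logic matches the code above; the returned tuples are placeholders, not real datasets.

```python
from types import SimpleNamespace


def _make_fake_dataset(label):
    # Hypothetical stand-in for a module such as datasets.pascalvoc_2012.
    def get_split(split_name, dataset_dir, file_pattern=None, reader=None):
        return (label, split_name, dataset_dir)
    return SimpleNamespace(get_split=get_split)


datasets_map = {
    'pascalvoc_2007': _make_fake_dataset('voc07'),
    'pascalvoc_2012': _make_fake_dataset('voc12'),
}


def get_dataset(name, split_name, dataset_dir, file_pattern=None, reader=None):
    # Same lookup as in dataset_factory: unknown names fail early.
    if name not in datasets_map:
        raise ValueError('Name of dataset unknown %s' % name)
    return datasets_map[name].get_split(split_name, dataset_dir,
                                        file_pattern, reader)


result = get_dataset('pascalvoc_2012', 'train', './tfrecords')
print(result)
```

Adding a new dataset under this pattern only requires a module with a get_split function and one more entry in the dictionary.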

In pascalvoc_2012, get_split runs as shown below. Here file_pattern = 'voc_2012_%s_*.tfrecord' is the default naming pattern for the dataset files. An actual training tfrecord file is named like voc_2012_train_001.tfrecord, so files of that form are the ones that get read:

def get_split(split_name, dataset_dir, file_pattern=None, reader=None):
    """Gets a dataset tuple with instructions for reading ImageNet.
    Args:
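      split_name: the train/test split name.
      dataset_dir: the location of the dataset.
      file_pattern: the file pattern to use when matching the dataset sources.
                    The pattern is assumed to contain a '%s' string so that
                    the split name can be inserted.
      reader: the TensorFlow reader type.
    Returns:
      The dataset.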
    """
    if not file_pattern:
        file_pattern = FILE_PATTERN
    return pascalvoc_common.get_split(split_name, dataset_dir,
                                      file_pattern, reader,
                                      SPLITS_TO_SIZES,
                                      ITEMS_TO_DESCRIPTIONS,
                                      NUM_CLASSES)
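As a quick check of how the file pattern selects files, the sketch below fills the split name into the pattern and matches a few candidate file names with the standard fnmatch module. The directory name tfrecords, the helper resolve_pattern and the candidate list are illustrative assumptions; joining the pattern to the dataset directory is assumed to be how the real code builds the path.

```python
import fnmatch
import os

FILE_PATTERN = 'voc_2012_%s_*.tfrecord'


def resolve_pattern(dataset_dir, split_name, file_pattern=FILE_PATTERN):
    # Insert the split name, then prepend the dataset directory.
    return os.path.join(dataset_dir, file_pattern % split_name)


pattern = resolve_pattern('tfrecords', 'train')

# Hypothetical directory listing used only for this demonstration.
candidates = [
    'voc_2012_train_001.tfrecord',
    'voc_2012_test_001.tfrecord',
    'voc_2007_train_001.tfrecord',
]
matched = [name for name in candidates
           if fnmatch.fnmatch(os.path.join('tfrecords', name), pattern)]
print(pattern)
print(matched)
```

Only the file whose name carries both the 2012 prefix and the train split survives the match.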

Once inside the pascalvoc_common file, the tfrecord files are actually parsed. The comments show how the code runs: each record is first decoded according to keys_to_features, and the decoded result is then exposed through the dataset in the layout given by items_to_handlers:

def get_split(split_name, dataset_dir, file_pattern, reader,
              split_to_sizes, items_to_descriptions, num_classes):
    """Gets a dataset tuple with instructions for reading Pascal VOC dataset.
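    Args:
        split_name: the train/test split name.
        dataset_dir: the directory where the dataset files are stored.
        file_pattern: the file pattern used to match the dataset source files.
        reader: a subclass of tf.ReaderBase. If left as `None`, the default
            reader is used.
        split_to_sizes: dict mapping each split name to its number of samples.
        items_to_descriptions: dict describing each item the dataset provides.
        num_classes: number of classes in the dataset.
    Returns:
        The dataset.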
    """
    if split_name not in split_to_sizes:
        raise ValueError('split name %s was not recognized.' % split_name)
    # file_pattern is the location of the tfrecord dataset files
    file_pattern = os.path.join(dataset_dir, file_pattern % split_name)
    # Fall back to the default reader when none is given
    if reader is None:
        reader = tf.TFRecordReader
    # Fields stored in each VOC tfrecord example
    keys_to_features = {
        'image/encoded': tf.FixedLenFeature((), tf.string, default_value=''),
        'image/format': tf.FixedLenFeature((), tf.string, default_value='jpeg'),
        'image/height': tf.FixedLenFeature([1], tf.int64),
        'image/width': tf.FixedLenFeature([1], tf.int64),
        'image/channels': tf.FixedLenFeature([1], tf.int64),
        'image/shape': tf.FixedLenFeature([3], tf.int64),
        'image/object/bbox/xmin': tf.VarLenFeature(dtype=tf.float32),
        'image/object/bbox/ymin': tf.VarLenFeature(dtype=tf.float32),
        'image/object/bbox/xmax': tf.VarLenFeature(dtype=tf.float32),
        'image/object/bbox/ymax': tf.VarLenFeature(dtype=tf.float32),
        'image/object/bbox/label': tf.VarLenFeature(dtype=tf.int64),
        'image/object/bbox/difficult': tf.VarLenFeature(dtype=tf.int64),
        'image/object/bbox/truncated': tf.VarLenFeature(dtype=tf.int64),
    }
    # How each item is decoded
    items_to_handlers = {
        'image': slim.tfexample_decoder.Image('image/encoded', 'image/format'),
        'shape': slim.tfexample_decoder.Tensor('image/shape'),
        'object/bbox': slim.tfexample_decoder.BoundingBox(
                ['ymin', 'xmin', 'ymax', 'xmax'], 'image/object/bbox/'),
        'object/label': slim.tfexample_decoder.Tensor('image/object/bbox/label'),
        'object/difficult': slim.tfexample_decoder.Tensor('image/object/bbox/difficult'),
        'object/truncated': slim.tfexample_decoder.Tensor('image/object/bbox/truncated'),
    }
    # Decode the keys_to_features fields of the tfrecord into the items_to_handlers items
    decoder = slim.tfexample_decoder.TFExampleDecoder(
        keys_to_features, items_to_handlers)
    labels_to_names = None
    if dataset_utils.has_labels(dataset_dir):
        labels_to_names = dataset_utils.read_label_file(dataset_dir)
    return slim.dataset.Dataset(
            data_sources=file_pattern,  # data source
            reader=reader,              # tf.TFRecordReader
            decoder=decoder,            # decoder
            num_samples=split_to_sizes[split_name], # 17125
            items_to_descriptions=items_to_descriptions,    # description of each item
            num_classes=num_classes,                        # number of classes
            labels_to_names=labels_to_names)

After this chain of calls, what comes back is a slim.dataset.Dataset. The whole sequence of function calls exists only to select and build the dataset that matches the given name.

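3. Building the SSD network

Building the SSD network is not complicated and involves few function calls. If you understand the prediction part of the SSD network, the process is easy to follow, so I only describe the logic here:

1. Get the SSDNet class with ssd_class = ssd_vgg_300.SSDNet.

2. Replace the num_classes parameter with the number of classes.

3. Build the network with ssd_net = ssd_class(ssd_params).

4. Get the prior boxes.

The calling code is as follows: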

###########################Build the SSD network##############################
# Get the SSD network class
ssd_class = ssd_vgg_300.SSDNet
# Replace the num_classes parameter
ssd_params = ssd_class.default_params._replace(num_classes=FLAGS.num_classes)
# Build the network with the replaced parameters
ssd_net = ssd_class(ssd_params)
# Get the prior boxes
ssd_shape = ssd_net.params.img_shape
ssd_anchors = ssd_net.anchors(ssd_shape) # prior boxes for all six feature layers
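To give a feel for what the prior boxes of one feature layer look like, here is a minimal NumPy sketch. It assumes each layer is described by a tuple (y, x, h, w), which is the layout the encoding step unpacks from each anchors_layer. The function name anchors_one_layer and the example values (a 38x38 feature map, sizes 21 and 45, ratios 2 and 0.5, stride 8) are illustrative assumptions and are not taken from this article.

```python
import math

import numpy as np


def anchors_one_layer(img_shape, feat_shape, sizes, ratios, step, offset=0.5):
    """Prior boxes of one feature layer as (y, x, h, w), relative to the image."""
    # Centre of every feature-map cell, normalised to [0, 1].
    y, x = np.mgrid[0:feat_shape[0], 0:feat_shape[1]]
    y = (y.astype(np.float32) + offset) * step / img_shape[0]
    x = (x.astype(np.float32) + offset) * step / img_shape[1]
    y = np.expand_dims(y, axis=-1)  # (m, m, 1)
    x = np.expand_dims(x, axis=-1)  # (m, m, 1)

    # Two square boxes plus one box per aspect ratio.
    num_anchors = 2 + len(ratios)
    h = np.zeros((num_anchors,), dtype=np.float32)
    w = np.zeros((num_anchors,), dtype=np.float32)
    h[0] = sizes[0] / img_shape[0]
    w[0] = sizes[0] / img_shape[1]
    h[1] = math.sqrt(sizes[0] * sizes[1]) / img_shape[0]
    w[1] = math.sqrt(sizes[0] * sizes[1]) / img_shape[1]
    for i, r in enumerate(ratios):
        h[i + 2] = sizes[0] / img_shape[0] / math.sqrt(r)
        w[i + 2] = sizes[0] / img_shape[1] * math.sqrt(r)
    return y, x, h, w


yref, xref, href, wref = anchors_one_layer(
    (300, 300), (38, 38), sizes=(21., 45.), ratios=(2., 0.5), step=8)
num_boxes = yref.shape[0] * yref.shape[1] * href.size
print(yref.shape, href.shape, num_boxes)
```

Keeping the centres as (m, m, 1) and the sizes as (k,) lets broadcasting produce all m x m x k boxes without storing them explicitly.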

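4. Preprocessing the dataset

The preprocessing code is fairly long, but the logic is not hard to follow.

1. Get the preprocessing name (it falls back to the model name).

2. Get the preprocessing function.

3. Use DatasetDataProvider to supply data from the dataset and prefetch it.

4. Get the raw image together with its labels and the positions of its ground-truth boxes.

5. Preprocess the image, labels, and box positions.

The implementation is as follows: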

###########################Preprocess the dataset##############################
# preprocessing_name resolves to ssd_300_vgg
preprocessing_name = FLAGS.preprocessing_name or FLAGS.model_name
# Look up the preprocessing function by name
image_preprocessing_fn = preprocessing_factory.get_preprocessing(
    preprocessing_name, is_training=True)
# Print the configuration
tf_utils.print_configuration(FLAGS.__flags, ssd_params,
                                dataset.data_sources, FLAGS.train_dir)
# DatasetDataProvider supplies data from the dataset. Depending on the configuration,
# it can read with several readers in parallel or with a single reader, and the
# data it reads can be shuffled.
# Prefetch
with tf.name_scope(FLAGS.dataset_name + '_data_provider'):
    provider = slim.dataset_data_provider.DatasetDataProvider(
        dataset,
        num_readers=FLAGS.num_readers,
        common_queue_capacity=20 * FLAGS.batch_size,
        common_queue_min=10 * FLAGS.batch_size,
        shuffle=True)
# Get the raw image with its labels and ground-truth box positions
[image, _, glabels, gbboxes] = provider.get(['image', 'shape',
                                                    'object/label',
                                                    'object/bbox'])
# Preprocess the image, labels, and box positions
image, glabels, gbboxes = \
    image_preprocessing_fn(image, glabels, gbboxes,
                            out_shape=ssd_shape,
                            data_format=DATA_FORMAT)

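The parts of this section that may be confusing are steps 2 and 5. Step 5 simply calls the image preprocessing function obtained in step 2, so it is enough to understand step 2, "get the preprocessing function".

The code that gets the preprocessing function is: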

# Look up the preprocessing function by name
image_preprocessing_fn = preprocessing_factory.get_preprocessing(
    preprocessing_name, is_training=True)

The folder that contains preprocessing_factory holds all the image-processing code. Calling get_preprocessing returns a function named preprocessing_fn.

That function returns the result of ssd_vgg_preprocessing.preprocess_image.

During training, ssd_vgg_preprocessing.preprocess_image in turn returns the result of preprocess_for_train.

The get_preprocessing code in preprocessing_factory is as follows:

def get_preprocessing(name, is_training=False):
    preprocessing_fn_map = {
        'ssd_300_vgg': ssd_vgg_preprocessing
    }
    if name not in preprocessing_fn_map:
        raise ValueError('Preprocessing name [%s] was not recognized' % name)
    def preprocessing_fn(image, labels, bboxes,
                         out_shape, data_format='NHWC', **kwargs):
        # This actually calls ssd_vgg_preprocessing.preprocess_image
        return preprocessing_fn_map[name].preprocess_image(
            image, labels, bboxes, out_shape, data_format=data_format,
            is_training=is_training, **kwargs)
    return preprocessing_fn

The preprocess_image code in ssd_vgg_preprocessing is as follows:

def preprocess_image(image,
                     labels,
                     bboxes,
                     out_shape,
                     data_format,
                     is_training=False,
                     **kwargs):
    """Pre-process an given image.
    Args:
      image: A `Tensor` representing an image of arbitrary size.
      labels: labels of the ground-truth boxes in the image.
      bboxes: the ground-truth boxes in the image.
      out_shape: height and width of the image after preprocessing.
      data_format: layout of the output image, 'NHWC' or 'NCHW'.
      is_training: `True` if the image is preprocessed for training,
        `False` otherwise.
    Returns:
      The preprocessed image together with its labels and boxes.
    """
    if is_training:
        return preprocess_for_train(image, labels, bboxes,
                                    out_shape=out_shape,
                                    data_format=data_format)
    else:
        return preprocess_for_eval(image, labels, bboxes,
                                   out_shape=out_shape,
                                   data_format=data_format,
                                   **kwargs)

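In the end the dataset is processed by preprocess_for_train.

preprocess_for_train works as follows:

1. Convert the data type.

2. Take a distorted (random) crop guided by the ground-truth boxes.

3. Resize the image to the output size.

4. Randomly flip the image horizontally.

5. Randomly distort the colors. There are four orderings.

6. Subtract the mean from the image.

The code is as follows: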

def preprocess_for_train(image, labels, bboxes,
                         out_shape, data_format='NHWC',
                         scope='ssd_preprocessing_train'):
    """Preprocesses the given image for training.
    Note that the actual resizing scale is sampled from
        [`resize_size_min`, `resize_size_max`].
    Args:
        image: an image of arbitrary size.
        labels: labels of the ground-truth boxes.
        bboxes: the ground-truth boxes.
        out_shape: height and width of the processed image.
        data_format: 'NHWC' or 'NCHW'.
    Returns:
        The processed image, labels, and boxes.
    """
    fast_mode = False
    with tf.name_scope(scope, 'ssd_preprocessing_train', [image, labels, bboxes]):
        if image.get_shape().ndims != 3:
            raise ValueError('Input must be of size [height, width, C>0]')
        # Convert the image data type
        if image.dtype != tf.float32:
            image = tf.image.convert_image_dtype(image, dtype=tf.float32)
        # Distorted bounding-box crop
        dst_image = image
        dst_image, labels, bboxes, _ = \
            distorted_bounding_box_crop(image, labels, bboxes,
                                        min_object_covered=MIN_OBJECT_COVERED,
                                        aspect_ratio_range=CROP_RATIO_RANGE)
        # Resize the image to the output size.
        dst_image = tf_image.resize_image(dst_image, out_shape,
                                          method=tf.image.ResizeMethod.BILINEAR,
                                          align_corners=False)
        # Randomly flip the image horizontally.
        dst_image, bboxes = tf_image.random_flip_left_right(dst_image, bboxes)
        # Randomly distort the colors. There are four orderings.
        dst_image = apply_with_random_selector(
                dst_image,
                lambda x, ordering: distort_color(x, ordering, fast_mode),
                num_cases=4)
        # Subtract the mean from the image
        image = dst_image * 255.
        image = tf_image_whitened(image, [_R_MEAN, _G_MEAN, _B_MEAN])
        # Data layout of the image
        if data_format == 'NCHW':
            image = tf.transpose(image, perm=(2, 0, 1))
        return image, labels, bboxes
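Two of these steps are easy to check by hand: flipping an image together with its boxes, and subtracting the channel means. The NumPy sketch below applies the flip unconditionally (the real code flips at random) and assumes boxes are stored as normalised [ymin, xmin, ymax, xmax]. The function names and the mean values 123, 117 and 104 are placeholders chosen for illustration, not the values of _R_MEAN, _G_MEAN and _B_MEAN confirmed by this article.

```python
import numpy as np


def flip_left_right(image, bboxes):
    """Mirror an HWC image and its normalised [ymin, xmin, ymax, xmax] boxes."""
    flipped = image[:, ::-1, :]
    ymin, xmin, ymax, xmax = (bboxes[:, 0], bboxes[:, 1],
                              bboxes[:, 2], bboxes[:, 3])
    # After mirroring, the old right edge becomes the new left edge.
    flipped_boxes = np.stack([ymin, 1.0 - xmax, ymax, 1.0 - xmin], axis=-1)
    return flipped, flipped_boxes


def subtract_mean(image, means):
    """Subtract one mean value per channel (broadcast over height and width)."""
    return image - np.asarray(means, dtype=image.dtype)


image = np.zeros((2, 4, 3), dtype=np.float32)
image[:, 0, :] = 1.0  # mark the leftmost column
bboxes = np.array([[0.1, 0.2, 0.5, 0.6]], dtype=np.float32)

flipped, flipped_boxes = flip_left_right(image, bboxes)
# Scale from [0, 1] to [0, 255] before subtracting the (assumed) means.
whitened = subtract_mean(flipped * 255.0, (123.0, 117.0, 104.0))
print(flipped_boxes)
```

The key point is that the boxes must be transformed with the image, otherwise the labels no longer describe the augmented picture.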

5. Encoding the boxes

This part calls the box-encoding code as follows:

gclasses, glocalisations, gscores = ssd_net.bboxes_encode(glabels, gbboxes, ssd_anchors)

The bboxes_encode method actually calls tf_ssd_bboxes_encode in the ssd_common module.

def bboxes_encode(self, labels, bboxes, anchors,
                    scope=None):
    """
    Encode the ground-truth labels and boxes.
    """
    return ssd_common.tf_ssd_bboxes_encode(
        labels, bboxes, anchors,
        self.params.num_classes,
        self.params.no_annotation_label,
        ignore_threshold=0.5,
        prior_scaling=self.params.prior_scaling,
        scope=scope)

ssd_common.tf_ssd_bboxes_encode runs the encoding for every feature layer in turn.

def tf_ssd_bboxes_encode(labels,
                         bboxes,
                         anchors,
                         num_classes,
                         no_annotation_label,
                         ignore_threshold=0.5,
                         prior_scaling=[0.1, 0.1, 0.2, 0.2],
                         dtype=tf.float32,
                         scope='ssd_bboxes_encode'):
    """
      Encode the ground truth for every feature layer.
    """
    with tf.name_scope(scope):
        target_labels = []
        target_localizations = []
        target_scores = []
        for i, anchors_layer in enumerate(anchors):
            with tf.name_scope('bboxes_encode_block_%i' % i):
                t_labels, t_loc, t_scores = \
                    tf_ssd_bboxes_encode_layer(labels, bboxes, anchors_layer,
                                               num_classes, no_annotation_label,
                                               ignore_threshold,
                                               prior_scaling, dtype)
                target_labels.append(t_labels)
                target_localizations.append(t_loc)
                target_scores.append(t_scores)
        return target_labels, target_localizations, target_scores

The actual encoding happens in the function tf_ssd_bboxes_encode_layer, which works as follows:

1. Create a series of variables to store the encoding results.

    yref, xref, href, wref = anchors_layer
    ymin = yref - href / 2.
    xmin = xref - wref / 2.
    ymax = yref + href / 2.
    xmax = xref + wref / 2.
    vol_anchors = (xmax - xmin) * (ymax - ymin)
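    # 1. Create a series of variables to store the encoding results
    # Shape of each feature layer
    shape = (yref.shape[0], yref.shape[1], href.size)
    # Label assigned to each anchor box at each point of the feature layer
    feat_labels = tf.zeros(shape, dtype=tf.int64)  # (m, m, k)
    # Score of each anchor box at each point of the feature layer
    feat_scores = tf.zeros(shape, dtype=dtype)
    # Box coordinates for each anchor box at each point of the feature layer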
    feat_ymin = tf.zeros(shape, dtype=dtype)
    feat_xmin = tf.zeros(shape, dtype=dtype)
    feat_ymax = tf.ones(shape, dtype=dtype)
    feat_xmax = tf.ones(shape, dtype=dtype)

2. For every ground-truth box, find the matching points and anchor boxes in the feature layer and record its label.

    # Computes the IoU
    def jaccard_with_anchors(bbox):
        int_ymin = tf.maximum(ymin, bbox[0])  # (m, m, k)
        int_xmin = tf.maximum(xmin, bbox[1])
        int_ymax = tf.minimum(ymax, bbox[2])
        int_xmax = tf.minimum(xmax, bbox[3])
        h = tf.maximum(int_ymax - int_ymin, 0.)
        w = tf.maximum(int_xmax - int_xmin, 0.)
        # Volumes.
        # Relation between the anchor boxes and the bbox
        inter_vol = h * w  # intersection area
        union_vol = vol_anchors - inter_vol \
                    + (bbox[2] - bbox[0]) * (bbox[3] - bbox[1])  # union area
        jaccard = tf.div(inter_vol, union_vol)  # intersection / union, i.e. IoU
        return jaccard  # (m, m, k)
    def condition(i,feat_labels, feat_scores,
             feat_ymin, feat_xmin, feat_ymax, feat_xmax):
        r = tf.less(i, tf.shape(labels))
        return r[0]
    # Finds which anchor boxes of the feature layer each ground-truth box corresponds to
    def body(i, feat_labels, feat_scores,
             feat_ymin, feat_xmin, feat_ymax, feat_xmax):
        """
          Update the feature labels, scores, and bboxes.
            - a value is assigned only where the Jaccard score beats the score stored so far;
        """
        # Take the i-th label and the i-th bbox
        label = labels[i]  # label of the i-th object in the current image
        bbox = bboxes[i]  # ground-truth bbox of the i-th object in the current image
        # Compute the IoU between this box and every anchor box
        jaccard = jaccard_with_anchors(bbox)  # IoU of the current bbox with the anchors of the current layer
        # Keep only the anchors whose IoU beats the stored score
        mask = tf.greater(jaccard, feat_scores)  # boolean mask, True where the IoU exceeds the stored score
        mask = tf.logical_and(mask, feat_scores > -0.5)
        imask = tf.cast(mask, tf.int64) #[1,0,1,1,0]
        fmask = tf.cast(mask, dtype)    #[1.,0.,1.,0. ... ]
        # Update values using mask.
        # feat_labels keeps the label of the best-scoring object at each position; feat_scores keeps that score
        # (m, m, k) × current label + (1 - (m, m, k)) × (m, m, k)
        # Update the labels: imask is True only where the current object beats the previous one; other positions are unchanged
        # Assign the label to every anchor matched to this object
        feat_labels = imask * label + (1 - imask) * feat_labels
        # Keep the best matching score
        feat_scores = tf.where(mask, jaccard, feat_scores)
        # The next four tensors store the ground-truth box coordinates for the matched label
        # (m, m, k) × current box coordinate (scalar) + (1 - (m, m, k)) × (m, m, k)
        feat_ymin = fmask * bbox[0] + (1 - fmask) * feat_ymin
        feat_xmin = fmask * bbox[1] + (1 - fmask) * feat_xmin
        feat_ymax = fmask * bbox[2] + (1 - fmask) * feat_ymax
        feat_xmax = fmask * bbox[3] + (1 - fmask) * feat_xmax
        return [i + 1, feat_labels, feat_scores,
                feat_ymin, feat_xmin, feat_ymax, feat_xmax]
    i = 0
    # 2. For every ground-truth box, find its matching points and anchor boxes in the feature layer and record its label.
    (i,feat_labels, feat_scores,feat_ymin, feat_xmin,
     feat_ymax, feat_xmax) = tf.while_loop(condition, body,
                                           [i,
                                            feat_labels, feat_scores,
                                            feat_ymin, feat_xmin,
                                            feat_ymax, feat_xmax])
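The matching loop above can be reproduced on a toy example with NumPy. This is a simplified sketch: anchors are given directly as a flat list of [ymin, xmin, ymax, xmax] corners instead of the (m, m, k) layout, a plain Python loop replaces tf.while_loop, and the extra feat_scores > -0.5 condition is omitted. The function names and the example boxes are invented for illustration.

```python
import numpy as np


def iou_with_anchors(anchors, bbox):
    """IoU between every anchor (N, 4) and one box (4,), corner format."""
    int_ymin = np.maximum(anchors[:, 0], bbox[0])
    int_xmin = np.maximum(anchors[:, 1], bbox[1])
    int_ymax = np.minimum(anchors[:, 2], bbox[2])
    int_xmax = np.minimum(anchors[:, 3], bbox[3])
    h = np.maximum(int_ymax - int_ymin, 0.0)
    w = np.maximum(int_xmax - int_xmin, 0.0)
    inter = h * w
    area_anchors = (anchors[:, 2] - anchors[:, 0]) * (anchors[:, 3] - anchors[:, 1])
    area_bbox = (bbox[2] - bbox[0]) * (bbox[3] - bbox[1])
    return inter / (area_anchors + area_bbox - inter)


def match(anchors, labels, bboxes):
    """Give each anchor the label and box of the object it overlaps most."""
    n = anchors.shape[0]
    feat_labels = np.zeros(n, dtype=np.int64)
    feat_scores = np.zeros(n, dtype=np.float64)
    feat_boxes = np.tile(np.array([0.0, 0.0, 1.0, 1.0]), (n, 1))
    for label, bbox in zip(labels, bboxes):
        iou = iou_with_anchors(anchors, bbox)
        mask = iou > feat_scores  # only overwrite where this object fits better
        feat_labels = np.where(mask, label, feat_labels)
        feat_scores = np.where(mask, iou, feat_scores)
        feat_boxes = np.where(mask[:, None], bbox, feat_boxes)
    return feat_labels, feat_scores, feat_boxes


anchors = np.array([[0.0, 0.0, 0.5, 0.5],
                    [0.5, 0.5, 1.0, 1.0],
                    [0.0, 0.5, 0.5, 1.0]])
labels = np.array([7, 12])
bboxes = np.array([[0.0, 0.0, 0.5, 0.5],
                   [0.5, 0.5, 1.0, 0.9]])
feat_labels, feat_scores, feat_boxes = match(anchors, labels, bboxes)
print(feat_labels, feat_scores)
```

The third anchor overlaps neither object, so it keeps label 0 and score 0, which is what later marks it as a background candidate.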

3. Convert to the output format of the SSD network.

    # Transform to center / size.
    # 3. Convert to the output format of the SSD network.
    feat_cy = (feat_ymax + feat_ymin) / 2.
    feat_cx = (feat_xmax + feat_xmin) / 2.
    feat_h = feat_ymax - feat_ymin
    feat_w = feat_xmax - feat_xmin
    # Encode features.
    # Apply the encoding formulas
    # Offset of the ground-truth box center from the anchor center, in units of the anchor height/width
    feat_cy = (feat_cy - yref) / href / prior_scaling[0]
    feat_cx = (feat_cx - xref) / wref / prior_scaling[1]
    # log((m, m, k) / (m, m, 1)) * 5
    # Ground-truth width/height divided by anchor width/height, then take the logarithm
    feat_h = tf.log(feat_h / href) / prior_scaling[2]
    feat_w = tf.log(feat_w / wref) / prior_scaling[3]
    # Use SSD ordering: x / y / w / h instead of ours.(m, m, k, 4)
    feat_localizations = tf.stack([feat_cx, feat_cy, feat_w, feat_h], axis=-1)
    return feat_labels, feat_localizations, feat_scores

Once the ground-truth labels and boxes are encoded, they have the same format as the labels and boxes predicted by the network, and only then can the loss be computed by comparing the two.
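The encoding formulas can be verified with a round trip on a single box. The sketch below uses NumPy scalars instead of tensors and the prior_scaling values from the function signature above. The decode function is my own inverse of the formulas, written only for this check, and the names encode and decode are hypothetical.

```python
import numpy as np

PRIOR_SCALING = (0.1, 0.1, 0.2, 0.2)


def encode(box, anchor, prior_scaling=PRIOR_SCALING):
    """box: [ymin, xmin, ymax, xmax]; anchor: [cy, cx, h, w].

    Returns the regression target in SSD ordering [cx, cy, w, h].
    """
    ymin, xmin, ymax, xmax = box
    yref, xref, href, wref = anchor
    cy = (ymax + ymin) / 2.0
    cx = (xmax + xmin) / 2.0
    h = ymax - ymin
    w = xmax - xmin
    t_cy = (cy - yref) / href / prior_scaling[0]
    t_cx = (cx - xref) / wref / prior_scaling[1]
    t_h = np.log(h / href) / prior_scaling[2]
    t_w = np.log(w / wref) / prior_scaling[3]
    return np.array([t_cx, t_cy, t_w, t_h])


def decode(target, anchor, prior_scaling=PRIOR_SCALING):
    """Inverse of encode: recover [ymin, xmin, ymax, xmax]."""
    t_cx, t_cy, t_w, t_h = target
    yref, xref, href, wref = anchor
    cy = t_cy * prior_scaling[0] * href + yref
    cx = t_cx * prior_scaling[1] * wref + xref
    h = href * np.exp(t_h * prior_scaling[2])
    w = wref * np.exp(t_w * prior_scaling[3])
    return np.array([cy - h / 2.0, cx - w / 2.0, cy + h / 2.0, cx + w / 2.0])


anchor = (0.5, 0.5, 0.2, 0.2)
same_as_anchor = (0.4, 0.4, 0.6, 0.6)
shifted_box = (0.35, 0.42, 0.65, 0.70)

zero_target = encode(same_as_anchor, anchor)
target = encode(shifted_box, anchor)
recovered = decode(target, anchor)
print(zero_target, target, recovered)
```

A box that coincides with its anchor encodes to all zeros, which is why the network only has to learn small offsets around each prior box.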

6. Computing the loss

The encoded scores and locations obtained in step 5 come from the dataset annotations, that is, the ground truth. Computing the loss also needs the predictions.

The following code runs each image through the network to get its predictions:

# Set the arguments of the SSD network
arg_scope = ssd_net.arg_scope(weight_decay=FLAGS.weight_decay,
                                data_format=DATA_FORMAT)
# Run the images through the network to get the box locations and class predictions
with slim.arg_scope(arg_scope):
    _, localisations, logits, _ = \
        ssd_net.net(b_image, is_training=True)

Then the loss function is called to compute three loss values: one for positive samples, one for negative samples, and one for localization.

# Compute the loss values
n_positives_loss,n_negative_loss,localization_loss = ssd_net.losses(logits, localisations,
                                                        b_gclasses, b_glocalisations, b_gscores,
                                                        match_threshold=FLAGS.match_threshold,
                                                        negative_ratio=FLAGS.negative_ratio,
                                                        alpha=FLAGS.loss_alpha,
                                                        label_smoothing=FLAGS.label_smoothing)
# Three loss values are returned: positive samples, negative samples, and localization
loss_all = n_positives_loss + n_negative_loss + localization_loss

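Inside ssd_net.losses, the loss is computed as follows.

1. Flatten the tensors of every feature layer: class predictions are reshaped to (?, num_classes), box predictions to (?, 4), and the ground-truth classes and scores to (?), which makes the later comparison and processing easier. Finally, the flattened results are concatenated into a single table covering all images in the batch.

2. From gscores, positions whose score meets the positive threshold form the positive mask pmask, and those that do not form the negative mask nmask. Because gscores is used, the positive/negative split is based on the ground truth.

3. For positions that are not positive, take the background score from the predictions; set all other positions to 1.

4. Find the n_neg points that are least likely to be background (they actually are background, so the loss computed from the two is large).

5. Compute the cross-entropy for the positive and negative samples and the smooth L1 loss for the box positions.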

def ssd_losses(logits, localisations,
               gclasses, glocalisations, gscores,
               match_threshold=0.5,
               negative_ratio=3.,
               alpha=1.,
               label_smoothing=0.,
               device='/cpu:0',
               scope=None):
    with tf.name_scope(scope, 'ssd_losses'):
        lshape = tfe.get_shape(logits[0], 5)
        num_classes = lshape[-1]
        batch_size = lshape[0]
        # Flatten all the tensors
        flogits = []
        fgclasses = []
        fgscores = []
        flocalisations = []
        fglocalisations = []
        for i in range(len(logits)): # loop over the feature layers
            flogits.append(tf.reshape(logits[i], [-1, num_classes]))
            fgclasses.append(tf.reshape(gclasses[i], [-1]))
            fgscores.append(tf.reshape(gscores[i], [-1]))
            flocalisations.append(tf.reshape(localisations[i], [-1, 4]))
            fglocalisations.append(tf.reshape(glocalisations[i], [-1, 4]))
        # The flattened results are still held in separate per-layer lists
        # This step concatenates them into one table covering the whole batch
        logits = tf.concat(flogits, axis=0)
        gclasses = tf.concat(fgclasses, axis=0)
        gscores = tf.concat(fgscores, axis=0)
        localisations = tf.concat(flocalisations, axis=0)
        glocalisations = tf.concat(fglocalisations, axis=0)
        dtype = logits.dtype
        # Mask of gscores entries that meet the positive threshold
        pmask = gscores > match_threshold
        fpmask = tf.cast(pmask, dtype)
        no_classes = tf.cast(pmask, tf.int32)
        nmask = tf.logical_and(tf.logical_not(pmask),# True where the anchor's IoU is below the threshold
                               gscores > -0.5)
        fnmask = tf.cast(nmask, dtype)
        n_positives = tf.reduce_sum(fpmask)
        # Turn the predictions into probabilities
        predictions = slim.softmax(logits)
        nvalues = tf.where(nmask,
                           predictions[:, 0],   # negatives: predicted background probability
                           1. - fnmask)         # all other positions: 1
        nvalues_flat = tf.reshape(nvalues, [-1])
        # max_neg_entries is the number of negatives actually available
        max_neg_entries = tf.cast(tf.reduce_sum(fnmask), tf.int32)
        # n_neg為正樣本的個數(shù)*3 + batch_size , 之所以+batchsize是因?yàn)槊總€圖最少有一個負(fù)樣本背景
        n_neg = tf.cast(negative_ratio * n_positives, tf.int32) + batch_size
        n_neg = tf.minimum(n_neg, max_neg_entries)
        # 找到n_neg個最不可能為背景的點(diǎn)
        val, idxes = tf.nn.top_k(-nvalues_flat, k=n_neg)
        max_hard_pred = -val[-1]
        # 在nmask找到n_neg個最不可能為背景的點(diǎn)（實(shí)際上它是背景，這樣二者的差就很大）
        nmask = tf.logical_and(nmask, nvalues < max_hard_pred)
        fnmask = tf.cast(nmask, dtype)
        n_negative = tf.reduce_sum(fnmask)
        # 交叉熵
        with tf.name_scope('cross_entropy_pos'):
            loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits,
                                                                  labels=gclasses)
            n_positives_loss = tf.div(tf.reduce_sum(loss * fpmask), n_positives + 0.1, name='value')
        with tf.name_scope('cross_entropy_neg'):
            loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits,
                                                                  labels=no_classes)
            n_negative_loss = tf.div(tf.reduce_sum(loss * fnmask), n_negative + 0.1, name='value')
        # Add localization loss: smooth L1, L2, ...
        with tf.name_scope('localization'):
            # Weights Tensor: positive mask + random negative.
            weights = tf.expand_dims(alpha * fpmask, axis=-1)
            loss = custom_layers.abs_smooth(localisations - glocalisations)
            localization_loss = tf.div(tf.reduce_sum(loss * weights), n_positives + 0.1, name='value')
        return n_positives_loss, n_negative_loss, localization_loss
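
The hard-negative selection inside ssd_losses is easier to follow on a tiny hand-checked example. The sketch below re-implements only that selection step in plain Python, without TensorFlow. The function name select_hard_negatives and the sample numbers are illustrative and not part of the source code, and the early return for n_neg == 0 is an added guard that the listing above does not have.

```python
def select_hard_negatives(bg_prob, gscores, match_threshold=0.5,
                          negative_ratio=3.0, batch_size=1):
    """Return (pmask, hard_nmask) as lists of booleans, one entry per anchor.

    bg_prob -- predicted background probability of each anchor
    gscores -- match score of each anchor against the ground truth
    """
    # Positives: anchors whose match score exceeds the threshold.
    pmask = [s > match_threshold for s in gscores]
    # Candidate negatives: everything else with a valid score.
    nmask = [(not p) and s > -0.5 for p, s in zip(pmask, gscores)]
    # Non-candidates get 1.0 so they never rank among the smallest values.
    nvalues = [prob if n else 1.0 for prob, n in zip(bg_prob, nmask)]

    n_neg = int(negative_ratio * sum(pmask)) + batch_size
    n_neg = min(n_neg, sum(nmask))
    if n_neg == 0:  # added guard, not present in the listing
        return pmask, [False] * len(nmask)

    # The n_neg-th smallest background probability is the cut-off.
    max_hard_pred = sorted(nvalues)[n_neg - 1]
    # Strict "<" mirrors the listing, so the cut-off anchor itself is dropped.
    hard_nmask = [n and v < max_hard_pred for n, v in zip(nmask, nvalues)]
    return pmask, hard_nmask


pmask, hard = select_hard_negatives(
    bg_prob=[0.1, 0.9, 0.2, 0.6, 0.4, 0.95],
    gscores=[0.8, 0.1, 0.2, 0.0, 0.3, 0.05])
print(hard)  # [False, False, True, True, True, False]
```

With one positive anchor, n_neg is 3 * 1 + 1 = 4, and the fourth-smallest background probability (0.9) becomes the cut-off. Because the comparison is strict, three anchors end up in the negative mask rather than four.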

7. Train and save the model

############################## Optimizer setup ##############################
learning_rate = tf_utils.configure_learning_rate(FLAGS,
                                                 dataset.num_samples,
                                                 global_step)
optimizer = tf_utils.configure_optimizer(FLAGS, learning_rate)
train_op = slim.learning.create_train_op(loss_all, optimizer,
                                         summarize_gradients=True)
######################### Train and save the model #########################
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=FLAGS.gpu_memory_fraction)
config = tf.ConfigProto(log_device_placement=False,
                        gpu_options=gpu_options)
saver = tf.train.Saver(max_to_keep=5,
                       keep_checkpoint_every_n_hours=1.0,
                       write_version=2,
                       pad_step_number=False)
slim.learning.train(
    train_op,                                       # training op built above (loss + optimizer)
    logdir=FLAGS.train_dir,                         # directory for checkpoints and event logs
    master='',
    is_chief=True,
    init_fn=tf_utils.get_init_fn(FLAGS),            # initializes the weights when fine-tuning an existing model
    number_of_steps=FLAGS.max_number_of_steps,      # maximum number of training steps
    log_every_n_steps=FLAGS.log_every_n_steps,      # print to the console every N steps
    save_summaries_secs=FLAGS.save_summaries_secs,  # seconds between summary writes
    saver=saver,
    save_interval_secs=FLAGS.save_interval_secs,    # seconds between checkpoint saves
    session_config=config,
    sync_optimizer=None)
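
The listing passes dataset.num_samples to tf_utils.configure_learning_rate but does not show what happens inside. A common reason for needing the sample count in slim-style training scripts is to convert "epochs per decay" into a number of steps for a staircase exponential decay. The sketch below assumes that scheme. The function decayed_learning_rate, its parameter names, and the default values 2.0 and 0.94 are assumptions for illustration only and are not taken from the source.

```python
def decayed_learning_rate(base_lr, global_step, num_samples, batch_size,
                          num_epochs_per_decay=2.0, decay_factor=0.94):
    """Staircase exponential decay (assumed scheme, illustrative defaults)."""
    # Number of optimizer steps that make up one decay period.
    decay_steps = max(1, int(num_samples / batch_size * num_epochs_per_decay))
    # Staircase: the rate only drops after each full block of decay_steps.
    return base_lr * decay_factor ** (global_step // decay_steps)


# 800 samples at batch size 8 give 100 steps per epoch, so 200 steps per decay.
for step in (0, 199, 200, 400):
    print(step, decayed_learning_rate(0.001, step, num_samples=800, batch_size=8))
```

Under these assumptions the rate stays at its initial value for the first 200 steps and is multiplied by the decay factor once per period after that.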

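Start training

Create a file named train.sh in the project root directory, then run it from the bash shell that ships with Git (Git Bash).

First change to the project folder.

cd D:/Collection/SSD-Retry

Then run train.sh.

bash train.sh

train.sh contains the following: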

DATASET_DIR=./tfrecords
TRAIN_DIR=./logs/
CHECKPOINT_PATH=./checkpoints/ssd_300_vgg.ckpt
python train_demo.py \
    --train_dir=${TRAIN_DIR} \
    --dataset_dir=${DATASET_DIR} \
    --dataset_name=pascalvoc_2012 \
    --dataset_split_name=train \
    --model_name=ssd_300_vgg \
    --checkpoint_path=${CHECKPOINT_PATH} \
    --save_summaries_secs=60 \
    --save_interval_secs=600 \
    --weight_decay=0.0005 \
    --optimizer=adam \
    --learning_rate=0.001 \
    --batch_size=8
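
Where no bash shell is available, the same command line can be assembled from Python. This is only a sketch: build_train_command and run_training are hypothetical helpers that are not part of the downloaded source, and the flag names and values are copied from train.sh above.

```python
import subprocess
import sys


def build_train_command(dataset_dir="./tfrecords", train_dir="./logs/",
                        checkpoint_path="./checkpoints/ssd_300_vgg.ckpt"):
    """Return the argument list equivalent to the call in train.sh."""
    flags = {
        "train_dir": train_dir,
        "dataset_dir": dataset_dir,
        "dataset_name": "pascalvoc_2012",
        "dataset_split_name": "train",
        "model_name": "ssd_300_vgg",
        "checkpoint_path": checkpoint_path,
        "save_summaries_secs": "60",
        "save_interval_secs": "600",
        "weight_decay": "0.0005",
        "optimizer": "adam",
        "learning_rate": "0.001",
        "batch_size": "8",
    }
    # Use the current interpreter so the active environment is reused.
    return [sys.executable, "train_demo.py"] + [
        f"--{name}={value}" for name, value in flags.items()]


def run_training():
    """Launch train_demo.py from the project root; raises if it exits non-zero."""
    subprocess.run(build_train_command(), check=True)


print(" ".join(build_train_command()[1:]))
```

Calling run_training() from the project root starts the same training run as bash train.sh. Passing the arguments as a list avoids shell quoting issues on Windows.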

[Figure: training results]
