Python Meets CNNs: A Detailed Guide
Introduction to AlexNet
AlexNet was the winning network of the ILSVRC 2012 (ImageNet Large Scale Visual Recognition Challenge) competition, raising classification accuracy from the traditional 70%+ to 80%+. It was designed by Hinton and his student Alex Krizhevsky. It was also after that year that deep learning began to develop rapidly.
Key ideas
(1) First to use GPUs to accelerate network training.
(2) Used the ReLU activation function instead of the traditional Sigmoid and Tanh activation functions.
(3) Used LRN (Local Response Normalization).
(4) Applied Dropout to randomly deactivate neurons in the first two fully connected layers, reducing overfitting.
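As a quick illustration of point (2), here is a minimal NumPy sketch (illustration only, not AlexNet's code) contrasting ReLU with Sigmoid:

```python
import numpy as np

def relu(x):
    # ReLU passes positive values through and zeroes out negatives;
    # its gradient is 1 for x > 0, so it does not saturate the way
    # Sigmoid and Tanh do for large |x|
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))     # zeros for negative inputs, identity for positive ones
print(sigmoid(x))  # squashed into (0, 1), saturating at both ends
```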
Overfitting
The root causes are too many feature dimensions, an overly complex model hypothesis, too many parameters, too little training data, and too much noise. The fitted function then predicts the training set perfectly but performs poorly on new test data: it overfits the training data without regard for generalization.
Solution
Use Dropout to randomly deactivate a portion of the neurons during the network's forward pass.
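A minimal NumPy sketch of inverted dropout (the scheme Keras's `Dropout` layer also implements) — illustration only, separate from the model code below:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def dropout(x, rate=0.5, training=True):
    # Training: zero each unit with probability `rate` and scale the
    # survivors by 1/(1-rate) so the expected activation is unchanged
    # (inverted dropout). Inference: pass activations through unchanged.
    if not training or rate == 0.0:
        return x
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

x = np.ones(8)
print(dropout(x, rate=0.5))        # each entry is either 0.0 or 2.0
print(dropout(x, training=False))  # identity at inference time
```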
卷積后矩陣尺寸計算公式
經卷積后的矩陣尺寸大小計算公式為: N = (W − F + 2P ) / S + 1
① 輸入圖片大小 W×W
② Filter大小 F×F
③ 步長 S
④ padding的像素數 P
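Plugging AlexNet's first layers into this formula, with a small helper function for illustration:

```python
def conv_output_size(W, F, S, P=0):
    # N = (W - F + 2P) / S + 1
    return (W - F + 2 * P) // S + 1

# AlexNet Conv1: 224x224 input padded to 227x227, 11x11 kernel, stride 4
print(conv_output_size(227, 11, 4))  # 55
# Maxpool1: 3x3 window, stride 2
print(conv_output_size(55, 3, 2))    # 27
```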
AlexNet architecture

| layer_name | kernel_size | kernel_num | padding | stride |
| --- | --- | --- | --- | --- |
| Conv1 | 11 | 96 | [1, 2] | 4 |
| Maxpool1 | 3 | None | 0 | 2 |
| Conv2 | 5 | 256 | [2, 2] | 1 |
| Maxpool2 | 3 | None | 0 | 2 |
| Conv3 | 3 | 384 | [1, 1] | 1 |
| Conv4 | 3 | 384 | [1, 1] | 1 |
| Conv5 | 3 | 256 | [1, 1] | 1 |
| Maxpool3 | 3 | None | 0 | 2 |
| FC1 | 2048 | None | None | None |
| FC2 | 2048 | None | None | None |
| FC3 | 1000 | None | None | None |

(For the FC layers, the kernel_size column gives the number of nodes. The code below uses half of each kernel_num value, since the original paper split the feature maps across two GPUs.)
Model code

```python
from tensorflow.keras import layers, models, Model, Sequential


def AlexNet_v1(im_height=224, im_width=224, num_classes=1000):
    # In TensorFlow the tensor channel order is NHWC
    input_image = layers.Input(shape=(im_height, im_width, 3), dtype="float32")  # output(None, 224, 224, 3)
    x = layers.ZeroPadding2D(((1, 2), (1, 2)))(input_image)                      # output(None, 227, 227, 3)
    x = layers.Conv2D(48, kernel_size=11, strides=4, activation="relu")(x)       # output(None, 55, 55, 48)
    x = layers.MaxPool2D(pool_size=3, strides=2)(x)                              # output(None, 27, 27, 48)
    x = layers.Conv2D(128, kernel_size=5, padding="same", activation="relu")(x)  # output(None, 27, 27, 128)
    x = layers.MaxPool2D(pool_size=3, strides=2)(x)                              # output(None, 13, 13, 128)
    x = layers.Conv2D(192, kernel_size=3, padding="same", activation="relu")(x)  # output(None, 13, 13, 192)
    x = layers.Conv2D(192, kernel_size=3, padding="same", activation="relu")(x)  # output(None, 13, 13, 192)
    x = layers.Conv2D(128, kernel_size=3, padding="same", activation="relu")(x)  # output(None, 13, 13, 128)
    x = layers.MaxPool2D(pool_size=3, strides=2)(x)                              # output(None, 6, 6, 128)

    x = layers.Flatten()(x)                       # output(None, 6*6*128)
    x = layers.Dropout(0.2)(x)
    x = layers.Dense(2048, activation="relu")(x)  # output(None, 2048)
    x = layers.Dropout(0.2)(x)
    x = layers.Dense(2048, activation="relu")(x)  # output(None, 2048)
    x = layers.Dense(num_classes)(x)              # output(None, num_classes)
    predict = layers.Softmax()(x)

    model = models.Model(inputs=input_image, outputs=predict)
    return model


class AlexNet_v2(Model):
    def __init__(self, num_classes=1000):
        super(AlexNet_v2, self).__init__()
        self.features = Sequential([
            layers.ZeroPadding2D(((1, 2), (1, 2))),                                # output(None, 227, 227, 3)
            layers.Conv2D(48, kernel_size=11, strides=4, activation="relu"),       # output(None, 55, 55, 48)
            layers.MaxPool2D(pool_size=3, strides=2),                              # output(None, 27, 27, 48)
            layers.Conv2D(128, kernel_size=5, padding="same", activation="relu"),  # output(None, 27, 27, 128)
            layers.MaxPool2D(pool_size=3, strides=2),                              # output(None, 13, 13, 128)
            layers.Conv2D(192, kernel_size=3, padding="same", activation="relu"),  # output(None, 13, 13, 192)
            layers.Conv2D(192, kernel_size=3, padding="same", activation="relu"),  # output(None, 13, 13, 192)
            layers.Conv2D(128, kernel_size=3, padding="same", activation="relu"),  # output(None, 13, 13, 128)
            layers.MaxPool2D(pool_size=3, strides=2)])                             # output(None, 6, 6, 128)

        self.flatten = layers.Flatten()
        self.classifier = Sequential([
            layers.Dropout(0.2),
            layers.Dense(1024, activation="relu"),  # output(None, 1024)
            layers.Dropout(0.2),
            layers.Dense(128, activation="relu"),   # output(None, 128)
            layers.Dense(num_classes),              # output(None, num_classes)
            layers.Softmax()
        ])

    def call(self, inputs, **kwargs):
        x = self.features(inputs)
        x = self.flatten(x)
        x = self.classifier(x)
        return x
```
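The output shapes noted in the comments above can be verified against the N = (W − F + 2P)/S + 1 formula with a few lines of plain Python (no TensorFlow required):

```python
def out_size(n, k, s=1, p=0):
    # Spatial size after a conv/pool layer: (n - k + 2p) // s + 1
    return (n - k + 2 * p) // s + 1

n = 227                   # 224 plus ZeroPadding2D ((1, 2), (1, 2))
n = out_size(n, 11, s=4)  # Conv1
assert n == 55
n = out_size(n, 3, s=2)   # Maxpool1
assert n == 27
n = out_size(n, 5, p=2)   # Conv2, padding="same"
assert n == 27
n = out_size(n, 3, s=2)   # Maxpool2
assert n == 13
n = out_size(n, 3, p=1)   # Conv3-Conv5, padding="same" keeps 13
assert n == 13
n = out_size(n, 3, s=2)   # Maxpool3
assert n == 6
print(n * n * 128)        # flattened length fed into FC1: 4608
```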
Introduction to VGGNet
VGG was proposed in 2014 by the well-known Visual Geometry Group (VGG) at the University of Oxford. It took first place in the Localization Task and second place in the Classification Task of that year's ImageNet competition.
Key idea
Stack multiple 3×3 convolution kernels to replace larger kernels (reducing the number of parameters). The paper notes that two stacked 3×3 kernels can replace one 5×5 kernel, and three stacked 3×3 kernels can replace one 7×7 kernel.
Assume the input and output channel count is C:
Parameters of one 7×7 kernel: 7 × 7 × C × C = 49C²
Parameters of three 3×3 kernels: 3 × 3 × C × C + 3 × 3 × C × C + 3 × 3 × C × C = 27C²
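The arithmetic above, checked with a small helper (C = 64 chosen arbitrarily for illustration):

```python
def conv_params(k, c_in, c_out):
    # Weight count of a k x k convolution (biases ignored): k * k * c_in * c_out
    return k * k * c_in * c_out

C = 64
print(conv_params(7, C, C))      # one 7x7 conv: 49 * C^2 = 200704
print(3 * conv_params(3, C, C))  # three stacked 3x3 convs: 27 * C^2 = 110592
```

The stacked version uses roughly 45% fewer parameters while covering the same input region.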
Receptive field
In a convolutional neural network, the region of the input layer that determines one element of a given layer's output is called the receptive field. Put plainly, it is the size of the input-layer region that one unit of an output feature map corresponds to.
Receptive field formula
F(i) = (F(i + 1) − 1) × Stride + Ksize
where F(i) is the receptive field of layer i, Stride is the stride of layer i, and Ksize is the kernel (or pooling window) size.
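Applying this recurrence from the last layer back to the input confirms the 3×3-stacking claim above (a small sketch for illustration):

```python
def receptive_field(conv_layers):
    # Apply F(i) = (F(i+1) - 1) * Stride + Ksize from the last layer back
    # toward the input, starting from F = 1 (one unit of the final map).
    f = 1
    for ksize, stride in reversed(conv_layers):
        f = (f - 1) * stride + ksize
    return f

# Two stacked 3x3 stride-1 convs see the same region as one 5x5 conv
print(receptive_field([(3, 1), (3, 1)]))          # 5
# Three stacked 3x3 stride-1 convs see the same region as one 7x7 conv
print(receptive_field([(3, 1), (3, 1), (3, 1)]))  # 7
```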
VGGNet architecture
Model code
```python
from tensorflow.keras import layers, Model, Sequential

CONV_KERNEL_INITIALIZER = {
    'class_name': 'VarianceScaling',
    'config': {
        'scale': 2.0,
        'mode': 'fan_out',
        'distribution': 'truncated_normal'
    }
}

DENSE_KERNEL_INITIALIZER = {
    'class_name': 'VarianceScaling',
    'config': {
        'scale': 1. / 3.,
        'mode': 'fan_out',
        'distribution': 'uniform'
    }
}


def VGG(feature, im_height=224, im_width=224, num_classes=1000):
    # In TensorFlow the tensor channel order is NHWC
    input_image = layers.Input(shape=(im_height, im_width, 3), dtype="float32")
    x = feature(input_image)
    x = layers.Flatten()(x)
    x = layers.Dropout(rate=0.5)(x)
    x = layers.Dense(2048, activation='relu',
                     kernel_initializer=DENSE_KERNEL_INITIALIZER)(x)
    x = layers.Dropout(rate=0.5)(x)
    x = layers.Dense(2048, activation='relu',
                     kernel_initializer=DENSE_KERNEL_INITIALIZER)(x)
    x = layers.Dense(num_classes, kernel_initializer=DENSE_KERNEL_INITIALIZER)(x)
    output = layers.Softmax()(x)
    model = Model(inputs=input_image, outputs=output)
    return model


def make_feature(cfg):
    # Build the convolutional feature extractor from a config list:
    # integers are conv channel counts, "M" marks a max-pool layer
    feature_layers = []
    for v in cfg:
        if v == "M":
            feature_layers.append(layers.MaxPool2D(pool_size=2, strides=2))
        else:
            conv2d = layers.Conv2D(v, kernel_size=3, padding="same", activation="relu",
                                   kernel_initializer=CONV_KERNEL_INITIALIZER)
            feature_layers.append(conv2d)
    return Sequential(feature_layers, name="feature")


cfgs = {
    'vgg11': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'vgg13': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'vgg16': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
    'vgg19': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M'],
}


def vgg(model_name="vgg16", im_height=224, im_width=224, num_classes=1000):
    assert model_name in cfgs, "unsupported model {}".format(model_name)
    cfg = cfgs[model_name]
    model = VGG(make_feature(cfg), im_height=im_height, im_width=im_width,
                num_classes=num_classes)
    return model
```
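A quick sanity check on the configuration lists (pure Python, no TensorFlow needed): each VGG variant's name counts its weight-bearing layers — the convolutions in the config plus the 3 fully connected layers.

```python
cfgs = {
    'vgg11': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'vgg16': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
              512, 512, 512, 'M', 512, 512, 512, 'M'],
}

def weight_layers(cfg, num_fc=3):
    # 'M' entries are max-pool layers (no weights); every other entry is a conv
    convs = sum(1 for v in cfg if v != 'M')
    return convs + num_fc

print(weight_layers(cfgs['vgg11']))  # 11
print(weight_layers(cfgs['vgg16']))  # 16
```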
到此這篇關于Python與CNN的碰撞詳解的文章就介紹到這了,更多相關Python CNN內容請搜索腳本之家以前的文章或繼續(xù)瀏覽下面的相關文章希望大家以后多多支持腳本之家!