Andrew Ng Machine Learning Exercise: Neural Networks (Backpropagation)

Updated: 2021-04-15 12:02:20 Author: Cowry5

This article walks through one of the exercises from Andrew Ng's machine learning course: neural networks (backpropagation). In this exercise you will implement the backpropagation algorithm to learn the parameters of a neural network.

1 Neural Networks

1.1 Visualizing the data

In this part we randomly select 100 samples and visualize them. The training set contains 5000 training examples, each a 20×20-pixel grayscale image of a digit. Each pixel is a floating-point number giving the grayscale intensity at that location. The 20×20 pixel grid is unrolled into a 400-dimensional vector. Each example becomes one row of the data matrix X, which gives a 5000×400 matrix X in which every row is one handwritten digit image.

import numpy as np
import matplotlib.pyplot as plt
from scipy.io import loadmat
import scipy.optimize as opt
from sklearn.metrics import classification_report  # per-class precision/recall report

def load_mat(path):
    '''Load the data'''
    data = loadmat(path)  # returns a dict
    X = data['X']
    y = data['y'].flatten()
    return X, y

def plot_100_images(X):
    """Plot 100 randomly chosen digits"""
    index = np.random.choice(X.shape[0], 100, replace=False)  # 100 distinct row indices
    images = X[index]
    fig, ax_array = plt.subplots(10, 10, sharey=True, sharex=True, figsize=(8, 8))
    for r in range(10):
        for c in range(10):
            ax_array[r, c].matshow(images[r*10 + c].reshape(20,20), cmap='gray_r')
    plt.xticks([])
    plt.yticks([])
    plt.show()

X,y = load_mat('ex4data1.mat')
plot_100_images(X)


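1.2 Model representation

Our network has three layers: an input layer, a hidden layer, and an output layer. The inputs are the pixel values of the digit images. Because each image is 20×20, the input layer has 400 units (not counting the extra bias unit, which always outputs 1).

1.2.1 Load the training data set

First we convert the label values (1, 2, 3, 4, …, 10) into one-hot vectors: for label i, the entry at index i-1 is set to 1 and all other entries are 0. For example, y[0]=6 becomes y[0]=[0,0,0,0,0,1,0,0,0,0].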

from sklearn.preprocessing import OneHotEncoder

def expand_y(y):
    '''Convert labels 1..10 into one-hot row vectors'''
    result = []
    # For each label in y, build a vector with a 1 at index label-1
    for i in y:
        y_array = np.zeros(10)
        y_array[i-1] = 1
        result.append(y_array)
    # Alternatively, use sklearn's OneHotEncoder:
    #   encoder = OneHotEncoder(sparse=False)  # return an array instead of a sparse matrix
    #   (newer scikit-learn releases name this argument sparse_output)
    #   y_onehot = encoder.fit_transform(y.reshape(-1,1))
    #   return y_onehot
    return np.array(result)
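
The same label-to-vector mapping can also be written without a Python loop. The following is a minimal sketch, not the author's code: `one_hot` and `demo` are names introduced here for illustration, and it assumes the labels are the integers 1 to 10 as in this data set.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Map labels in 1..num_classes to one-hot rows (label k -> index k-1)."""
    labels = np.asarray(labels)
    # Row k-1 of the identity matrix is the one-hot vector for label k
    return np.eye(num_classes)[labels - 1]

demo = one_hot([6, 10, 1])
print(demo[0])  # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
```

Indexing the identity matrix with the shifted labels picks one row per sample, so the result has one row per label and ten columns.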

Load the training set and preprocess it to obtain the input X and the labels y.

raw_X, raw_y = load_mat('ex4data1.mat')
X = np.insert(raw_X, 0, 1, axis=1)
y = expand_y(raw_y)
X.shape, y.shape
'''
((5000, 401), (5000, 10))
'''

1.2.2 Load the weights

Here we are given pre-trained parameters θ1 and θ2, stored in the file ex4weights.mat. Their dimensions are determined by the size of the network: the second layer has 25 units and the output layer has 10 units (one per digit class).

def load_weight(path):
    data = loadmat(path)
    return data['Theta1'], data['Theta2']

t1, t2 = load_weight('ex4weights.mat')
t1.shape, t2.shape
# ((25, 401), (10, 26))

1.2.3 Unrolling the parameters

When we use an advanced optimization method to train the network, the parameter matrices have to be unrolled into a single vector before they can be passed to the optimizer, and their shapes are restored afterwards.

def serialize(a, b):
    '''Unroll the parameters'''
    return np.r_[a.flatten(),b.flatten()]

theta = serialize(t1, t2)  # flattened parameters, 25*401+10*26=10285
theta.shape  # (10285,)

def deserialize(seq):
    '''Recover the parameter matrices'''
    return seq[:25*401].reshape(25, 401), seq[25*401:].reshape(10, 26)
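
The functions above hard-code the 25×401 and 10×26 shapes. As a rough sketch of the same idea for arbitrary shapes, the pair below unrolls any list of matrices and restores them from a list of shapes; `unroll`, `reroll`, and the random test matrices are assumptions made for this illustration.

```python
import numpy as np

def unroll(*mats):
    """Flatten several matrices into one 1-D vector, in order."""
    return np.concatenate([m.ravel() for m in mats])

def reroll(vec, shapes):
    """Cut a 1-D vector back into matrices with the given (rows, cols) shapes."""
    out, start = [], 0
    for rows, cols in shapes:
        out.append(vec[start:start + rows * cols].reshape(rows, cols))
        start += rows * cols
    return out

shapes = [(25, 401), (10, 26)]
rng = np.random.default_rng(0)
a = rng.normal(size=shapes[0])
b = rng.normal(size=shapes[1])
vec = unroll(a, b)
a_back, b_back = reroll(vec, shapes)
print(vec.shape)  # (10285,)
```

A round trip like this is a cheap way to confirm that unrolling and restoring use the same element order.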

1.3 Feedforward and cost function

1.3.1 Feedforward

Check the number of units in each layer, and remember to add a bias unit to the output of each non-output layer: s(1)=400+1, s(2)=25+1, s(3)=10.

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def feed_forward(theta, X):
    '''Return the input and output of every layer'''
    t1, t2 = deserialize(theta)
    # The bias unit was already inserted into X earlier, so it is not inserted again here
    a1 = X
    z2 = a1 @ t1.T
    a2 = np.insert(sigmoid(z2), 0, 1, axis=1)
    z3 = a2 @ t2.T
    a3 = sigmoid(z3)
    return a1, z2, a2, z3, a3

a1, z2, a2, z3, h = feed_forward(theta, X)
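
To see how the shapes evolve without loading the data, the forward pass can be tried on a tiny made-up network. This is only a sketch: `forward` and the `*_demo` arrays are hypothetical, and unlike `feed_forward` above it inserts the input bias column itself.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward(t1, t2, X):
    """Forward pass through one hidden layer; X has no bias column yet."""
    a1 = np.insert(X, 0, 1, axis=1)            # add bias column to the input
    z2 = a1 @ t1.T
    a2 = np.insert(sigmoid(z2), 0, 1, axis=1)  # add bias column to the hidden layer
    z3 = a2 @ t2.T
    return a1, z2, a2, z3, sigmoid(z3)

X_demo = np.zeros((4, 3))    # 4 samples, 3 input features
t1_demo = np.zeros((5, 4))   # 5 hidden units, 3 inputs + bias
t2_demo = np.zeros((2, 6))   # 2 output units, 5 hidden units + bias
a1_d, z2_d, a2_d, z3_d, h_d = forward(t1_demo, t2_demo, X_demo)
print(a1_d.shape, z2_d.shape, a2_d.shape, z3_d.shape, h_d.shape)
# (4, 4) (4, 5) (4, 6) (4, 2) (4, 2)
```

With all-zero weights every pre-activation is 0, so every output is sigmoid(0) = 0.5, which makes the result easy to check by eye.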

1.3.2 Cost function

Recall the cost function of the neural network (without the regularization term).
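
The formula itself did not survive in this copy of the article. Reconstructed from the cost code in this section, and assuming m = 5000 samples and K = 10 classes, it reads:

```latex
J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K}
\left[ -y_k^{(i)} \log\left( h_\theta(x^{(i)})_k \right)
       - \left( 1 - y_k^{(i)} \right) \log\left( 1 - h_\theta(x^{(i)})_k \right) \right]
```

Here $y_k^{(i)}$ is the k-th entry of the one-hot label of sample i and $h_\theta(x^{(i)})_k$ is the k-th output of the network for that sample.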

The output layer gives the predictions for the samples: 5000 rows, each a 10-element vector with one element per class. In the formula, each element of y is multiplied element-wise with the corresponding log term.

With the provided pre-trained parameters θ, the cost should come out to 0.287629.

def cost(theta, X, y):
    a1, z2, a2, z3, h = feed_forward(theta, X)
    J = 0
    for i in range(len(X)):
        first = - y[i] * np.log(h[i])
        second = (1 - y[i]) * np.log(1 - h[i])
        J = J + np.sum(first - second)
    J = J / len(X)
    return J
# Or use the vectorized form inside cost():
#     J = - y * np.log(h) - (1 - y) * np.log(1 - h)
#     return J.sum() / len(X)

cost(theta, X, y) # 0.2876291651613189
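
The loop and the vectorized expression should give the same number. The sketch below checks this on small random inputs; `cost_loop`, `cost_vectorized`, and the demo arrays are illustrative names, and the functions take the network output h directly rather than theta and X.

```python
import numpy as np

def cost_loop(h, y):
    """Cross-entropy cost, summed sample by sample."""
    total = 0.0
    for i in range(len(h)):
        total += np.sum(-y[i] * np.log(h[i]) - (1 - y[i]) * np.log(1 - h[i]))
    return total / len(h)

def cost_vectorized(h, y):
    """Same cost computed on the whole matrices at once."""
    return np.sum(-y * np.log(h) - (1 - y) * np.log(1 - h)) / len(h)

rng = np.random.default_rng(0)
h_demo = rng.uniform(0.05, 0.95, size=(6, 10))     # fake network outputs in (0, 1)
y_demo = np.eye(10)[rng.integers(0, 10, size=6)]   # fake one-hot labels

# One sample, two classes, both outputs 0.5: the cost is 2 * ln(2)
tiny = cost_vectorized(np.array([[0.5, 0.5]]), np.array([[1.0, 0.0]]))
print(np.isclose(cost_loop(h_demo, y_demo), cost_vectorized(h_demo, y_demo)))  # True
```

On the full 5000×10 output the vectorized form avoids 5000 Python-level iterations.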

1.4 Regularized cost function

Note that the bias terms of each layer must not be regularized.

With the provided parameters and λ = 1, you should see that the cost is about 0.383770.

def regularized_cost(theta, X, y, l=1):
    '''Regularization ignores each layer's bias terms, i.e. the first column of each parameter matrix'''
    t1, t2 = deserialize(theta)
    reg = np.sum(t1[:,1:] ** 2) + np.sum(t2[:,1:] ** 2)  # or use np.power(a, 2)
    return l / (2 * len(X)) * reg + cost(theta, X, y)

regularized_cost(theta, X, y, 1) # 0.38376985909092354
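
The penalty is simply the difference between the two costs: with the numbers reported in this article, 0.383770 − 0.287629 ≈ 0.096141. The term can be checked by hand on toy matrices; `reg_term` and the two small matrices below are made up for this illustration.

```python
import numpy as np

def reg_term(mats, m, lam=1.0):
    """L2 penalty over all weights except the first (bias) column of each matrix."""
    return lam / (2 * m) * sum(np.sum(t[:, 1:] ** 2) for t in mats)

t1_demo = np.array([[1.0, 2.0, 3.0]])   # bias weight 1.0 is skipped
t2_demo = np.array([[4.0, 5.0]])        # bias weight 4.0 is skipped
penalty = reg_term([t1_demo, t2_demo], m=2)
print(penalty)  # (2**2 + 3**2 + 5**2) / (2 * 2) = 9.5
```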

2 Backpropagation

2.1 Sigmoid gradient

The derivative of the sigmoid function is g'(z) = g(z)(1 − g(z)). It is not hard to derive by hand.

def sigmoid_gradient(z):
    return sigmoid(z) * (1 - sigmoid(z))
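
A quick way to gain confidence in the derivative is to compare it with a central finite difference. The following sketch redefines the two functions so it runs on its own; the test points and the step size `eps` are arbitrary choices.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_gradient(z):
    return sigmoid(z) * (1 - sigmoid(z))

z_demo = np.array([-2.0, 0.0, 3.0])
eps = 1e-5
# Central difference approximation of the derivative at each point
numeric = (sigmoid(z_demo + eps) - sigmoid(z_demo - eps)) / (2 * eps)
analytic = sigmoid_gradient(z_demo)
print(sigmoid_gradient(0.0))  # 0.25
```

At z = 0 the sigmoid equals 0.5, so the gradient takes its maximum value 0.5 × 0.5 = 0.25.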

2.2 Random initialization

When training a neural network it is important to initialize the parameters randomly, in order to break symmetry. An effective strategy is to draw values uniformly from the interval (−ε, ε). We can choose ε = 0.12, which keeps the parameters small and makes training more efficient.

def random_init(size):
    '''Return size values drawn from the uniform distribution on (-0.12, 0.12)'''
    return np.random.uniform(-0.12, 0.12, size)
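
The value 0.12 is not arbitrary. One common heuristic, which to my knowledge is also the one suggested in the exercise handout, sets ε = √6 / √(L_in + L_out), where L_in and L_out are the numbers of units on either side of the weight matrix. The sketch below applies it to the 400-to-25 layer; `init_epsilon` and `random_init_layer` are names introduced here, not part of the article's code.

```python
import numpy as np

def init_epsilon(l_in, l_out):
    """Heuristic range for uniform initialization, based on the layer sizes."""
    return np.sqrt(6) / np.sqrt(l_in + l_out)

def random_init_layer(l_in, l_out, rng):
    """Weights for one layer, including the bias column, drawn from (-eps, eps)."""
    eps = init_epsilon(l_in, l_out)
    return rng.uniform(-eps, eps, size=(l_out, l_in + 1))

rng = np.random.default_rng(0)
eps_hidden = init_epsilon(400, 25)            # about 0.119, close to 0.12
t1_init = random_init_layer(400, 25, rng)
print(round(eps_hidden, 3), t1_init.shape)    # 0.119 (25, 401)
```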

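2.3 Backpropagation

Goal: compute the gradient of the cost function with respect to all parameters of the network, so that an optimization algorithm can use it.

It is essential to understand the forward and backward passes in order to keep track of the dimensions of every quantity in the network. Writing out the equation for each propagation step by hand helps.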

print('a1', a1.shape,'t1', t1.shape)
print('z2', z2.shape)
print('a2', a2.shape, 't2', t2.shape)
print('z3', z3.shape)
print('a3', h.shape)
'''
a1 (5000, 401) t1 (25, 401)
z2 (5000, 25)
a2 (5000, 26) t2 (10, 26)
z3 (5000, 10)
a3 (5000, 10)
'''

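def gradient(theta, X, y):
    '''
    Unregularized gradient. Note there is no d1, since the input layer has no error.
    Returns the gradient with respect to all parameters theta, so each gradient D(i)
    has the same shape as the corresponding theta(i). This is important.
    '''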
    t1, t2 = deserialize(theta)
    a1, z2, a2, z3, h = feed_forward(theta, X)
    d3 = h - y # (5000, 10)
    d2 = d3 @ t2[:,1:] * sigmoid_gradient(z2)  # (5000, 25)
    D2 = d3.T @ a2  # (10, 26)
    D1 = d2.T @ a1 # (25, 401)
    D = (1 / len(X)) * serialize(D1, D2)  # (10285,)
    return D
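
Written out in the row-per-sample convention used by the code above, the steps are as follows. This is a transcription of the code into equations, with ⊙ denoting the element-wise product and Θ̃(2) denoting Θ(2) without its first (bias) column; the symbol names are chosen here for illustration.

```latex
\begin{aligned}
\delta^{(3)} &= a^{(3)} - y \\
\delta^{(2)} &= \left( \delta^{(3)} \, \tilde{\Theta}^{(2)} \right) \odot g'\!\left( z^{(2)} \right) \\
D^{(2)} &= \frac{1}{m} \left( \delta^{(3)} \right)^{\top} a^{(2)} \\
D^{(1)} &= \frac{1}{m} \left( \delta^{(2)} \right)^{\top} a^{(1)}
\end{aligned}
```

With m = 5000, the shapes are (5000, 10) for δ(3), (5000, 25) for δ(2), (10, 26) for D(2), and (25, 401) for D(1), matching the comments in the code.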

2.4 Gradient checking

In your neural network you are minimizing the cost function J(Θ). To perform gradient checking on the parameters, imagine unrolling Θ(1) and Θ(2) into one long vector θ. You can then use the following gradient checking procedure.

def gradient_checking(theta, X, y, e):
    def a_numeric_grad(plus, minus):
        """
        Numerical (central-difference) estimate of the gradient for one parameter theta_i.
        """
        return (regularized_cost(plus, X, y) - regularized_cost(minus, X, y)) / (e * 2)
    numeric_grad = []
    for i in range(len(theta)):
        plus = theta.copy()  # copy, otherwise the original theta would be modified
        minus = theta.copy()
        plus[i] = plus[i] + e
        minus[i] = minus[i] - e
        grad_i = a_numeric_grad(plus, minus)
        numeric_grad.append(grad_i)
    numeric_grad = np.array(numeric_grad)
    analytic_grad = regularized_gradient(theta, X, y)  # defined in section 2.5
    diff = np.linalg.norm(numeric_grad - analytic_grad) / np.linalg.norm(numeric_grad + analytic_grad)
    print('If your backpropagation implementation is correct,\nthe relative difference will be smaller than 1e-9 (assume epsilon=0.0001).\nRelative Difference: {}\n'.format(diff))

# The parameter is named e, so it must be passed as e (not epsilon).
# This is very slow (two cost evaluations for each of the 10285 parameters); run with care.
gradient_checking(theta, X, y, e=0.0001)
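
Because the full check is slow, it can be worth running the same procedure on a tiny network first. The sketch below is self-contained and is not the article's code: `unpack`, `cost_and_grad`, `relative_difference`, the network size (3 inputs, 5 hidden units, 3 classes), and the random data are all assumptions made for the illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def unpack(theta, n_in, n_hid, n_out):
    """Split the flat vector into the two weight matrices."""
    split = n_hid * (n_in + 1)
    return (theta[:split].reshape(n_hid, n_in + 1),
            theta[split:].reshape(n_out, n_hid + 1))

def cost_and_grad(theta, X, Y, shape, lam=1.0):
    """Regularized cost and backprop gradient; X already contains the bias column."""
    t1, t2 = unpack(theta, *shape)
    m = len(X)
    a2 = np.insert(sigmoid(X @ t1.T), 0, 1, axis=1)
    h = sigmoid(a2 @ t2.T)
    J = np.sum(-Y * np.log(h) - (1 - Y) * np.log(1 - h)) / m
    J += lam / (2 * m) * (np.sum(t1[:, 1:] ** 2) + np.sum(t2[:, 1:] ** 2))
    d3 = h - Y
    # sigmoid'(z2) equals a2 * (1 - a2) on the non-bias columns
    d2 = (d3 @ t2[:, 1:]) * a2[:, 1:] * (1 - a2[:, 1:])
    g1 = (d2.T @ X) / m
    g2 = (d3.T @ a2) / m
    g1[:, 1:] += lam / m * t1[:, 1:]   # bias column is not penalized
    g2[:, 1:] += lam / m * t2[:, 1:]
    return J, np.concatenate([g1.ravel(), g2.ravel()])

def relative_difference(theta, X, Y, shape, e=1e-4):
    """Compare the backprop gradient with central differences."""
    _, analytic = cost_and_grad(theta, X, Y, shape)
    numeric = np.zeros_like(theta)
    for i in range(len(theta)):
        step = np.zeros_like(theta)
        step[i] = e
        numeric[i] = (cost_and_grad(theta + step, X, Y, shape)[0]
                      - cost_and_grad(theta - step, X, Y, shape)[0]) / (2 * e)
    return np.linalg.norm(numeric - analytic) / np.linalg.norm(numeric + analytic)

rng = np.random.default_rng(0)
shape = (3, 5, 3)                                   # inputs, hidden units, classes
X_demo = np.insert(rng.normal(size=(5, 3)), 0, 1, axis=1)
Y_demo = np.eye(3)[rng.integers(0, 3, size=5)]
theta_demo = rng.uniform(-0.12, 0.12, size=5 * 4 + 3 * 6)
diff = relative_difference(theta_demo, X_demo, Y_demo, shape)
print(diff)
```

With only 38 parameters this runs almost instantly, and a very small relative difference suggests the backpropagation formulas are implemented consistently with the cost.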

2.5 Regularized Neural Networks

def regularized_gradient(theta, X, y, l=1):
    """Regularized gradient; the bias-unit parameters are not penalized"""
    D1, D2 = deserialize(gradient(theta, X, y))
    # Use copies of the current parameters so that theta itself is not modified
    t1, t2 = (t.copy() for t in deserialize(theta))
    t1[:,0] = 0
    t2[:,0] = 0
    reg_D1 = D1 + (l / len(X)) * t1
    reg_D2 = D2 + (l / len(X)) * t2
    return serialize(reg_D1, reg_D2)

2.6 Learning parameters using fmincg

def nn_training(X, y):
    init_theta = random_init(10285)  # 25*401 + 10*26
    res = opt.minimize(fun=regularized_cost,
                       x0=init_theta,
                       args=(X, y, 1),
                       method='TNC',
                       jac=regularized_gradient,
                       options={'maxiter': 400})
    return res

res = nn_training(X, y)  # slow
res
'''
     fun: 0.5156784004838036
     jac: array([-2.51032294e-04, -2.11248326e-12,  4.38829369e-13, ...,
        9.88299811e-05, -2.59923586e-03, -8.52351187e-04])
 message: 'Converged (|f_n-f_(n-1)| ~= 0)'
    nfev: 271
     nit: 17
  status: 1
 success: True
       x: array([ 0.58440213, -0.02013683,  0.1118854 , ..., -2.8959637 ,
        1.85893941, -2.78756836])
'''

def accuracy(theta, X, y):
    _, _, _, _, h = feed_forward(theta, X)  # use the theta argument, not the global res
    y_pred = np.argmax(h, axis=1) + 1
    print(classification_report(y, y_pred))

accuracy(res.x, X, raw_y)
'''
             precision    recall  f1-score   support
          1       0.97      0.99      0.98       500
          2       0.98      0.97      0.98       500
          3       0.98      0.95      0.96       500
          4       0.98      0.97      0.97       500
          5       0.97      0.98      0.97       500
          6       0.99      0.98      0.98       500
          7       0.99      0.97      0.98       500
          8       0.96      0.98      0.97       500
          9       0.97      0.98      0.97       500
         10       0.99      0.99      0.99       500
avg / total       0.98      0.98      0.98      5000
'''
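
The prediction step maps each output row to a label by taking the index of the largest activation and adding 1, because labels run from 1 to 10. A minimal sketch on made-up outputs with three classes (`predict_labels` and the demo arrays are hypothetical):

```python
import numpy as np

def predict_labels(h):
    """Index of the largest output per row, shifted to 1-based labels."""
    return np.argmax(h, axis=1) + 1

h_demo = np.array([[0.1, 0.8, 0.1],
                   [0.7, 0.2, 0.1],
                   [0.2, 0.3, 0.5]])
y_true = np.array([2, 1, 1])
y_pred = predict_labels(h_demo)      # [2, 1, 3]
acc = np.mean(y_pred == y_true)      # 2 of 3 correct
print(y_pred, acc)
```

Note that the report above is computed on the same 5000 examples that were used for training, so it measures training accuracy rather than performance on unseen data.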

3 Visualizing the hidden layer

A good way to understand what a neural network has learned is to visualize what the hidden units capture. Informally, given a particular hidden unit, one way to visualize what it computes is to find an input x that activates it (that is, gives it an activation value close to 1). For the network we trained, note that each row of θ1 is a 401-dimensional vector holding the parameters of one hidden unit. If we drop the bias term, we get a 400-dimensional vector holding the weights from each input pixel to that hidden unit. One way to visualize it, therefore, is to reshape this 400-dimensional vector into a (20, 20) image and display it.

Note:

It turns out that this is equivalent to finding the input that gives the highest activation for the hidden unit, given a norm constraint on the input (for example, ||x||2 ≤ 1).

(I do not fully understand this part yet.)
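
One way to read the note: by the Cauchy-Schwarz inequality, w·x ≤ ||w|| ||x||, with equality when x points in the same direction as w. Since the sigmoid is increasing, the unit-norm input that maximizes a hidden unit's activation is its own weight vector scaled to norm 1, which is why plotting the weights shows what the unit responds to. The sketch below illustrates this numerically; the random vector `w` is only a stand-in for one row of θ1 without its bias, and the bias term is ignored.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=400)               # stand-in for one hidden unit's weights
x_best = w / np.linalg.norm(w)         # unit-norm input aligned with w

# 1000 random unit-norm inputs for comparison
candidates = rng.normal(size=(1000, 400))
candidates /= np.linalg.norm(candidates, axis=1, keepdims=True)

best_score = w @ x_best                # equals ||w||
max_random = np.max(candidates @ w)    # best pre-activation among random inputs
print(best_score >= max_random)        # True
```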

def plot_hidden(theta):
    t1, _ = deserialize(theta)
    t1 = t1[:, 1:]
    fig,ax_array = plt.subplots(5, 5, sharex=True, sharey=True, figsize=(6,6))
    for r in range(5):
        for c in range(5):
            ax_array[r, c].matshow(t1[r * 5 + c].reshape(20, 20), cmap='gray_r')
            plt.xticks([])
            plt.yticks([])
    plt.show()

plot_hidden(res.x)

