Predicting Car Fuel Consumption in Python with TensorFlow 2.x

Updated: 2023-02-08 09:26:03  Author: 嘟粥yyds

This article walks through predicting car fuel consumption in Python with TensorFlow 2.x, with detailed example code. Interested readers can follow along step by step.

1. Development environment

Development tool: Jupyter Notebook 6.5.2

Runtime: Python 3.10.6

Third-party libraries: tensorflow-gpu, numpy, matplotlib.pyplot, pandas

2. Code implementation

2.1 Preparation

2.1.1 Importing the required modules

The CPU-only build of TensorFlow works as well.

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, datasets, losses
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

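2.1.2 Fixing Chinese text that matplotlib fails to render (skip if you are not affected)

The author has already modified matplotlib's configuration file, so Chinese text renders correctly. If it does not render correctly for you, there are two solutions:

1. Add the following two lines after the imports (they are needed in every script or notebook that should render Chinese):

plt.rcParams['font.sans-serif'] = ['SimHei']  # render Chinese labels correctly
plt.rcParams['axes.unicode_minus'] = False  # render the minus sign correctly

2. Edit matplotlib's configuration file (a one-time fix)

Run the following code: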

import matplotlib
print(matplotlib.matplotlib_fname())

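It prints the path of the configuration file, for example (the Python installation directory differs from machine to machine; only the part after \LocalCache matters):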

C:\Users\31600\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\matplotlib\mpl-data\matplotlibrc

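Open this file and find the two lines that start with #font.family: and #font.sans-serif:. Remove the leading # from both lines, and add the name of the Chinese font you want (for example SimHei) after font.sans-serif:. The author edited the file in Visual Studio 2017; other editors may display it slightly differently.

Finally, save the file and restart the Python environment. No changes to the code are needed.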
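Either solution only helps if the chosen font is actually installed. As a minimal sketch (the helper name font_is_installed is made up for this example), you can ask matplotlib's font manager before setting the option:

```python
import matplotlib
from matplotlib import font_manager


def font_is_installed(font_name: str) -> bool:
    """Return True if matplotlib's font manager knows a font with this name."""
    known_names = {font.name for font in font_manager.fontManager.ttflist}
    return font_name in known_names


# SimHei is the font used in solution 1; it is not present on every system.
if font_is_installed("SimHei"):
    matplotlib.rcParams["font.sans-serif"] = ["SimHei"]
    matplotlib.rcParams["axes.unicode_minus"] = False
else:
    print("SimHei not found: install it or choose another installed Chinese font")
```

If the font is missing, matplotlib falls back to its default font, which is usually why Chinese characters show up as empty boxes.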

2.2 加載數(shù)據(jù)集

# Download the Auto MPG dataset
dataset_path = keras.utils.get_file("auto-mpg.data",
                                    "http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data")
# Fields: fuel efficiency (miles per gallon), cylinders, displacement, horsepower,
# weight, acceleration, model year, origin
column_names = ['MPG', 'Cylinders', 'Displacement', 'Horsepower', 'Weight',
                'Acceleration', 'Model Year', 'Origin']
# "?" marks a missing value; everything after the tab (the car name) is treated as a comment
raw_dataset = pd.read_csv(dataset_path, names=column_names,
                          na_values="?", comment='\t',
                          sep=" ", skipinitialspace=True)
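To see what these read_csv options do without downloading anything, here is a small sketch that parses two in-memory lines written in the same layout. The numeric values and car names are illustrative only, not rows quoted from the dataset.

```python
import io

import pandas as pd

# Two lines in the auto-mpg.data layout: space-padded numbers, then a tab and a quoted name.
sample_text = (
    '18.0   8   307.0      130.0      3504.      12.0   70  1\t"car a"\n'
    '25.0   4   98.00      ?          2046.      19.0   71  1\t"car b"\n'
)
column_names = ['MPG', 'Cylinders', 'Displacement', 'Horsepower', 'Weight',
                'Acceleration', 'Model Year', 'Origin']

sample = pd.read_csv(io.StringIO(sample_text), names=column_names,
                     na_values="?",          # "?" becomes NaN
                     comment='\t',           # drop the tab and the car name after it
                     sep=" ", skipinitialspace=True)  # collapse runs of spaces

print(sample)
print(sample.isna().sum()["Horsepower"])  # number of missing horsepower values
```

The tab acts as a comment marker, so the car name never reaches the table, and the "?" in the second line is read as a missing value.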

The meaning of each field is given in the comment above column_names.

dataset = raw_dataset.copy()
# Inspect the first few rows
dataset.head(8)

2.3 Data processing

2.3.1 Data cleaning

The raw data may contain entries with empty fields (missing values).

isna() returns a table that marks the invalid values, and chaining sum() counts them per column. The invalid data is then handled: dropna() removes the rows that contain invalid values and returns a new table.

begin = dataset.isna().sum()  # count missing values per column
dataset = dataset.dropna()  # drop rows that contain missing values
end = dataset.isna().sum()  # count missing values again
print(f'========Before cleaning:========\n{begin}\n========After cleaning:========\n{end}')
# Output:
========Before cleaning:========
MPG             0
Cylinders       0
Displacement    0
Horsepower      6
Weight          0
Acceleration    0
Model Year      0
Origin          0
dtype: int64
========After cleaning:========
MPG             0
Cylinders       0
Displacement    0
Horsepower      0
Weight          0
Acceleration    0
Model Year      0
Origin          0
dtype: int64
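Dropping rows is the simplest choice and costs little here, since only a few rows are affected. As a sketch of an alternative that this article does not use, missing values can be filled with the column median instead. The toy table below is made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy table with one missing horsepower value (values are illustrative)
toy = pd.DataFrame({
    "MPG": [18.0, 25.0, 30.0, 22.0],
    "Horsepower": [130.0, np.nan, 70.0, 95.0],
})

missing_per_column = toy.isna().sum()   # per-column count of missing values
dropped = toy.dropna()                  # option 1: remove incomplete rows
filled = toy.fillna({"Horsepower": toy["Horsepower"].median()})  # option 2: impute

print(missing_per_column)
print(len(dropped), len(filled))
```

Both calls return new tables and leave the original untouched, which is why the article reassigns the result of dropna() to dataset.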

2.3.2 Data conversion

Origin is a categorical field, so we remove it and convert it into three new fields: USA, Europe and Japan (coded as 1, 2 and 3 in the raw data).

pop('Origin') extracts the Origin column from the table, and the column is then one-hot encoded: for each row, exactly one of the three new fields is 1 and the other two are 0.

# Pop the Origin column (remove it from the table and return it)
origin = dataset.pop('Origin')
# Write three new columns based on the values of origin
dataset['USA'] = (origin == 1) * 1.0
dataset['Europe'] = (origin == 2) * 1.0
dataset['Japan'] = (origin == 3) * 1.0
dataset.tail()

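After the conversion, the last three columns of the table are USA, Europe and Japan, and each row holds 1.0 in exactly one of them.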
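The same encoding can also be produced with pandas' built-in pd.get_dummies. This is only a sketch of an alternative on a toy column, not the approach used above; the reindex call fixes the column order and adds a zero column for any origin that happens to be absent.

```python
import pandas as pd

# Toy Origin column using the dataset's coding: 1 = USA, 2 = Europe, 3 = Japan
origin = pd.Series([1, 3, 2, 1], name="Origin")

# Map the numeric codes to names, then one-hot encode them as floats
names = origin.map({1: "USA", 2: "Europe", 3: "Japan"})
one_hot = pd.get_dummies(names, dtype=float).reindex(
    columns=["USA", "Europe", "Japan"], fill_value=0.0
)

print(one_hot)
```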

2.3.3 Splitting the dataset

sample() draws a random 80% of the rows as the training set. Dropping those rows from the full table by index leaves the remaining 20% as the test set.

# Split into a training set and a test set
train_dataset = dataset.sample(frac=0.8, random_state=0)
test_dataset = dataset.drop(train_dataset.index)
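A quick way to convince yourself that this sample/drop pattern yields two disjoint sets that together cover every row is to run it on a toy table:

```python
import pandas as pd

toy = pd.DataFrame({"value": range(10)})

# Same pattern as above: sample 80% for training, drop those rows to get the test set
train = toy.sample(frac=0.8, random_state=0)
test = toy.drop(train.index)

print(len(train), len(test))                    # 8 2
print(set(train.index).isdisjoint(test.index))  # True
```

Fixing random_state makes the split reproducible between runs.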

describe() summarizes the range of each column: for every field in the training set it reports the count, mean, standard deviation, minimum, quartiles and maximum. The MPG column is the label; it is split off from the features in the next step.

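# Statistics of the training-set inputs x
train_stats = train_dataset.describe()
# Drop MPG: it is the label, and norm() must only see feature columns,
# otherwise pandas aligns on column names and adds an all-NaN MPG column
train_stats.pop('MPG')
train_stats = train_stats.transpose()  # transpose so each row is one field
train_stats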

2.3.4 Setting the ground-truth labels

Remove the MPG field from the features and use it as the label data.

# Pop the MPG (fuel efficiency) column as the ground-truth label y
train_labels = train_dataset.pop('MPG')
test_labels = test_dataset.pop('MPG')

2.3.5 Standardizing the data

Use the mean and standard deviation of each field in the training set to standardize both the training and the test data.

# Standardize the data
def norm(x):
    # Subtract each field's mean and divide by its standard deviation
    return (x - train_stats['mean']) / train_stats['std']

normed_train_data = norm(train_dataset)  # standardize the training set
normed_test_data = norm(test_dataset)  # standardize the test set with training statistics
print(f'Training set size:{normed_train_data.shape}  {train_labels.shape}')
print(f"Test set size:{normed_test_data.shape}  {test_labels.shape}")
normed_train_data  # inspect the processed training set
# Build a Dataset object from the training split
train_db = tf.data.Dataset.from_tensor_slices((normed_train_data.values, train_labels.values))
# Shuffle and split into mini-batches of 32, so each gradient step uses one small batch
train_db = train_db.shuffle(100).batch(32)

# Output:
Training set size:(314, 9)  (314,)
Test set size:(78, 9)  (78,)
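To check what standardization does, here is a small sketch on toy values (made up for illustration). After the transformation each training column should have mean 0 and standard deviation 1, while the test rows are scaled with the training statistics rather than their own.

```python
import numpy as np
import pandas as pd

# Toy feature tables (values are illustrative)
train = pd.DataFrame({"Weight": [3504.0, 2046.0, 2130.0, 4100.0],
                      "Cylinders": [8.0, 4.0, 4.0, 8.0]})
test = pd.DataFrame({"Weight": [2800.0], "Cylinders": [6.0]})

stats = train.describe().transpose()  # one row per feature, with 'mean' and 'std' columns


def standardize(frame):
    # Always use the training statistics, also for the test set
    return (frame - stats["mean"]) / stats["std"]


normed_train = standardize(train)
normed_test = standardize(test)

print(np.allclose(normed_train.mean(), 0.0))  # True
print(np.allclose(normed_train.std(), 1.0))   # True
print(normed_test)
```

Reusing the training statistics for the test set keeps information about the test data out of the preprocessing step.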

2.3.6 How individual fields affect MPG

# Create three side-by-side plots that share the y axis, using plt.subplots()
fig, axes = plt.subplots(nrows=1, ncols=3, sharey=True, figsize=(20, 6))
fig.suptitle('Effect of each field on MPG', fontsize=16)
#subplot1
axes[0].scatter(train_dataset['Cylinders'].to_numpy(), train_labels.to_numpy(), color='b',s=20)
axes[0].set_xlabel('Cylinders', fontsize=13)
axes[0].set_ylabel('MPG', fontsize=14)
#subplot2
axes[1].scatter(train_dataset['Displacement'].to_numpy(), train_labels.to_numpy(), color='b',s=20)
axes[1].set_xlabel('Displacement', fontsize=13)
#subplot3
axes[2].scatter(train_dataset['Weight'].to_numpy(), train_labels.to_numpy(), color='b',s=20)
axes[2].set_xlabel('Weight', fontsize=13)
 
# Show the figure
plt.show()
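Scatter plots show the trend visually. If you also want a number for it, one option is the Pearson correlation between each field and MPG. The sketch below uses a made-up toy table, so the printed values say nothing about the real dataset; on the real data you would call corr() on the training table before MPG is popped.

```python
import pandas as pd

# Toy table (illustrative values only): heavier cars with more cylinders get lower MPG
toy = pd.DataFrame({
    "MPG": [18.0, 15.0, 24.0, 30.0, 36.0],
    "Weight": [3500.0, 3700.0, 2400.0, 2100.0, 1800.0],
    "Cylinders": [8, 8, 4, 4, 4],
})

# Correlation of every other column with MPG, in the range [-1, 1]
correlations = toy.corr()["MPG"].drop("MPG")
print(correlations)
```

A value near -1 means MPG falls as the field grows, which is the pattern a downward-sloping scatter plot shows.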

2.4 Building the network

Because the dataset is small, we build only a 3-layer fully connected network to predict MPG. The network is implemented as a custom class: the sub-layers are created in the initializer, and the computation is implemented in the forward function call. The custom class inherits from the keras.Model base class, which is the standard way to write a custom network and gives access to conveniences such as trainable_variables and save_weights.

class Network(keras.Model):
    # Regression network model
    def __init__(self):
        super(Network, self).__init__()
        # Create 3 fully connected layers
        self.fc1 = layers.Dense(64, activation='relu', name='Layer1')
        self.fc2 = layers.Dense(64, activation='relu', name='Layer2')
        self.fc3 = layers.Dense(1, name='OutputLayer')

    def call(self, inputs, training=None, mask=None):
        # Pass the input through the 3 fully connected layers in turn
        x = self.fc1(inputs)
        x = self.fc2(x)
        x = self.fc3(x)
        return x
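To make the architecture concrete, the sketch below reproduces the same 9 → 64 → 64 → 1 forward pass in plain NumPy with random weights. It is an illustration of the shapes and the parameter count only; the initialization scale is an arbitrary assumption and this is not how Keras initializes or stores its layers.

```python
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [9, 64, 64, 1]  # input features, two hidden layers, one output

# One (weights, bias) pair per Dense layer
params = [
    (rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out))
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]


def forward(x):
    for index, (weights, bias) in enumerate(params):
        x = x @ weights + bias
        if index < len(params) - 1:  # ReLU on the hidden layers, none on the output
            x = np.maximum(x, 0.0)
    return x


batch = rng.normal(size=(4, 9))  # 4 samples with 9 features each
output = forward(batch)
parameter_count = sum(w.size + b.size for w, b in params)
print(output.shape, parameter_count)  # (4, 1) 4865
```

The count breaks down as 9*64+64 = 640, 64*64+64 = 4160 and 64*1+1 = 65 parameters for the three layers.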

2.5 Training and testing

When training a network, the usual flow is: run a forward pass to get the network's output, compute the error with a loss function, compute the gradients with automatic differentiation and update the parameters, and test the network's performance at regular intervals.

So once the model is built, we need to specify the optimizer, the loss function, the evaluation metrics and so on. This step is called compiling the model. Here only the optimizer is specified; the loss function is specified later, when the gradients are computed.
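The forward pass, loss, gradient and update cycle described above can be sketched in plain NumPy on a toy linear model. Note the assumptions: the data is synthetic, the gradients are written out by hand instead of coming from automatic differentiation, and the update is plain gradient descent rather than the RMSprop optimizer used in this article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data: y = x @ true_w + 0.3, without noise
x = rng.normal(size=(64, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = x @ true_w + 0.3

w = np.zeros(3)
b = 0.0
learning_rate = 0.1
loss_history = []

for step in range(200):
    prediction = x @ w + b                   # 1. forward pass
    error = prediction - y
    loss = np.mean(error ** 2)               # 2. MSE loss
    grad_w = 2.0 * x.T @ error / len(y)      # 3. gradients of the loss
    grad_b = 2.0 * np.mean(error)
    w -= learning_rate * grad_w              # 4. parameter update
    b -= learning_rate * grad_b
    loss_history.append(loss)

print(loss_history[0], loss_history[-1])
print(w, b)
```

In the TensorFlow version, tf.GradientTape takes over step 3 and the optimizer takes over step 4.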

# Create an instance of the network class
model = Network()
# Create the internal tensors with build (4 is an arbitrary batch size, 9 is the number of input features)
model.build(input_shape=(4, 9))
# Print the network summary
model.summary()
# Create the optimizer with a learning rate of 0.001
optimizers = tf.keras.optimizers.RMSprop(0.001)

# Next comes the training part: a double loop over epochs and steps, 200 epochs in total
Epoch = np.arange(0, 200)
train_MAE = np.zeros(200)
test_MAE = np.zeros(200)

for epoch in range(200):
    for step, (x, y) in enumerate(train_db):  # one pass over the training set
        # Gradient recorder
        with tf.GradientTape() as tape:
            # Dense(1) returns shape (batch, 1); squeeze to (batch,) so it matches y
            out = tf.squeeze(model(x), axis=1)
            loss = tf.reduce_mean(losses.MSE(y, out))
            mae_loss = tf.reduce_mean(losses.MAE(y, out))  # compute MAE
        # Compute the gradients and update the parameters
        grads = tape.gradient(loss, model.trainable_variables)
        optimizers.apply_gradients(zip(grads, model.trainable_variables))
    train_MAE[epoch] = float(mae_loss)  # training MAE of the last batch in this epoch
    out = tf.squeeze(model(tf.constant(normed_test_data.values)), axis=1)
    test_MAE[epoch] = float(tf.reduce_mean(losses.MAE(test_labels.values, out)))
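One thing worth verifying in a hand-written loop like this is that predictions and labels have the same shape. A Dense(1) layer outputs shape (batch, 1), while the labels have shape (batch,). Subtracting one from the other silently broadcasts to a (batch, batch) matrix instead of raising an error. The NumPy sketch below shows the effect on toy numbers; TensorFlow follows the same broadcasting rules, but check the shapes in your own version rather than taking this as a statement about every Keras loss API.

```python
import numpy as np

labels = np.array([20.0, 30.0, 25.0])          # shape (3,)
outputs = np.array([[21.0], [28.0], [25.0]])   # shape (3, 1), like a Dense(1) output

broadcast_diff = outputs - labels                  # broadcasts to shape (3, 3)
squeezed_diff = outputs.squeeze(axis=1) - labels   # shape (3,), element by element

mae_wrong = np.mean(np.abs(broadcast_diff))  # averages all 9 pairings
mae_right = np.mean(np.abs(squeezed_diff))   # averages the 3 matching pairs

print(broadcast_diff.shape, squeezed_diff.shape)  # (3, 3) (3,)
print(mae_wrong, mae_right)
```

Removing the trailing axis from the model output, or adding one to the labels, keeps the loss a per-sample comparison.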

2.6 Visualizing the error

For regression problems, besides the mean squared error (MSE), the mean absolute error (MAE) can also be used to measure model performance. The program records the training and test MAE at the end of each epoch and plots how they change over time.

plt.xlabel('Epoch')
plt.ylabel('MAE')
plt.plot(Epoch, train_MAE, label="Train")
plt.plot(Epoch, test_MAE, label="Test")
plt.title('Car fuel consumption: MAE per epoch')
plt.legend(['Train', 'Test'])  # name the two curves
plt.show()
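For reference, the two metrics mentioned in this section can be computed by hand. The sketch below uses made-up predictions to show how they differ: MSE squares each error, so large errors dominate, while MAE stays in the same unit as MPG.

```python
import numpy as np

# Toy labels and predictions (illustrative values)
y_true = np.array([20.0, 30.0, 25.0, 15.0])
y_pred = np.array([22.0, 27.0, 25.0, 19.0])

errors = y_pred - y_true          # [ 2. -3.  0.  4.]
mse = np.mean(errors ** 2)        # (4 + 9 + 0 + 16) / 4
mae = np.mean(np.abs(errors))     # (2 + 3 + 0 + 4) / 4

print(mse, mae)  # 7.25 2.25
```

An MAE of 2.25 reads directly as "off by 2.25 MPG on average", which is why it is a convenient curve to plot.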