
The pipeline Module of sklearn in Python, Explained with Examples

Updated: 2020-05-21 10:17:38 Author: 易晴天

This article introduces the pipeline module of sklearn in Python and walks through it with example code. It should be a useful reference for study or work.

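I have recently been reading the book "Deep Learning: Python Practice Based on Keras" (《深度學習：基于Keras的Python實踐》, by 魏貞原). Section 8.3 builds a Scikit-Learn Pipeline that first standardizes the dataset and then creates and evaluates a baseline neural network model. The code is as follows: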

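from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# model, x, Y and seed are assumed to be defined earlier in the book's example
# Standardize the data to improve the algorithm
steps = []
steps.append(('standardize', StandardScaler()))
steps.append(('mlp', model))
pipeline = Pipeline(steps)
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)
results = cross_val_score(pipeline, x, Y, cv=kfold)
print('Standardize: %.2f (%.2f) MSE' % (results.mean(), results.std()))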
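The snippet above depends on a Keras model and data defined elsewhere in the book. As a minimal, self-contained sketch of the same pattern, the version below swaps in scikit-learn's MLPRegressor and a synthetic dataset from make_regression. Both are stand-ins chosen for illustration, not the book's model or data, and the hyperparameters are untuned assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

seed = 7  # arbitrary seed, chosen only for reproducibility

# Synthetic regression data (an assumption; the book uses its own dataset)
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=seed)

# Stand-in for the book's Keras model (hypothetical, untuned settings)
mlp = MLPRegressor(hidden_layer_sizes=(13,), max_iter=300, random_state=seed)

# The scaler is refit on the training part of every fold,
# so the held-out part never influences the scaling statistics.
pipeline = Pipeline([('standardize', StandardScaler()), ('mlp', mlp)])

kfold = KFold(n_splits=10, shuffle=True, random_state=seed)
results = cross_val_score(pipeline, X, y, cv=kfold,
                          scoring='neg_mean_squared_error')
print('Standardize: %.2f (%.2f) MSE' % (results.mean(), results.std()))
```

With scoring='neg_mean_squared_error' the scores are negated MSE values, so they are zero or negative and values closer to zero are better. The small network may emit convergence warnings; that does not affect the pattern being shown.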

So what exactly is a Pipeline?

Pipelines and composite estimators (official documentation)

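Transformers are usually combined with classifiers, regressors, or other estimators to build a composite estimator. The most common tool is Pipeline. Pipeline is often used together with FeatureUnion, which concatenates the outputs of several transformers into a single composite feature space. TransformedTargetRegressor handles transformations of the target (for example, log-transforming y). Pipelines, by contrast, only transform the observed data (X).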
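The division of labor among these three classes can be sketched as follows. The data is synthetic and the choice of PCA, SelectKBest, and LinearRegression is an assumption made only for illustration.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import FeatureUnion, Pipeline

# Synthetic data (illustrative): a strictly positive target, so log(y) is defined
rng = np.random.RandomState(0)
X = rng.rand(100, 5)
y = np.exp(X @ np.array([1.0, 2.0, 0.5, 0.0, 0.0]) + 0.01 * rng.randn(100))

# FeatureUnion concatenates the outputs of its transformers column-wise
union = FeatureUnion([('pca', PCA(n_components=2)),
                      ('kbest', SelectKBest(f_regression, k=1))])
combined = union.fit_transform(X, y)
print(combined.shape)  # 2 PCA columns + 1 selected column

# The Pipeline transforms only X; TransformedTargetRegressor transforms y
pipe = Pipeline([('features', union), ('reg', LinearRegression())])
model = TransformedTargetRegressor(regressor=pipe,
                                   func=np.log, inverse_func=np.exp)
model.fit(X, y)
pred = model.predict(X)  # predictions are mapped back through exp
```

The regressor inside is fitted on log(y), and predict applies inverse_func so the results come back on the original scale of y.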

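Pipeline can be used to chain multiple estimators into one. This is useful because data processing often follows a fixed sequence of steps, such as feature selection, normalization, and classification. Pipeline serves several purposes here: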

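Convenience and encapsulation: you only need to call fit once and predict once on your data to fit and use the whole sequence of estimators.
Joint parameter selection: you can run a grid search over the parameters of all estimators in the Pipeline at once.
Safety: by ensuring that the same samples are used to train both the transformers and the predictor, Pipeline helps avoid leaking statistics from the test data into the trained model during cross-validation.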
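A minimal sketch of the first two points, using the iris dataset and an illustrative parameter grid (the dataset, the step names, and the grid values are assumptions, not taken from the article):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

pipe = Pipeline([('scale', StandardScaler()),
                 ('reduce_dim', PCA()),
                 ('clf', SVC())])

# Parameters of a step are addressed as <step name>__<parameter name>
param_grid = {'reduce_dim__n_components': [2, 3],
              'clf__C': [0.1, 1, 10]}

# One fit call searches over the parameters of every step jointly;
# within each CV split the scaler and PCA are fitted on the training part only.
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)
print(search.predict(X[:3]))
```

Because the whole pipeline is refitted inside each cross-validation split, the preprocessing steps never see the held-out samples, which is the safety property described in the third point.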

A Pipeline is built from a list of (key, value) pairs, where key is a string containing the name you want to give the step and value is an estimator object:

from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.decomposition import PCA
estimators = [('reduce_dim', PCA()), ('clf', SVC())]
pipe = Pipeline(estimators)
pipe

output: [image not preserved in the source]
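The names given at construction time can later be used to look up or reconfigure individual steps. A small sketch, reusing the same two steps as above (the value C=10 is arbitrary):

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

pipe = Pipeline([('reduce_dim', PCA()), ('clf', SVC())])

# steps is the list of (name, estimator) pairs in order
step_names = [name for name, _ in pipe.steps]
print(step_names)  # ['reduce_dim', 'clf']

# named_steps gives access to a step by its name
pca = pipe.named_steps['reduce_dim']

# Nested parameters use the <step name>__<parameter name> syntax
pipe.set_params(clf__C=10)
print(pipe.named_steps['clf'].C)  # 10
```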

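The function make_pipeline is a shorthand for building pipelines. It takes a variable number of estimators and returns a pipeline. It neither requires nor allows you to name the estimators; instead, each name is set automatically to the lowercase form of the estimator's type: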

from sklearn.pipeline import make_pipeline
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import Binarizer
make_pipeline(Binarizer(), MultinomialNB())

output: [image not preserved in the source]
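The automatically generated names can be checked by inspecting the steps attribute of the returned pipeline, as in this short sketch:

```python
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Binarizer

pipe = make_pipeline(Binarizer(), MultinomialNB())

# Each step is named after the lowercase class name of its estimator
names = [name for name, _ in pipe.steps]
print(names)  # ['binarizer', 'multinomialnb']
```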