
Plotting a Confusion Matrix with matplotlib in Python: An Example

Updated: 2020-06-16 09:46:55 Author: Kun Li

This article shows how to plot a confusion matrix with matplotlib in Python. It should serve as a useful reference.

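As noted earlier, the confusion matrix is an important metric for classification problems. So how can we display it well? You could build a table by hand or visualize it in a front end. I once tried building it with a front-end library (D5) and taking a screenshot, but the result did not look very good.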

Code:

import itertools
import matplotlib.pyplot as plt
import numpy as np
 
def plot_confusion_matrix(cm, classes,
                          normalize=False,
                          title='Confusion matrix',
                          cmap=plt.cm.Blues):
    """
    This function prints and plots the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    """
    if normalize:
        # Divide each row by its total so that every row sums to 1
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        print("Normalized confusion matrix")
    else:
        print('Confusion matrix, without normalization')

    print(cm)

    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)

    # Write each cell value on the plot; use white text on dark cells
    fmt = '.2f' if normalize else 'd'
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, format(cm[i, j], fmt),
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")

    plt.ylabel('True label')
    plt.xlabel('Predicted label')
    # Call tight_layout after the axis labels are set so they are not clipped
    plt.tight_layout()
    plt.show()
    # plt.savefig('confusion_matrix', dpi=200)

 
cnf_matrix = np.array([
 [4101, 2, 5, 24, 0],
 [50, 3930, 6, 14, 5],
 [29, 3, 3973, 4, 0],
 [45, 7, 1, 3878, 119],
 [31, 1, 8, 28, 3936],
])
 
class_names = ['Buildings', 'Farmland', 'Greenbelt', 'Wasteland', 'Water']
 
# plt.figure()
# plot_confusion_matrix(cnf_matrix, classes=class_names,
#      title='Confusion matrix, without normalization')
 
# Plot normalized confusion matrix
plt.figure()
plot_confusion_matrix(cnf_matrix, classes=class_names, normalize=True,
      title='Normalized confusion matrix')

Put your own confusion matrix where the matrix is defined. The confusion matrix can also be visualized directly while the model is running.
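To make the `normalize=True` branch concrete, here is a minimal sketch that applies the same row normalization to the matrix from the code above, outside of any plotting. The variable names `row_totals`, `normalized` and `per_class_recall` are my own and do not come from the original code.

```python
import numpy as np

# Confusion matrix from the article (rows: true class, columns: predicted class).
cnf_matrix = np.array([
    [4101, 2, 5, 24, 0],
    [50, 3930, 6, 14, 5],
    [29, 3, 3973, 4, 0],
    [45, 7, 1, 3878, 119],
    [31, 1, 8, 28, 3936],
])

# Divide every row by its total, as plot_confusion_matrix does when normalize=True.
row_totals = cnf_matrix.sum(axis=1, keepdims=True)
normalized = cnf_matrix / row_totals

# With rows as true classes, the diagonal of the normalized matrix
# is the fraction of each class that was predicted correctly (per-class recall).
per_class_recall = np.diag(normalized)
print(np.round(per_class_recall, 4))
```

Each row of `normalized` sums to 1, so the colors in the normalized plot compare classes fairly even when the classes have different numbers of samples.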

Supplementary knowledge: how the confusion matrix works and how to use it (scikit-learn and TensorFlow)

Principle

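In machine learning, a confusion matrix is an error matrix commonly used to visually evaluate the performance of a supervised learning algorithm. It is a square matrix of size (n_classes, n_classes), where n_classes is the number of classes. Each row of the matrix represents the instances of a true class, and each column represents the instances of a predicted class (the convention used by TensorFlow and scikit-learn). The opposite convention is also possible: each row represents the instances of a predicted class, and each column the instances of a true class (the definition given in the Wikipedia article "Confusion matrix"). The confusion matrix makes it easy to see whether the system confuses two classes, which is where the name comes from.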
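The two conventions can be illustrated with a small pure-Python sketch. The helper `build_confusion_matrix` and the label data are made up for illustration; they are not part of scikit-learn or TensorFlow.

```python
def build_confusion_matrix(y_true, y_pred, n_classes):
    """Count matrix with rows = true class, columns = predicted class."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for actual, predicted in zip(y_true, y_pred):
        matrix[actual][predicted] += 1
    return matrix

# Made-up labels for three classes 0, 1, 2 (illustrative data only).
y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 1, 2, 1, 2]

rows_true = build_confusion_matrix(y_true, y_pred, n_classes=3)

# Transposing switches to the other convention (rows = predicted, columns = true).
rows_pred = [list(column) for column in zip(*rows_true)]

for row in rows_true:
    print(row)
```

In `rows_true`, the off-diagonal entries in column 1 show that some samples of classes 0 and 2 were predicted as class 1, which is exactly the kind of confusion the matrix is meant to expose.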

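A confusion matrix is a special kind of contingency table, also called a cross tabulation or crosstab. It has two dimensions (actual values, "actual", and predicted values, "predicted"), and both dimensions share the same set of classes. In a contingency table, each combination of dimension and class is a variable. A contingency table presents the frequency distribution of several variables in tabular form.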
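Seen as a cross tabulation, the matrix is just a count of (actual, predicted) pairs together with the row and column totals. The sketch below uses only the standard library; the "cat"/"dog" observations are invented for illustration.

```python
from collections import Counter

# Made-up observations as (actual, predicted) pairs (illustrative data only).
pairs = [
    ("cat", "cat"), ("cat", "dog"), ("cat", "cat"),
    ("dog", "dog"), ("dog", "dog"), ("dog", "cat"),
]

cell_counts = Counter(pairs)                               # joint frequency of each cell
actual_totals = Counter(actual for actual, _ in pairs)     # row margins
predicted_totals = Counter(pred for _, pred in pairs)      # column margins

# Both dimensions share the same set of classes.
classes = sorted(set(actual_totals) | set(predicted_totals))
table = [[cell_counts[(a, p)] for p in classes] for a in classes]

print(classes)
print(table)
```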

Using the confusion matrix (scikit-learn and TensorFlow)

This section first introduces the API functions that compute a confusion matrix in scikit-learn and TensorFlow, and then uses both functions in one example.

scikit-learn confusion matrix function: the sklearn.metrics.confusion_matrix API

sklearn.metrics.confusion_matrix(
 y_true, # array, ground truth (correct) target values
 y_pred, # array, estimated targets as returned by a classifier
 labels=None, # array, list of labels to index the matrix
 sample_weight=None # array-like of shape = [n_samples], optional sample weights
)

In scikit-learn, the confusion matrix is computed to evaluate the accuracy of a classification.

By definition, the element Ci,j of the confusion matrix C equals the number of observations whose true value is group i and whose predicted value is group j. For a binary classification task, the number of true negatives (TN) is therefore C0,0, the number of false negatives (FN) is C1,0, the number of true positives (TP) is C1,1, and the number of false positives (FP) is C0,1.
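A small sketch of how the four cells can be read out of a binary confusion matrix in this layout. The counts are hypothetical, and the derived metrics (accuracy, precision, recall) are standard definitions added here for illustration rather than something the text above computes.

```python
# Hypothetical 2x2 confusion matrix in the scikit-learn layout
# (rows = true class, columns = predicted class; class 0 = negative, class 1 = positive).
C = [
    [50, 10],  # C[0][0] = true negatives,  C[0][1] = false positives
    [5, 35],   # C[1][0] = false negatives, C[1][1] = true positives
]

tn, fp = C[0]
fn, tp = C[1]

accuracy = (tp + tn) / (tn + fp + fn + tp)
precision = tp / (tp + fp)   # share of predicted positives that are truly positive
recall = tp / (tp + fn)      # share of true positives that were found

print(tn, fp, fn, tp)
print(accuracy, precision, recall)
```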

If labels is None, scikit-learn uses all values that appear in y_true or y_pred as the label list, in sorted order.
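The following pure-Python sketch mimics the described behaviour for `labels=None`. The functions `infer_labels` and `confusion_matrix_with_labels` are hypothetical helpers written for this illustration, not scikit-learn functions, and skipping pairs that fall outside `labels` is a choice made for this sketch.

```python
def infer_labels(y_true, y_pred):
    """Sorted union of the values seen in either list (the labels=None case)."""
    return sorted(set(y_true) | set(y_pred))

def confusion_matrix_with_labels(y_true, y_pred, labels=None):
    if labels is None:
        labels = infer_labels(y_true, y_pred)
    # Map each label value to its row/column position in the matrix.
    index = {label: position for position, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        # Pairs with a value outside `labels` are skipped in this sketch.
        if actual in index and predicted in index:
            matrix[index[actual]][index[predicted]] += 1
    return labels, matrix

labels, matrix = confusion_matrix_with_labels([1, 2, 4], [2, 2, 4])
print(labels)
print(matrix)
```

With the values 1, 2 and 4 the result is a 3x3 matrix: the label 3 never appears, so it gets no row or column unless it is listed explicitly in `labels`.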

TensorFlow confusion matrix function: the tf.confusion_matrix API

tf.confusion_matrix(
 labels, # 1-D Tensor of real labels for the classification task
 predictions, # 1-D Tensor of predictions for a given classification
 num_classes=None, # The possible number of labels the classification task can have
 dtype=tf.int32, # Data type of the confusion matrix 
 name=None, # Scope name
 weights=None, # An optional Tensor whose shape matches predictions
)

The num_classes parameter of tf.confusion_matrix plays a role similar to the labels parameter of sklearn.metrics.confusion_matrix. Both relate to the labels, but num_classes gives only the total number of classes and does not list the label values. TensorFlow generally uses integers as labels, so labels of a non-integer type such as strings must first be converted to integers. If num_classes is None, it is set to the largest value in labels and predictions plus 1.
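Both rules can be sketched in plain Python without TensorFlow. The helper `infer_num_classes` and the string-to-integer mapping are hypothetical illustrations of the description above, not TensorFlow APIs.

```python
def infer_num_classes(labels, predictions):
    """Largest value in either list plus one, as described for num_classes=None."""
    return max(max(labels), max(predictions)) + 1

# String labels have to be mapped to integers first; this mapping is a made-up example.
names = ["cat", "dog", "bird", "dog"]
name_to_id = {name: i for i, name in enumerate(sorted(set(names)))}
encoded = [name_to_id[name] for name in names]

num_classes = infer_num_classes([1, 2, 4], [2, 2, 4])
print(num_classes)
print(name_to_id, encoded)
```

Note the difference from the sorted-label rule: with the values 1, 2 and 4 the inferred size is 5, so the unused classes 0 and 3 still get an all-zero row and column.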

The weights parameter of tf.confusion_matrix has the same meaning as the sample_weight parameter of sklearn.metrics.confusion_matrix: each sample's prediction is weighted, and the cells of the confusion matrix are computed from the weighted values.
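In other words, each sample adds its weight to its cell instead of adding 1. A minimal pure-Python sketch, assuming integer labels in the range 0 to num_classes - 1; `weighted_confusion_matrix` is a hypothetical helper, not a library function.

```python
def weighted_confusion_matrix(y_true, y_pred, num_classes, weights=None):
    """Add each sample's weight (1 by default) to the cell (true class, predicted class)."""
    if weights is None:
        weights = [1] * len(y_true)
    matrix = [[0.0] * num_classes for _ in range(num_classes)]
    for actual, predicted, weight in zip(y_true, y_pred, weights):
        matrix[actual][predicted] += weight
    return matrix

matrix = weighted_confusion_matrix([1, 2, 4], [2, 2, 4], num_classes=6,
                                   weights=[0.3, 0.4, 0.3])
print(matrix[1][2], matrix[2][2], matrix[4][4])
```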

Usage example

#!/usr/bin/env python
# -*- coding: utf8 -*-
"""
Author: klchang
Description: 
　　A simple example for tf.confusion_matrix and sklearn.metrics.confusion_matrix.
Date: 2018.9.8
"""
from __future__ import print_function
import tensorflow as tf
import sklearn.metrics
 
y_true = [1, 2, 4]
y_pred = [2, 2, 4]
 
# Build graph with tf.confusion_matrix operation
sess = tf.InteractiveSession()
op = tf.confusion_matrix(y_true, y_pred)
op2 = tf.confusion_matrix(y_true, y_pred, num_classes=6, dtype=tf.float32, weights=tf.constant([0.3, 0.4, 0.3]))
# Execute the graph
print("confusion matrix in tensorflow: ")
print("1. default: \n", op.eval())
print("2. customized: \n", sess.run(op2))
sess.close()
 
# Use sklearn.metrics.confusion_matrix function
print("\nconfusion matrix in scikit-learn: ")
print("1. default: \n", sklearn.metrics.confusion_matrix(y_true, y_pred))
print("2. customized: \n", sklearn.metrics.confusion_matrix(y_true, y_pred, labels=range(6), sample_weight=[0.3, 0.4, 0.3]))

