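Getting Started with Machine Learning in C#: Judging Whether a Daily Report Is Acceptable

Updated: 2019-03-08 09:58:26 Author: 雪雁

This article introduces machine learning in C# through one worked example: judging whether a daily work report is acceptable. It walks through the example code in detail and should be a useful reference for anyone learning or using C#.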

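Preface

Put simply, the core steps of machine learning are: obtain the training data, choose an algorithm, train the model, evaluate the model, and use the model to make predictions. Below I use the task of judging whether the content of a daily report is acceptable to briefly illustrate machine learning in C#.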

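Step 1: Analyze the problem

From the requirements, our model learns from the content of the daily report as its feature, and then predicts whether a given report meets the standard (acceptable or not acceptable). In short, this is a classification scenario, and more specifically a binary classification: the result is either A or B. That settles the core algorithm as a classification algorithm. Several classification algorithms exist; interested readers can look into the others on their own, as they are not covered here.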

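Step 2: Prepare the environment

There are no special requirements for the build and runtime environment. You only need to reference the C# machine learning NuGet package, named Microsoft.ML; the installation steps are not covered in detail here. Note that the code below follows the preview (pre-1.0) API that was current when this article was written, and several of the method names it uses, such as Read and ReadFromEnumerable, were renamed in later releases.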

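Step 3: Prepare the data

Two datasets are prepared here. The first is the dataset used to train the model (think of it as the study material), wikipedia-detox-250-line-data.tsv. A sample of the data is shown below. Your own data only needs to follow this layout; the layout itself is determined by the structure of the input dataset class, which is described further down: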

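Sentiment SentimentText
 First day at work, nothing to do
 Completed the tasks assigned by my manager
 Wrote some code and then wrote some miscellaneous documents
 Did the same things as any ordinary coder
 Worked on some project matters with the product manager
 Started discussing requirements in the morning, then started coding, and finished sharing the documentation for the whole process just before leaving work
 Overall planning meeting for the *** project; built the home page of the design and my personal center page
 ** project: requirements handover, requirements review, entity structure definition, database migration, mind map refinement
 1. Wrote the template message code for the ** project; 2. Improved template sending in the ** project's admin backend,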

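Once the training dataset is ready, you also need a test dataset for evaluating the model (think of it as the answer key), wikipedia-detox-250-line-test.tsv. Its format is the same as that of the training dataset shown above.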

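In general, the richer the training data, the closer the function learned by the algorithm gets to the ideal model, and the better the model's predictions will match your requirements.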
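To make the two-column layout concrete, here is a minimal sketch of reading one line of such a file by hand. It is only an illustration: the class and method names are hypothetical, and it assumes the Sentiment column is written as 0 or 1 and separated from the text by a tab. In the article itself this work is done by the TextLoader configured in Step 5.

```csharp
using System;

public static class DailyReportTsv
{
    // Hypothetical helper: splits one "label<TAB>text" line into its two columns.
    // Assumption: the label column is written as "0" or "1".
    public static (bool Label, string Text) ParseLine(string line)
    {
        // Split on the first tab only, so any later tabs stay inside the text.
        string[] parts = line.Split(new[] { '\t' }, 2);
        if (parts.Length != 2)
            throw new FormatException("Expected two tab-separated columns: " + line);

        string rawLabel = parts[0].Trim();
        if (rawLabel != "0" && rawLabel != "1")
            throw new FormatException("Label must be 0 or 1: " + rawLabel);

        return (rawLabel == "1", parts[1].Trim());
    }

    public static void Main()
    {
        var row = ParseLine("1\tNo work content");
        Console.WriteLine($"Label={row.Label}, Text={row.Text}");
    }
}
```

Splitting on the first tab only is a deliberate choice here, because free-form report text may itself contain tabs.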

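Step 4: Define the feature classes

Based on the model being built, identify the features it analyzes and define them as classes. This requires the machine learning namespace, using Microsoft.ML.Data;. The dataset classes defined for this model are as follows (see the comments for what each member means):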

/// <summary>
/// Input dataset class
/// </summary>
public class SentimentData
{
    /// <summary>
    /// Whether the report is acceptable (0: acceptable, 1: not acceptable)
    /// </summary>
    [Column(ordinal: "0", name: "Label")]
    public float Sentiment;

    /// <summary>
    /// Content of the daily report
    /// </summary>
    [Column(ordinal: "1")]
    public string SentimentText;
}

/// <summary>
/// Prediction result class
/// </summary>
public class SentimentPrediction
{
    /// <summary>
    /// Predicted value (the predicted label)
    /// </summary>
    [ColumnName("PredictedLabel")]
    public bool Prediction { get; set; }

    /// <summary>
    /// Probability assigned to the prediction
    /// </summary>
    [ColumnName("Probability")]
    public float Probability { get; set; }
}

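The first class, SentimentData, is the input dataset class: it is defined from the columns of the training dataset. The second class, SentimentPrediction, is the prediction result class, that is, the definition of the result you want back. Its structure generally depends on the learning algorithm you use, so define it according to what your learning pipeline outputs and what you need. The Column attributes on the input class specify each field's position in the dataset and which field is the Label. The PredictedLabel of the prediction class is used during prediction and evaluation.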

Step 5: Implement the code

First, define the following path fields and the _textLoader variable, which are used to read the data and to save the trained model:

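_trainDataPath holds the path to the dataset used to train the model.

_testDataPath holds the path to the dataset used to evaluate the model.

_modelPath holds the path where the trained model is saved.

_textLoader is the TextLoader used to load and transform the datasets.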
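The article does not show the declarations of these fields, so the following is only a sketch of what the three path fields might look like. The "Data" folder and the "Model.zip" file name are assumptions made for illustration; only the two .tsv file names come from the article. The _textLoader field is left out because its TextLoader type comes from the Microsoft.ML package, and it is initialized in Main further down.

```csharp
using System;
using System.IO;

public static class ReportModelPaths
{
    // Assumption: the datasets sit in a "Data" folder under the working directory.
    internal static readonly string _trainDataPath =
        Path.Combine(Environment.CurrentDirectory, "Data", "wikipedia-detox-250-line-data.tsv");

    internal static readonly string _testDataPath =
        Path.Combine(Environment.CurrentDirectory, "Data", "wikipedia-detox-250-line-test.tsv");

    // Assumption: the trained model is written next to the data as "Model.zip".
    internal static readonly string _modelPath =
        Path.Combine(Environment.CurrentDirectory, "Data", "Model.zip");

    public static void Main()
    {
        Console.WriteLine(_trainDataPath);
        Console.WriteLine(_testDataPath);
        Console.WriteLine(_modelPath);
    }
}
```

Building the paths with Path.Combine rather than string concatenation keeps the directory separators correct on both Windows and Unix-like systems.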

Then define the program's entry point (the Main function) and the methods it calls:

Define the SaveModelAsFile method, which saves the model as a .zip file:

private static void SaveModelAsFile(MLContext mlContext, ITransformer model)
{
    // Write the trained model to _modelPath as a .zip file
    using (var fs = new FileStream(_modelPath, FileMode.Create, FileAccess.Write, FileShare.Write))
    {
        mlContext.Model.Save(model, fs);
    }
    Console.WriteLine("Model saved to {0}", _modelPath);
    Console.ReadLine();
}

Define the Train method, which chooses the learning algorithm, builds the corresponding learning pipeline, and returns the trained model:

public static ITransformer Train(MLContext mlContext, string dataPath)
{
    // Load the training data
    IDataView dataView = _textLoader.Read(dataPath);
    // Featurize the data (convert the text into the format the pipeline needs)
    var pipeline = mlContext.Transforms.Text.FeaturizeText(inputColumnName: "SentimentText", outputColumnName: "Features")
        // Append the learning algorithm to the pipeline
        .Append(mlContext.BinaryClassification.Trainers.FastTree(numLeaves: 50, numTrees: 50, minDatapointsInLeaves: 20));
    // Fit the pipeline to the data to obtain the model
    var model = pipeline.Fit(dataView);
    Console.WriteLine();
    // Return the trained model
    return model;
}
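Featurization is the step that turns free text into numbers a trainer can consume. The sketch below is a deliberately simplified toy, not what FeaturizeText does internally: it only counts overlapping character bigrams, whereas the library builds normalized word and character n-gram features. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class ToyFeaturizer
{
    // Counts every overlapping pair of adjacent characters in the text.
    // Each distinct bigram can be thought of as one numeric feature.
    public static Dictionary<string, int> CharBigramCounts(string text)
    {
        var counts = new Dictionary<string, int>();
        for (int i = 0; i + 2 <= text.Length; i++)
        {
            string gram = text.Substring(i, 2);
            // 'current' stays 0 when the bigram has not been seen yet.
            counts.TryGetValue(gram, out int current);
            counts[gram] = current + 1;
        }
        return counts;
    }

    public static void Main()
    {
        foreach (var pair in CharBigramCounts("no work, no code"))
            Console.WriteLine($"'{pair.Key}': {pair.Value}");
    }
}
```

Character-level features are a natural fit for text such as Chinese daily reports, where words are not separated by spaces.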

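Once the model is trained, we need a method (Evaluate) to assess its quality. It measures how well the model's predictions agree with the labels in your own test dataset and prints the corresponding metrics. Which metrics you get back depends on the evaluation method you call, which in turn depends on the kind of algorithm you trained. The code is as follows: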

public static void Evaluate(MLContext mlContext, ITransformer model)
{
    // Load the test data
    var dataView = _textLoader.Read(_testDataPath);
    Console.WriteLine("=============== Evaluating model accuracy with test data ===============");
    var predictions = model.Transform(dataView);
    // Assess the quality of the trained model
    var metrics = mlContext.BinaryClassification.Evaluate(predictions, "Label");
    Console.WriteLine();
    Console.WriteLine("Model quality metrics");
    Console.WriteLine("--------------------------------");
    Console.WriteLine($"Accuracy: {metrics.Accuracy:P2}");
    Console.WriteLine($"Auc: {metrics.Auc:P2}");
    Console.WriteLine("=============== End of model evaluation ===============");
    Console.ReadLine();
    // After evaluation, save the trained model
    SaveModelAsFile(mlContext, model);
}
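To show what the Accuracy figure printed above means, here is a small hand-rolled sketch: accuracy is the fraction of test examples whose predicted label equals the true label. The class name and the sample arrays are made up for illustration; in the article the value comes from the library's Evaluate call rather than from code like this.

```csharp
using System;
using System.Linq;

public static class ToyMetrics
{
    // Fraction of predictions that equal the true label.
    public static double Accuracy(bool[] labels, bool[] predictions)
    {
        if (labels.Length != predictions.Length)
            throw new ArgumentException("labels and predictions must have the same length");
        if (labels.Length == 0)
            throw new ArgumentException("at least one example is required");

        // Pair each label with its prediction and count the matches.
        int correct = labels.Zip(predictions, (label, predicted) => label == predicted)
                            .Count(isCorrect => isCorrect);
        return (double)correct / labels.Length;
    }

    public static void Main()
    {
        // Made-up values: three of the four predictions match their labels.
        bool[] labels      = { true, false, false, true };
        bool[] predictions = { true, false, true,  true };
        Console.WriteLine($"Accuracy: {Accuracy(labels, predictions):P2}");
    }
}
```

Accuracy alone can mislead when one class is much more common than the other, which is one reason the Evaluate method above also prints the AUC.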

Define a method for predicting a single item (Predict) and a method for batch prediction (PredictWithModelLoadedFromFile):

The code for predicting a single item is as follows:

private static void Predict(MLContext mlContext, ITransformer model)
{
    // Create the prediction engine, a wrapper for predicting one item at a time
    var predictionFunction = model.CreatePredictionEngine<SentimentData, SentimentPrediction>(mlContext);
    SentimentData sampleStatement = new SentimentData
    {
        SentimentText = "Developed new requirements for 愛車; bound data on some pages of the 麥扣 log monitoring module;"
    };
    // Predict the result
    var resultprediction = predictionFunction.Predict(sampleStatement);
    Console.WriteLine();
    Console.WriteLine("=============== Prediction for a single test item ===============");
    Console.WriteLine();
    // Per the label convention in SentimentData, true (1) means not acceptable
    Console.WriteLine($"Report content: {sampleStatement.SentimentText} | Acceptable: {(resultprediction.Prediction ? "Not acceptable" : "Acceptable")} | Probability: {resultprediction.Probability} ");
    Console.WriteLine("=============== End of prediction ===============");
    Console.WriteLine();
    Console.ReadLine();
}

The code for batch prediction is as follows:

public static void PredictWithModelLoadedFromFile(MLContext mlContext)
{
    IEnumerable<SentimentData> sentiments = new[]
    {
        new SentimentData
        {
            SentimentText = "1. Finished writing the code for the 愛車 annual card 2. Completed the requirements handover with the customer"
        },
        new SentimentData
        {
            SentimentText = "No work content"
        }
    };

    // Load the trained model back from the .zip file
    ITransformer loadedModel;
    using (var stream = new FileStream(_modelPath, FileMode.Open, FileAccess.Read, FileShare.Read))
    {
        loadedModel = mlContext.Model.Load(stream);
    }
    // Wrap the samples in a data view and run them through the loaded model
    var sentimentStreamingDataView = mlContext.Data.ReadFromEnumerable(sentiments);
    var predictions = loadedModel.Transform(sentimentStreamingDataView);
    // Use the model to predict whether each result is 1 (not acceptable) or 0 (acceptable)
    var predictedResults = mlContext.CreateEnumerable<SentimentPrediction>(predictions, reuseRowObject: false);
    Console.WriteLine();
    Console.WriteLine("=============== Multi-sample prediction with the loaded model ===============");
    // Pair each input with its prediction by position
    var sentimentsAndPredictions = sentiments.Zip(predictedResults, (sentiment, prediction) => (sentiment, prediction));
    foreach (var item in sentimentsAndPredictions)
    {
        Console.WriteLine($"Report content: {item.sentiment.SentimentText} | Acceptable: {(item.prediction.Prediction ? "Not acceptable" : "Acceptable")} | Probability: {item.prediction.Probability} ");
    }
    Console.WriteLine("=============== End of prediction ===============");
    Console.ReadLine();
}
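The batch method relies on Zip to line each input up with its prediction. The following standalone sketch shows that pairing with plain strings and booleans in place of the article's SentimentData and SentimentPrediction types; the class name and sample values are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ZipPairing
{
    // Pairs the i-th text with the i-th prediction.
    // Zip stops at the end of the shorter sequence, so extra items are dropped.
    public static List<(string Text, bool Prediction)> Pair(
        IEnumerable<string> texts, IEnumerable<bool> predictions)
    {
        return texts
            .Zip(predictions, (text, prediction) => (Text: text, Prediction: prediction))
            .ToList();
    }

    public static void Main()
    {
        var texts = new[] { "Finished the annual card code", "No work content" };
        var predictions = new[] { false, true }; // made-up values for illustration

        foreach (var item in Pair(texts, predictions))
            Console.WriteLine($"{item.Text} | predicted label: {item.Prediction}");
    }
}
```

Because the pairing is purely positional, it is only correct when the predictions come back in the same order as the inputs, as they do when a data view is built from an in-memory sequence and transformed row by row.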

With the methods above defined, call them in order from Main:

public static void Main(string[] args)
{
    // Create an MLContext, which provides the context for ML operations
    MLContext mlContext = new MLContext(seed: 0);
    // Initialize _textLoader so it can be reused for every dataset that needs loading
    _textLoader = mlContext.Data.CreateTextLoader(
        columns: new TextLoader.Column[]
        {
            new TextLoader.Column("Label", DataKind.Bool, 0),
            new TextLoader.Column("SentimentText", DataKind.Text, 1)
        },
        separatorChar: '\t',
        hasHeader: true
    );
    // Train the model
    var model = Train(mlContext, _trainDataPath);
    // Evaluate the model
    Evaluate(mlContext, model);
    // Predict a single item
    Predict(mlContext, model);
    // Predict a batch of items
    PredictWithModelLoadedFromFile(mlContext);
}

With the code ready, your little robot is about to start learning, so go ahead and build and run it.

The run produces the following result:

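Because the featurization settings for the training dataset are not accurate enough and the data does not cover enough cases, the quality of the resulting model is far from ideal, so the predictions also fall short of what we want. Clearly, our little machine has a long road of learning ahead.

This small experiment also left a strong impression on me. A machine is like a child: first you look at its temperament (the featurization settings) to decide what learning environment to give it (the learning pipeline built around the chosen algorithm), and you supply study material (the training dataset). Then you set a development goal for it (the evaluation dataset) and keep giving it exams (single-item and batch predictions), and those exams need a proper exam room (the methods called to make predictions). In this way the machine keeps learning and keeps improving.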

Summary

That is all for this article. I hope it proves a useful reference for your study or work. Thank you for supporting 腳本之家.
