
Implementing a Decision Tree Algorithm with MapReduce

 Updated: 2019-08-10 11:27:12   Author: KevinYunhe
This article is a detailed walkthrough of implementing a decision tree algorithm on MapReduce. It has some reference value, and interested readers are welcome to follow along.

This article shares the concrete code of a MapReduce implementation of the decision tree algorithm, as follows.

First, implement the Mapper for the C4.5 decision tree algorithm. The relevant code is as follows:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.Counters.Counter;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class MapClass extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {

  private final static IntWritable one = new IntWritable(1);
  private Text attValue = new Text();
  private int i;
  private String token;
  public static int no_Attr;
  public Split split = null;

  public int size_split_1 = 0;

  public void configure(JobConf conf) {
    try {
      split = (Split) ObjectSerializable.unSerialize(conf.get("currentsplit"));
    } catch (ClassNotFoundException | IOException e) {
      e.printStackTrace();
    }
    size_split_1 = Integer.parseInt(conf.get("current_index"));
  }

  public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    String line = value.toString(); // convert the input instance to a string
    StringTokenizer itr = new StringTokenizer(line);
    int index = 0;
    String attr_value = null;
    no_Attr = itr.countTokens() - 1;
    String attr[] = new String[no_Attr];
    boolean match = true;
    for (i = 0; i < no_Attr; i++) {
      attr[i] = itr.nextToken(); // read the value of each attribute
    }

    String classLabel = itr.nextToken();
    int size_split = split.attr_index.size();
    Counter counter = reporter.getCounter("reporter-" + Main.current_index, size_split + " " + size_split_1);
    counter.increment(1L);
    // check whether this instance belongs to the current split, i.e. whether
    // it matches every attribute value the split has already fixed
    for (int count = 0; count < size_split; count++) {
      index = (Integer) split.attr_index.get(count);
      attr_value = (String) split.attr_value.get(count);
      if (!attr[index].equals(attr_value)) {
        match = false;
        break;
      }
    }

    if (match) {
      for (int l = 0; l < no_Attr; l++) {
        if (!split.attr_index.contains(l)) {
          // emit "<attribute index> <attribute value> <class label>" with count 1,
          // recording one occurrence of this attribute value under this class label
          token = l + " " + attr[l] + " " + classLabel;
          attValue.set(token);
          output.collect(attValue, one);
        }
      }
      if (size_split == no_Attr) {
        token = no_Attr + " " + "null" + " " + classLabel;
        attValue.set(token);
        output.collect(attValue, one);
      }
    }
  }

}
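The Mapper deserializes a `Split` object from the job configuration, but the article never lists that class. As a point of reference, here is a minimal sketch consistent with how its fields (`attr_index`, `attr_value`, `classLabel`) are used in `MapClass` and `Main`; the `matches` helper is an illustrative addition, and the original project's class may carry more state.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the Split class the Mapper deserializes from the job
// configuration; field names follow their usage in MapClass and Main.
public class Split implements Serializable {

  private static final long serialVersionUID = 1L;

  // indices of the attributes this split has already fixed
  public List<Integer> attr_index = new ArrayList<Integer>();
  // the value each fixed attribute must take, parallel to attr_index
  public List<String> attr_value = new ArrayList<String>();
  // majority class label at this node, filled in by Main
  public String classLabel;

  // illustrative helper (not in the original): true if the instance's
  // attributes match every value this split fixes
  public boolean matches(String[] attrs) {
    for (int i = 0; i < attr_index.size(); i++) {
      if (!attrs[attr_index.get(i)].equals(attr_value.get(i))) {
        return false;
      }
    }
    return true;
  }
}
```

Because the object travels through a `JobConf` string property, the class must be `Serializable`.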

Next, implement the corresponding Reducer for the C4.5 decision tree algorithm. The relevant code is as follows:

import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.util.ArrayList;
import java.util.Iterator;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {

  static int cnt = 0;
  ArrayList<String> ar = new ArrayList<String>();
  String data = null;
  private static int currentIndex;

  public void configure(JobConf conf) {
    currentIndex = Integer.valueOf(conf.get("currentIndex"));
  }

  public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output,
      Reporter reporter) throws IOException {
    int sum = 0;
    // sum is the number of occurrences of one class label within the
    // sub-dataset selected by one attribute value
    while (values.hasNext()) {
      sum += values.next().get();
    }
    // finally, write the count for this attribute value to the output
    output.collect(key, new IntWritable(sum));

    String data = key + " " + sum;
    ar.add(data);
    // persist the accumulated counts to an intermediate file
    writeToFile(ar);
    ar.add("\n");
  }

  public static void writeToFile(ArrayList<String> text) {
    try {
      cnt++;
      Path input = new Path("C45/intermediate" + currentIndex + ".txt");
      Configuration conf = new Configuration();
      FileSystem fs = FileSystem.get(conf);
      BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(fs.create(input, true)));

      for (String str : text) {
        bw.write(str);
      }
      bw.newLine();
      bw.close();
    } catch (Exception e) {
      System.out.println("Failed to create the intermediate file in reduce");
    }
  }
}
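Both the driver and the Mapper pass the current `Split` through the job configuration as a string via an `ObjectSerializable` helper that the article does not show. One plausible implementation uses standard Java serialization with Base64 encoding so the bytes survive as a configuration string; the encoding choice is an assumption, and the original project's helper may differ.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Base64;

// Hypothetical helper for shipping a serializable object through a JobConf
// string property, matching the serialize/unSerialize calls seen above.
public class ObjectSerializable {

  // serialize an object to a Base64 string safe to store in a JobConf
  public static String serialize(Serializable obj) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    ObjectOutputStream out = new ObjectOutputStream(bytes);
    out.writeObject(obj);
    out.close();
    return Base64.getEncoder().encodeToString(bytes.toByteArray());
  }

  // decode the Base64 string and deserialize the object back
  public static Object unSerialize(String s) throws IOException, ClassNotFoundException {
    byte[] data = Base64.getDecoder().decode(s);
    ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data));
    Object obj = in.readObject();
    in.close();
    return obj;
  }
}
```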

Finally, write the main driver that launches the MapReduce job. The job must be run multiple times, once per round of tree growth. The code is as follows:

package com.hackecho.hadoop;
 
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;
 
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.log4j.PropertyConfigurator;
import org.dmg.pmml.MiningFunctionType;
import org.dmg.pmml.Node;
import org.dmg.pmml.PMML;
import org.dmg.pmml.TreeModel;
 
//Here, the role of MapReduce is to partition the sub-dataset according to each attribute's values
public class Main extends Configured implements Tool {

  //the split currently being processed
  public static Split currentsplit = new Split();
  //the list of splits generated so far
  public static List<Split> splitted = new ArrayList<Split>();
  //current_index is the position of the split currently being expanded
  public static int current_index = 0;

  public static ArrayList<String> ar = new ArrayList<String>();

  public static List<Split> leafSplits = new ArrayList<Split>();

  public static final String PROJECT_HOME = System.getProperty("user.dir");
 
  public static void main(String[] args) throws Exception {
    PropertyConfigurator.configure(PROJECT_HOME + "/conf/log/log4j.properties");
    //currentsplit is added to splitted here, so splitted starts with size 1
    splitted.add(currentsplit);
   
    Path c45 = new Path("C45");
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    if (fs.exists(c45)) {
      fs.delete(c45, true);
    }
    fs.mkdirs(c45);
    int res = 0;
    int split_index = 0;
    //gain ratio
    double gainratio = 0;
    //best gain ratio found so far
    double best_gainratio = 0;
    //entropy of the current node
    double entropy = 0;
    //class label
    String classLabel = null;
    //number of attributes (hard-coded to 4 for this dataset)
    int total_attributes = MapClass.no_Attr;
    total_attributes = 4;
    //number of splits generated so far
    int split_size = splitted.size();
    //gain-ratio calculator
    GainRatio gainObj;
    //new node produced by a split
    Split newnode;
 
    while (split_size > current_index) {
      currentsplit = splitted.get(current_index);
      gainObj = new GainRatio();
      res = ToolRunner.run(new Configuration(), new Main(), args);
      System.out.println("Current NODE INDEX . ::" + current_index);
      int j = 0;
      int temp_size;
      gainObj.getcount();
      //compute the entropy of the current node
      entropy = gainObj.currNodeEntophy();
      //get the majority class label of the current node
      classLabel = gainObj.majorityLabel();
      currentsplit.classLabel = classLabel;
 
      if (entropy != 0.0 && currentsplit.attr_index.size() != total_attributes) {
        System.out.println("");
        System.out.println("Entropy NOT zero :: " + entropy);
        best_gainratio = 0;
        //compute the information gain ratio of every remaining attribute
        for (j = 0; j < total_attributes; j++) {
          if (!currentsplit.attr_index.contains(j)) {
            //compute the gain ratio of attribute j by its index
            gainratio = gainObj.gainratio(j, entropy);
            //keep the attribute with the best gain ratio
            if (gainratio >= best_gainratio) {
              split_index = j;
              best_gainratio = gainratio;
            }
          }
        }
 
        //split_index is the index of the attribute chosen for the split;
        //attr_values_split is a space-separated string of that attribute's values;
        String attr_values_split = gainObj.getvalues(split_index);
        StringTokenizer attrs = new StringTokenizer(attr_values_split);
        int number_splits = attrs.countTokens(); // number of child splits possible with the selected attribute
        String red = "";
        System.out.println(" INDEX :: " + split_index);
        System.out.println(" SPLITTING VALUES " + attr_values_split);
 
        //for each value of the chosen attribute, partition the node's data into a new split
        for (int splitnumber = 1; splitnumber <= number_splits; splitnumber++) {
          temp_size = currentsplit.attr_index.size();
          newnode = new Split();
          for (int y = 0; y < temp_size; y++) {
            newnode.attr_index.add(currentsplit.attr_index.get(y));
            newnode.attr_value.add(currentsplit.attr_value.get(y));
          }
          red = attrs.nextToken();
 
          newnode.attr_index.add(split_index);
          newnode.attr_value.add(red);
          //partition the dataset on this attribute value and append the result as a new split
          splitted.add(newnode);
        }
      } else if (entropy == 0.0 && currentsplit.attr_index.size() != total_attributes) {
        //whenever a leaf node is reached, record it for persistence into the model file
       /**
        String rule = "";
        temp_size = currentsplit.attr_index.size();
        for (int val = 0; val < temp_size; val++) {
          rule = rule + " " + currentsplit.attr_index.get(val) + " " + currentsplit.attr_value.get(val);
        }
        rule = rule + " " + currentsplit.classLabel;
        ar.add(rule);
        writeRuleToFile(ar);
        ar.add("\n");
        if (entropy != 0.0) {
          System.out.println("Enter rule in file:: " + rule);
        } else {
          System.out.println("Enter rule in file Entropy zero ::  " + rule);
        }
        System.out.println("persistence model@!!!!");
        */
       leafSplits.add(currentsplit);
      }
      else {
        //all attributes are used up: build the PMML tree model from the collected leaf splits and persist it
        TreeModel tree = PmmlDecisionTree.buildTreeModel(leafSplits);
        PMML pmml = new PMML();
        pmml.addModels(tree);
        PmmlModelFactory.pmmlPersistence("C45/DecisionTree.pmml", pmml);
      }
      split_size = splitted.size();
      System.out.println("TOTAL NODES::  " + split_size);
      current_index++;
    }
    System.out.println("Done!");
    System.exit(res);
  }
 
  public static void writeRuleToFile(ArrayList<String> text) throws IOException {
    Path rule = new Path("C45/rule.txt");
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    try {
      BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(fs.create(rule, true)));
      for (String str : text) {
        bw.write(str);
      }
      bw.newLine();
      bw.close();
    } catch (Exception e) {
      e.printStackTrace();
    }
  }
 
  public int run(String[] args) throws Exception {
    System.out.println("In main ---- run");
    JobConf conf = new JobConf(getConf(), Main.class);
    conf.setJobName("C45");
    conf.set("currentsplit",ObjectSerializable.serialize(currentsplit));
    conf.set("current_index",String.valueOf(currentsplit.attr_index.size()));
    conf.set("currentIndex", String.valueOf(current_index));
 
    // the keys are words (strings)
    conf.setOutputKeyClass(Text.class);
    // the values are counts (ints)
    conf.setOutputValueClass(IntWritable.class);
 
    conf.setMapperClass(MapClass.class);
    conf.setReducerClass(Reduce.class);
    System.out.println("back to run");
 
    FileSystem fs = FileSystem.get(conf);
 
    Path out = new Path(args[1] + current_index);
    if (fs.exists(out)) {
      fs.delete(out, true);
    }
    FileInputFormat.setInputPaths(conf, args[0]);
    FileOutputFormat.setOutputPath(conf, out);
 
    JobClient.runJob(conf);
    return 0;
  }
}
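The `GainRatio` class that `Main` depends on is also not listed in the article. Its core is standard C4.5 arithmetic: entropy H = -Σ p_i·log2(p_i), information gain (parent entropy minus the weighted entropy of the children), and gain ratio = gain / split info. A self-contained sketch of those formulas is below; the class and method names are illustrative, not the original project's, and the real class reads its counts from the intermediate files written by the Reducer.

```java
// Illustrative sketch of the C4.5 gain-ratio arithmetic that a GainRatio
// class would implement (names here are hypothetical).
public class GainRatioMath {

  // entropy of a class distribution: H = -sum(p_i * log2(p_i))
  public static double entropy(int[] classCounts) {
    int total = 0;
    for (int c : classCounts) {
      total += c;
    }
    double h = 0.0;
    for (int c : classCounts) {
      if (c > 0) {
        double p = (double) c / total;
        h -= p * (Math.log(p) / Math.log(2));
      }
    }
    return h;
  }

  // gain ratio = (parent entropy - weighted child entropy) / split info;
  // partitions[v] holds the class counts of the subset with attribute value v
  public static double gainRatio(double parentEntropy, int[][] partitions) {
    int total = 0;
    int[] sizes = new int[partitions.length];
    for (int v = 0; v < partitions.length; v++) {
      for (int c : partitions[v]) {
        sizes[v] += c;
      }
      total += sizes[v];
    }
    double weighted = 0.0;
    double splitInfo = 0.0;
    for (int v = 0; v < partitions.length; v++) {
      if (sizes[v] > 0) {
        double w = (double) sizes[v] / total;
        weighted += w * entropy(partitions[v]);
        splitInfo -= w * (Math.log(w) / Math.log(2));
      }
    }
    double gain = parentEntropy - weighted;
    return splitInfo == 0.0 ? 0.0 : gain / splitInfo;
  }
}
```

For example, a parent node with entropy 1.0 split into two pure halves has information gain 1.0 and split info 1.0, giving a gain ratio of 1.0.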

That concludes this article. I hope it helps with your studies, and thank you for supporting this site.
