Java實(shí)現(xiàn)的決策樹(shù)算法完整實(shí)例
本文實(shí)例講述了Java實(shí)現(xiàn)的決策樹(shù)算法。分享給大家供大家參考,具體如下:
決策樹(shù)算法是一種逼近離散函數(shù)值的方法。它是一種典型的分類方法,首先對(duì)數(shù)據(jù)進(jìn)行處理,利用歸納算法生成可讀的規(guī)則和決策樹(shù),然后使用決策對(duì)新數(shù)據(jù)進(jìn)行分析。本質(zhì)上決策樹(shù)是通過(guò)一系列規(guī)則對(duì)數(shù)據(jù)進(jìn)行分類的過(guò)程。
決策樹(shù)構(gòu)造可以分兩步進(jìn)行。第一步,決策樹(shù)的生成:由訓(xùn)練樣本集生成決策樹(shù)的過(guò)程。一般情況下,訓(xùn)練樣本數(shù)據(jù)集是根據(jù)實(shí)際需要有歷史的、有一定綜合程度的,用于數(shù)據(jù)分析處理的數(shù)據(jù)集。第二步,決策樹(shù)的剪枝:決策樹(shù)的剪枝是對(duì)上一階段生成的決策樹(shù)進(jìn)行檢驗(yàn)、校正和修下的過(guò)程,主要是用新的樣本數(shù)據(jù)集(稱為測(cè)試數(shù)據(jù)集)中的數(shù)據(jù)校驗(yàn)決策樹(shù)生成過(guò)程中產(chǎn)生的初步規(guī)則,將那些影響預(yù)衡準(zhǔn)確性的分枝剪除。
java實(shí)現(xiàn)代碼如下:
package demo; import java.util.HashMap; import java.util.LinkedList; import java.util.List; import java.util.Map; import java.util.Map.Entry; import java.util.Set; public class DicisionTree { public static void main(String[] args) throws Exception { System.out.print("腳本之家測(cè)試結(jié)果:"); String[] attrNames = new String[] { "AGE", "INCOME", "STUDENT", "CREDIT_RATING" }; // 讀取樣本集 Map<Object, List<Sample>> samples = readSamples(attrNames); // 生成決策樹(shù) Object decisionTree = generateDecisionTree(samples, attrNames); // 輸出決策樹(shù) outputDecisionTree(decisionTree, 0, null); } /** * 讀取已分類的樣本集,返回Map:分類 -> 屬于該分類的樣本的列表 */ static Map<Object, List<Sample>> readSamples(String[] attrNames) { // 樣本屬性及其所屬分類(數(shù)組中的最后一個(gè)元素為樣本所屬分類) Object[][] rawData = new Object[][] { { "<30 ", "High ", "No ", "Fair ", "0" }, { "<30 ", "High ", "No ", "Excellent", "0" }, { "30-40", "High ", "No ", "Fair ", "1" }, { ">40 ", "Medium", "No ", "Fair ", "1" }, { ">40 ", "Low ", "Yes", "Fair ", "1" }, { ">40 ", "Low ", "Yes", "Excellent", "0" }, { "30-40", "Low ", "Yes", "Excellent", "1" }, { "<30 ", "Medium", "No ", "Fair ", "0" }, { "<30 ", "Low ", "Yes", "Fair ", "1" }, { ">40 ", "Medium", "Yes", "Fair ", "1" }, { "<30 ", "Medium", "Yes", "Excellent", "1" }, { "30-40", "Medium", "No ", "Excellent", "1" }, { "30-40", "High ", "Yes", "Fair ", "1" }, { ">40 ", "Medium", "No ", "Excellent", "0" } }; // 讀取樣本屬性及其所屬分類,構(gòu)造表示樣本的Sample對(duì)象,并按分類劃分樣本集 Map<Object, List<Sample>> ret = new HashMap<Object, List<Sample>>(); for (Object[] row : rawData) { Sample sample = new Sample(); int i = 0; for (int n = row.length - 1; i < n; i++) sample.setAttribute(attrNames[i], row[i]); sample.setCategory(row[i]); List<Sample> samples = ret.get(row[i]); if (samples == null) { samples = new LinkedList<Sample>(); ret.put(row[i], samples); } samples.add(sample); } return ret; } /** * 構(gòu)造決策樹(shù) */ static Object generateDecisionTree( Map<Object, List<Sample>> categoryToSamples, String[] attrNames) { // 如果只有一個(gè)樣本,將該樣本所屬分類作為新樣本的分類 if (categoryToSamples.size() == 1) return categoryToSamples.keySet().iterator().next(); // 如果沒(méi)有供決策的屬性,則將樣本集中具有最多樣本的分類作為新樣本的分類,即投票選舉出分類 if (attrNames.length == 0) { int max = 0; Object maxCategory = null; for (Entry<Object, List<Sample>> entry : categoryToSamples .entrySet()) { int cur = entry.getValue().size(); if (cur > max) { max = cur; maxCategory = entry.getKey(); } } return maxCategory; } // 選取測(cè)試屬性 Object[] rst = chooseBestTestAttribute(categoryToSamples, attrNames); // 決策樹(shù)根結(jié)點(diǎn),分支屬性為選取的測(cè)試屬性 Tree tree = new Tree(attrNames[(Integer) rst[0]]); // 已用過(guò)的測(cè)試屬性不應(yīng)再次被選為測(cè)試屬性 String[] subA = new String[attrNames.length - 1]; for (int i = 0, j = 0; i < attrNames.length; i++) if (i != (Integer) rst[0]) subA[j++] = attrNames[i]; // 根據(jù)分支屬性生成分支 @SuppressWarnings("unchecked") Map<Object, Map<Object, List<Sample>>> splits = /* NEW LINE */(Map<Object, Map<Object, List<Sample>>>) rst[2]; for (Entry<Object, Map<Object, List<Sample>>> entry : splits.entrySet()) { Object attrValue = entry.getKey(); Map<Object, List<Sample>> split = entry.getValue(); Object child = generateDecisionTree(split, subA); tree.setChild(attrValue, child); } return tree; } /** * 選取最優(yōu)測(cè)試屬性。最優(yōu)是指如果根據(jù)選取的測(cè)試屬性分支,則從各分支確定新樣本 * 的分類需要的信息量之和最小,這等價(jià)于確定新樣本的測(cè)試屬性獲得的信息增益最大 * 返回?cái)?shù)組:選取的屬性下標(biāo)、信息量之和、Map(屬性值->(分類->樣本列表)) */ static Object[] chooseBestTestAttribute( Map<Object, List<Sample>> categoryToSamples, String[] attrNames) { int minIndex = -1; // 最優(yōu)屬性下標(biāo) double minValue = Double.MAX_VALUE; // 最小信息量 Map<Object, Map<Object, List<Sample>>> minSplits = null; // 最優(yōu)分支方案 // 對(duì)每一個(gè)屬性,計(jì)算將其作為測(cè)試屬性的情況下在各分支確定新樣本的分類需要的信息量之和,選取最小為最優(yōu) for (int attrIndex = 0; attrIndex < attrNames.length; attrIndex++) { int allCount = 0; // 統(tǒng)計(jì)樣本總數(shù)的計(jì)數(shù)器 // 按當(dāng)前屬性構(gòu)建Map:屬性值->(分類->樣本列表) Map<Object, Map<Object, List<Sample>>> curSplits = /* NEW LINE */new HashMap<Object, Map<Object, List<Sample>>>(); for (Entry<Object, List<Sample>> entry : categoryToSamples .entrySet()) { Object category = entry.getKey(); List<Sample> samples = entry.getValue(); for (Sample sample : samples) { Object attrValue = sample .getAttribute(attrNames[attrIndex]); Map<Object, List<Sample>> split = curSplits.get(attrValue); if (split == null) { split = new HashMap<Object, List<Sample>>(); curSplits.put(attrValue, split); } List<Sample> splitSamples = split.get(category); if (splitSamples == null) { splitSamples = new LinkedList<Sample>(); split.put(category, splitSamples); } splitSamples.add(sample); } allCount += samples.size(); } // 計(jì)算將當(dāng)前屬性作為測(cè)試屬性的情況下在各分支確定新樣本的分類需要的信息量之和 double curValue = 0.0; // 計(jì)數(shù)器:累加各分支 for (Map<Object, List<Sample>> splits : curSplits.values()) { double perSplitCount = 0; for (List<Sample> list : splits.values()) perSplitCount += list.size(); // 累計(jì)當(dāng)前分支樣本數(shù) double perSplitValue = 0.0; // 計(jì)數(shù)器:當(dāng)前分支 for (List<Sample> list : splits.values()) { double p = list.size() / perSplitCount; perSplitValue -= p * (Math.log(p) / Math.log(2)); } curValue += (perSplitCount / allCount) * perSplitValue; } // 選取最小為最優(yōu) if (minValue > curValue) { minIndex = attrIndex; minValue = curValue; minSplits = curSplits; } } return new Object[] { minIndex, minValue, minSplits }; } /** * 將決策樹(shù)輸出到標(biāo)準(zhǔn)輸出 */ static void outputDecisionTree(Object obj, int level, Object from) { for (int i = 0; i < level; i++) System.out.print("|-----"); if (from != null) System.out.printf("(%s):", from); if (obj instanceof Tree) { Tree tree = (Tree) obj; String attrName = tree.getAttribute(); System.out.printf("[%s = ?]\n", attrName); for (Object attrValue : tree.getAttributeValues()) { Object child = tree.getChild(attrValue); outputDecisionTree(child, level + 1, attrName + " = " + attrValue); } } else { System.out.printf("[CATEGORY = %s]\n", obj); } } /** * 樣本,包含多個(gè)屬性和一個(gè)指明樣本所屬分類的分類值 */ static class Sample { private Map<String, Object> attributes = new HashMap<String, Object>(); private Object category; public Object getAttribute(String name) { return attributes.get(name); } public void setAttribute(String name, Object value) { attributes.put(name, value); } public Object getCategory() { return category; } public void setCategory(Object category) { this.category = category; } public String toString() { return attributes.toString(); } } /** * 決策樹(shù)(非葉結(jié)點(diǎn)),決策樹(shù)中的每個(gè)非葉結(jié)點(diǎn)都引導(dǎo)了一棵決策樹(shù) * 每個(gè)非葉結(jié)點(diǎn)包含一個(gè)分支屬性和多個(gè)分支,分支屬性的每個(gè)值對(duì)應(yīng)一個(gè)分支,該分支引導(dǎo)了一棵子決策樹(shù) */ static class Tree { private String attribute; private Map<Object, Object> children = new HashMap<Object, Object>(); public Tree(String attribute) { this.attribute = attribute; } public String getAttribute() { return attribute; } public Object getChild(Object attrValue) { return children.get(attrValue); } public void setChild(Object attrValue, Object child) { children.put(attrValue, child); } public Set<Object> getAttributeValues() { return children.keySet(); } } }
運(yùn)行結(jié)果:
更多關(guān)于java算法相關(guān)內(nèi)容感興趣的讀者可查看本站專題:《Java數(shù)據(jù)結(jié)構(gòu)與算法教程》、《Java操作DOM節(jié)點(diǎn)技巧總結(jié)》、《Java文件與目錄操作技巧匯總》和《Java緩存操作技巧匯總》
希望本文所述對(duì)大家java程序設(shè)計(jì)有所幫助。
- 如何實(shí)現(xiàn)java遞歸 處理權(quán)限管理菜單樹(shù)或分類
- Java遞歸遍歷樹(shù)形結(jié)構(gòu)的實(shí)現(xiàn)代碼
- java實(shí)現(xiàn)構(gòu)造無(wú)限層級(jí)樹(shù)形菜單
- java 實(shí)現(xiàn)簡(jiǎn)單圣誕樹(shù)的示例代碼(圣誕節(jié)快樂(lè))
- Java Swing中的表格(JTable)和樹(shù)(JTree)組件使用實(shí)例
- Java構(gòu)建樹(shù)形菜單的實(shí)例代碼(支持多級(jí)菜單)
- Java Swing樹(shù)狀組件JTree用法實(shí)例詳解
- Java遍歷輸出指定目錄、樹(shù)形結(jié)構(gòu)所有文件包括子目錄下的文件
- JSON復(fù)雜數(shù)據(jù)處理之Json樹(shù)形結(jié)構(gòu)數(shù)據(jù)轉(zhuǎn)Java對(duì)象并存儲(chǔ)到數(shù)據(jù)庫(kù)的實(shí)現(xiàn)
- java分類樹(shù),我從2s優(yōu)化到0.1s
相關(guān)文章
Java System.currentTimeMillis()時(shí)間的單位轉(zhuǎn)換與計(jì)算方式案例詳解
這篇文章主要介紹了Java System.currentTimeMillis()時(shí)間的單位轉(zhuǎn)換與計(jì)算方式案例詳解,本篇文章通過(guò)簡(jiǎn)要的案例,講解了該項(xiàng)技術(shù)的了解與使用,以下就是詳細(xì)內(nèi)容,需要的朋友可以參考下2021-08-08如何使用spring ResponseEntity處理http響應(yīng)
這篇文章主要介紹了如何使用spring ResponseEntity處理http響應(yīng)的操作,具有很好的參考價(jià)值,希望對(duì)大家有所幫助。如有錯(cuò)誤或未考慮完全的地方,望不吝賜教2021-07-07dubbo如何實(shí)現(xiàn)consumer從多個(gè)group中調(diào)用指定group的provider
這篇文章主要介紹了dubbo如何實(shí)現(xiàn)consumer從多個(gè)group中調(diào)用指定group的provider問(wèn)題,具有很好的參考價(jià)值,希望對(duì)大家有所幫助。如有錯(cuò)誤或未考慮完全的地方,望不吝賜教2023-03-03Java經(jīng)緯度小數(shù)與度分秒相互轉(zhuǎn)換工具類示例詳解
這篇文章主要介紹了Java經(jīng)緯度小數(shù)與度分秒相互轉(zhuǎn)換工具類,本文通過(guò)示例代碼給大家介紹的非常詳細(xì),對(duì)大家的學(xué)習(xí)或工作具有一定的參考借鑒價(jià)值,需要的朋友可以參考下2023-07-07java awt實(shí)現(xiàn)計(jì)算器功能
這篇文章主要為大家詳細(xì)介紹了java awt實(shí)現(xiàn)計(jì)算器功能,文中示例代碼介紹的非常詳細(xì),具有一定的參考價(jià)值,感興趣的小伙伴們可以參考一下2020-12-12基于redis setIfAbsent的使用說(shuō)明
這篇文章主要介紹了基于redis setIfAbsent的使用說(shuō)明,具有很好的參考價(jià)值,希望對(duì)大家有所幫助。一起跟隨小編過(guò)來(lái)看看吧2021-01-01