
Sensitive-Word Filtering and Dictionary Search in Java with a Prefix Tree (Trie)

Updated: 2023-01-03 10:31:36 Author: 萌萌噠二狗子

This article shows how to implement sensitive-word filtering and dictionary search in Java using a prefix tree (Trie), with complete example code. It is intended as a reference for study or work.

Introduction

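Sometimes user input needs to be filtered for sensitive words, or a text needs to be searched for the words it shares with a dictionary. Checking the text against each dictionary word in turn becomes slow as the dictionary grows. This article presents a Trie-based approach to keyword search and filtering. Its matching cost depends on the length of the text and of the words rather than on the number of dictionary words, so it stays efficient when the dictionary is large.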
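For comparison, the per-word traversal mentioned above might look like the following. This is only a baseline sketch; the class name NaiveFilter and its method are hypothetical and not part of the article's code. The text is scanned once for every dictionary word, so the work grows with the dictionary size.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical baseline for comparison; not part of the article's code.
class NaiveFilter {

    /** Replaces every dictionary word in text by scanning the text once per word. */
    static String replace(String text, List<String> words, String mask) {
        String result = text;
        for (String word : words) {
            // String.replace scans the whole text for this single word
            result = result.replace(word, mask);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("豬狗", "小狗", "小貓", "小豬", "垃圾", "狗東西");
        // prints: 你好，***，***，今天天氣真好。
        System.out.println(replace("你好，小狗，小豬，今天天氣真好。", words, "***"));
    }
}
```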

Trie

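A Trie, also called a prefix tree, is easiest to understand from an example: words that share a prefix share a path from the root.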

Dictionary: ["豬狗", "小狗", "小貓", "小豬", "垃圾", "狗東西"]

Trie data structure:

[Figure: the Trie built from the dictionary above]
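The shape of that tree can also be printed as text. The sketch below is a hypothetical helper (the names TrieShape, add and render are assumptions, not from the article) that builds the tree from the same dictionary and prints one character per line, indented by depth. It uses a LinkedHashMap only so that the output follows insertion order.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of the article's code.
class TrieShape {
    // LinkedHashMap keeps insertion order so the printed shape is deterministic
    final Map<Character, TrieShape> children = new LinkedHashMap<>();

    void add(String word) {
        TrieShape node = this;
        for (char c : word.toCharArray()) {
            // Reuse the child for this character if present, otherwise create it
            node = node.children.computeIfAbsent(c, k -> new TrieShape());
        }
    }

    /** Renders one character per line, indented by two spaces per level. */
    String render() {
        StringBuilder sb = new StringBuilder();
        render(sb, 0);
        return sb.toString();
    }

    private void render(StringBuilder sb, int depth) {
        for (Map.Entry<Character, TrieShape> e : children.entrySet()) {
            for (int i = 0; i < depth; i++) {
                sb.append("  ");
            }
            sb.append(e.getKey()).append('\n');
            e.getValue().render(sb, depth + 1);
        }
    }

    public static void main(String[] args) {
        TrieShape root = new TrieShape();
        for (String w : new String[] {"豬狗", "小狗", "小貓", "小豬", "垃圾", "狗東西"}) {
            root.add(w);
        }
        System.out.print(root.render());
    }
}
```

With this dictionary, "小" appears once at the top level with "狗", "貓" and "豬" indented beneath it, which is the prefix sharing that the Trie relies on.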

Code

Tree node: Node.class

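import java.util.HashMap;
import java.util.Map;

/**
 * trie tree
 *
 * @author lovely dog
 * @date 2020/10/20
 */
public class Node {
    /**
     * Child nodes, keyed by the next character
     */
    private final Map<Character, Node> nextNodes = new HashMap<>();

    public void addNext(Character key, Node node){
        nextNodes.put(key, node);
    }

    public Node getNext(Character key){
        return nextNodes.get(key);
    }

    /**
     * A node without children marks the end of a word. As a consequence, a word
     * that is a prefix of another dictionary word is never reported as a match.
     */
    public boolean isLastCharacter(){
        return nextNodes.isEmpty();
    }
}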

Search class: TrieSearcher.class

import java.util.HashMap;
import java.util.Map;

/**
 * trie tree searcher
 *
 * @author lovely dog
 * @date 2020/10/20
 */
public class TrieSearcher {

    private Node root = new Node();

    /**
     * Add a word to the dictionary
     *
     * @param word the word to add
     */
    public void addWord(String word) {
        Node tmpNode = root;
        for (char c : word.toCharArray()) {
            Node node = tmpNode.getNext(c);
            if (null == node) {
                node = new Node();
                tmpNode.addNext(c, node);
            }
            tmpNode = node;
        }
    }

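    /**
     * Replace dictionary words
     *
     * @param text         the text to process
     * @param afterReplace the string that replaces each matched word
     * @return the processed text
     */
    public String replace(String text, String afterReplace) {
        StringBuilder result = new StringBuilder(text.length());
        Node tmpNode = root;
        // begin: start of the current match attempt; pos: character being examined
        int begin = 0, pos = 0;
        while (pos < text.length()) {
            char c = text.charAt(pos);
            tmpNode = tmpNode.getNext(c);
            if (null == tmpNode) {
                // No word starts at begin: keep that character and retry from the next one
                result.append(text.charAt(begin));
                begin++;
                pos = begin;
                tmpNode = root;
            } else if (tmpNode.isLastCharacter()) {
                // Full match: emit the replacement
                result.append(afterReplace);
                pos++;
                begin = pos;
                tmpNode = root;
            } else {
                // Partial match: move on to the next character
                pos++;
            }
        }
        // Keep any trailing characters, including an unfinished partial match
        result.append(text.substring(begin));
        return result.toString();
    }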

    /**
     * Find dictionary words
     *
     * @param text the text to process
     * @return match statistics; key: word, value: count
     */
    public Map<String, Integer> find(String text) {
        Map<String, Integer> resultMap = new HashMap<>(16);
        Node tmpNode = root;
        StringBuilder word = new StringBuilder();
        int begin = 0, pos = 0;
        while (pos < text.length()) {
            char c = text.charAt(pos);
            tmpNode = tmpNode.getNext(c);
            if (null == tmpNode) {
                // Mismatch: discard the partial match and retry from the next character
                word.setLength(0);
                begin++;
                pos = begin;
                tmpNode = root;
            } else if (tmpNode.isLastCharacter()) {
                // Full match: count the word
                String w = word.append(c).toString();
                resultMap.put(w, resultMap.getOrDefault(w, 0) + 1);
                pos++;
                begin = pos;
                tmpNode = root;
                word = new StringBuilder();
            } else {
                // Partial match: move on to the next character
                word.append(c);
                pos++;
            }
        }
        return resultMap;
    }
}
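Because a node counts as the end of a word only when it has no children, a dictionary word that is a prefix of another word (for example "小" next to "小狗") is never matched by the class above. One common way around this is an explicit end-of-word flag on each node. The following is a minimal sketch under that assumption; FlaggedTrie, wordEnd and the longest-match policy are illustrative choices, not the article's implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical variant for illustration; not part of the article's code.
class FlaggedTrie {
    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        boolean wordEnd; // true if a dictionary word ends at this node
    }

    private final Node root = new Node();

    void addWord(String word) {
        if (word.isEmpty()) {
            return; // never mark the root itself as a word end
        }
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.next.computeIfAbsent(c, k -> new Node());
        }
        node.wordEnd = true;
    }

    /** Replaces the longest dictionary word that starts at each position. */
    String replace(String text, String mask) {
        StringBuilder result = new StringBuilder(text.length());
        int begin = 0;
        while (begin < text.length()) {
            Node node = root;
            int matchEnd = -1; // exclusive end of the longest match starting at begin
            for (int pos = begin; pos < text.length(); pos++) {
                node = node.next.get(text.charAt(pos));
                if (node == null) {
                    break;
                }
                if (node.wordEnd) {
                    matchEnd = pos + 1;
                }
            }
            if (matchEnd < 0) {
                // No word starts here: keep the character
                result.append(text.charAt(begin));
                begin++;
            } else {
                result.append(mask);
                begin = matchEnd;
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        FlaggedTrie trie = new FlaggedTrie();
        trie.addWord("小");
        trie.addWord("小狗");
        // prints: *貓和*
        System.out.println(trie.replace("小貓和小狗", "*"));
    }
}
```

Preferring the longest match is a design choice; stopping at the first node whose flag is set would give shortest-match behavior instead.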

Test: Main.class

import java.util.stream.Stream;

public class Main {
    public static void main(String[] args) {
        TrieSearcher trieSearcher = new TrieSearcher();
        Stream.of("豬狗", "小狗", "小貓", "小豬", "垃圾", "狗東西").forEach(trieSearcher::addWord);
        String sentence = "你好，小狗，小豬，今天天氣真好。";
        System.out.println(trieSearcher.replace(sentence, "***"));
        System.out.println(trieSearcher.find(sentence));
    }
}

Output: