Sensitive-Word Filtering in SpringBoot with SensitiveWord

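Updated: 2023-01-14 09:01:27 Author: 墨水記憶

This article explains in detail how to implement sensitive-word filtering in SpringBoot using SensitiveWord. It walks through complete sample code and may be useful to anyone interested in the topic.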

Adding the dependency

<dependency>
  <groupId>com.github.houbb</groupId>
  <artifactId>sensitive-word</artifactId>
  <version>0.2.0</version>
</dependency>

The library is hosted on GitHub.

Methods

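Method	Parameters	Return value	Description
contains(String)	String to check	boolean	Checks whether the string contains a sensitive word
replace(String, ISensitiveWordReplace)	String to process; replacement strategy	String	Returns the string with sensitive words replaced by the given strategy
replace(String, char)	String to process; replacement char	String	Returns the string with sensitive words masked by the given char
replace(String)	String to process	String	Returns the string with sensitive words masked by *
findAll(String)	String to check	List of strings	Returns all sensitive words in the string
findFirst(String)	String to check	String	Returns the first sensitive word in the string
findAll(String, IWordResultHandler)	String to check; result handler	List of handler results	Returns all sensitive words in the string, processed by the handler
findFirst(String, IWordResultHandler)	String to check; result handler	Handler result	Returns the first sensitive word in the string, processed by the handler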

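ISensitiveWordReplace: the sensitive-word replacement strategy.

IWordResultHandler: the result handler. It lets users process the matched sensitive words in a custom way. The built-in WordResultHandlers utility class provides two handlers:

WordResultHandlers.word(): keeps only the sensitive word itself.
WordResultHandlers.raw(): keeps the full match information, that is, the sensitive word together with its start and end indexes.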
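
To make the difference between the two handlers concrete, here is a minimal plain-Java sketch of the idea. It does not use the library: Match, HandlerSketch, WORD and RAW are hypothetical names, and the naive scan only stands in for the library's real matching. The end offset is exclusive here, which is what the sample output later in this article suggests for WordResult.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for the library's WordResult: the matched word plus
// its start (inclusive) and end (exclusive) offsets in the scanned text.
final class Match {
    final String word;
    final int start;
    final int end;

    Match(String word, int start, int end) {
        this.word = word;
        this.start = start;
        this.end = end;
    }

    @Override
    public String toString() {
        return "Match{word='" + word + "', start=" + start + ", end=" + end + "}";
    }
}

final class HandlerSketch {
    // Analogue of WordResultHandlers.word(): keep only the matched text.
    static final Function<Match, String> WORD = m -> m.word;
    // Analogue of WordResultHandlers.raw(): keep the whole match, offsets included.
    static final Function<Match, Match> RAW = m -> m;

    // Naive scan: report every occurrence of every dictionary word in text order,
    // passing each match through the handler before collecting it.
    static <R> List<R> findAll(String text, List<String> dictionary, Function<Match, R> handler) {
        List<R> results = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            for (String word : dictionary) {
                if (text.startsWith(word, i)) {
                    results.add(handler.apply(new Match(word, i, i + word.length())));
                }
            }
        }
        return results;
    }
}
```

The same scan yields either plain strings or full match objects depending only on the handler passed in, which is the role IWordResultHandler plays in findAll and findFirst.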

Default example

Use the methods provided out of the box.

@Test
void testWord() {
  String text = "紅旗迎風飄揚，主席的畫像屹立在天安門前。";
  System.out.println(SensitiveWordHelper.contains(text));

  System.out.println(SensitiveWordHelper.replace(text));
  System.out.println(SensitiveWordHelper.replace(text, '0'));

  System.out.println(SensitiveWordHelper.findFirst(text));
  System.out.println(SensitiveWordHelper.findFirst(text, WordResultHandlers.word()));
  System.out.println(SensitiveWordHelper.findFirst(text, WordResultHandlers.raw()));

  System.out.println(SensitiveWordHelper.findAll(text));
  System.out.println(SensitiveWordHelper.findAll(text, WordResultHandlers.word()));
  System.out.println(SensitiveWordHelper.findAll(text, WordResultHandlers.raw()));

}

Output:

Init sensitive word map end! Cost time: 163ms
true
****迎風飄揚，***的畫像屹立在***前。
0000迎風飄揚，000的畫像屹立在000前。
紅旗
紅旗
WordResult{word='紅旗', startIndex=0, endIndex=4}
[紅旗, 主席, 天安門]
[紅旗, 主席, 天安門]
[WordResult{word='紅旗', startIndex=0, endIndex=4}, WordResult{word='主席', startIndex=9, endIndex=12}, WordResult{word='天安門', startIndex=18, endIndex=21}]

Custom replacement strategy example

This example uses a custom replacement strategy. First, implement the ISensitiveWordReplace interface to define the strategy:

package com.tothefor;

import com.github.houbb.heaven.util.lang.CharUtil;
import com.github.houbb.sensitive.word.api.ISensitiveWordReplace;
import com.github.houbb.sensitive.word.api.ISensitiveWordReplaceContext;

public class MySensitiveWordReplace implements ISensitiveWordReplace {

  @Override
  public String replace(ISensitiveWordReplaceContext context) {
    String sensitiveWord = context.sensitiveWord();
    // Custom replacement per sensitive word; the rules could be read from a database or elsewhere
    if ("紅旗".equals(sensitiveWord)) {
      return "旗幟";
    }

    if ("天安門".equals(sensitiveWord)) {
      return "門";
    }

    if ("主席".equals(sensitiveWord)) {
      return "教員";
    }
    // All other words are masked with * by default
    int wordLength = context.wordLength();
    return CharUtil.repeat('*', wordLength);
  }

}

Usage:

@Test
void testWord() {
  String text = "紅旗迎風飄揚，主席的畫像屹立在天安門前。";
  System.out.println(SensitiveWordHelper.contains(text));
  System.out.println(SensitiveWordHelper.replace(text, new MySensitiveWordReplace()));

  String text1 = "最好的記憶不如最淡的墨水。";
  System.out.println(SensitiveWordHelper.contains(text1));
  System.out.println(SensitiveWordHelper.replace(text1, new MySensitiveWordReplace()));

}

Output:

Init sensitive word map end! Cost time: 16ms
true
旗幟迎風飄揚，教員的畫像屹立在門前。
false
最好的記憶不如最淡的墨水。
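
The comment in MySensitiveWordReplace notes that the replacement rules could come from a database. One way to keep the strategy class free of hard-coded if statements is a lookup table with a masking fallback. The sketch below is plain Java under that assumption; ReplacementTable and replacementFor are hypothetical names, not part of the library.

```java
import java.util.HashMap;
import java.util.Map;

final class ReplacementTable {
    // word -> replacement; in a real application this map could be loaded from a database.
    private final Map<String, String> replacements;

    ReplacementTable(Map<String, String> replacements) {
        // Defensive copy so later changes to the caller's map do not leak in.
        this.replacements = new HashMap<>(replacements);
    }

    // Returns the configured replacement, or a run of '*' as long as the word
    // when no replacement is configured.
    String replacementFor(String sensitiveWord) {
        String mapped = replacements.get(sensitiveWord);
        if (mapped != null) {
            return mapped;
        }
        StringBuilder mask = new StringBuilder(sensitiveWord.length());
        for (int i = 0; i < sensitiveWord.length(); i++) {
            mask.append('*');
        }
        return mask.toString();
    }
}
```

Inside a strategy like MySensitiveWordReplace, the body of replace could then reduce to a single call that passes context.sensitiveWord() to replacementFor.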

Customization

Opening the source of SensitiveWordHelper reveals the following line:

private static final SensitiveWordBs WORD_BS = SensitiveWordBs.newInstance().init();

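Its methods also all delegate to the SensitiveWordBs class. SensitiveWordHelper can therefore be seen as a thin wrapper around SensitiveWordBs, provided so that developers can get started quickly in simple scenarios.

The creation statement above adds no extra configuration: it only initializes a default instance, which is the simplest possible setup. The following sections build a customized SensitiveWordBs for sensitive-word filtering.

Customizing SensitiveWordBs

First, the options that can be configured are described below: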

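No.	Method	Description
1	ignoreCase	Ignore case
2	ignoreWidth	Ignore half-width vs. full-width characters
3	ignoreNumStyle	Ignore the way numbers are written
4	ignoreChineseStyle	Ignore differences in Chinese writing style
5	ignoreEnglishStyle	Ignore differences in English writing style
6	ignoreRepeat	Ignore repeated words
7	enableNumCheck	Whether to enable number detection. By default, 8 consecutive digits are treated as a sensitive word
8	enableEmailCheck	Whether to enable email detection
9	enableUrlCheck	Whether to enable URL detection

Then create the customized SensitiveWordBs as follows: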
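
To give a feel for what options such as ignoreCase, ignoreWidth and enableNumCheck mean, here is a small plain-Java sketch. It is an illustration only and not the library's implementation: NormalizeSketch, normalize and hasLongDigitRun are hypothetical names, and treating "8 consecutive digits" as the pattern \d{8,} is an assumption.

```java
import java.util.regex.Pattern;

final class NormalizeSketch {
    // Assumed reading of "8 consecutive digits": a run of at least eight ASCII digits.
    private static final Pattern EIGHT_DIGITS = Pattern.compile("\\d{8,}");

    // Folds width and case: full-width ASCII variants (U+FF01..U+FF5E) and the
    // ideographic space (U+3000) become their half-width forms, then each
    // character is lower-cased.
    static String normalize(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\u3000') {
                c = ' ';
            } else if (c >= '\uFF01' && c <= '\uFF5E') {
                // The full-width block sits at a fixed offset from printable ASCII.
                c = (char) (c - 0xFEE0);
            }
            out.append(Character.toLowerCase(c));
        }
        return out.toString();
    }

    // True when the text contains a run of eight or more ASCII digits.
    static boolean hasLongDigitRun(String text) {
        return EIGHT_DIGITS.matcher(text).find();
    }
}
```

Normalizing both the dictionary and the input in this way is one common approach to making matching insensitive to case and width; the library may well do it differently.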

package com.tothefor.motorcode.core.SensitiveWord;

import com.github.houbb.sensitive.word.bs.SensitiveWordBs;
import com.github.houbb.sensitive.word.support.allow.WordAllows;
import com.github.houbb.sensitive.word.support.deny.WordDenys;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class SensitiveWordConfig {

  @Autowired
  private CustomWordAllow customWordAllow;

  @Autowired
  private CustomWordDeny customWordDeny;

  /**
   * Initializes the bootstrap class.
   *
   * @return the initialized bootstrap class
   * @since 1.0.0
   */
  @Bean
  public SensitiveWordBs sensitiveWordBs() {
    // The configuration can be adjusted dynamically, for example based on data read from a database
    return SensitiveWordBs.newInstance()
      .wordDeny(WordDenys.chains(WordDenys.system(), customWordDeny)) // set the blacklist
      .wordAllow(WordAllows.chains(WordAllows.system(), customWordAllow)) // set the whitelist
      .ignoreCase(true)
      .ignoreWidth(true)
      .ignoreNumStyle(true)
      .ignoreChineseStyle(true)
      .ignoreEnglishStyle(true)
      .ignoreRepeat(true)
      .enableEmailCheck(true)
      .enableUrlCheck(true)
      // any other configuration
      .init();
  }

}

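Here, wordDeny and wordAllow set the custom sensitive-word blacklist and whitelist. Each can take a single source or a chain of several, as the following alternatives show: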

// Option 1: use only the system default word lists
SensitiveWordBs systemBs = SensitiveWordBs.newInstance()
     .wordDeny(WordDenys.system()) // blacklist
     .wordAllow(WordAllows.system()) // whitelist
     .init();

// Option 2: use only custom word lists
SensitiveWordBs customBs = SensitiveWordBs.newInstance()
     .wordDeny(new MyWordDeny())
     .wordAllow(new MyWordAllow())
     .init();

// Option 3: chain several word lists, here the system defaults plus custom ones
IWordDeny wordDeny = WordDenys.chains(WordDenys.system(), new MyWordDeny());
IWordAllow wordAllow = WordAllows.chains(WordAllows.system(), new MyWordAllow());
SensitiveWordBs chainedBs = SensitiveWordBs.newInstance()
     .wordDeny(wordDeny)
     .wordAllow(wordAllow)
     .init();

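Next, add the custom sensitive-word lists.

Custom sensitive-word whitelist

The whitelist defines the words that must not be treated as sensitive: when they appear in the text, they are shown as-is. Implement the IWordAllow interface and override allow() to return the whitelisted words.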

package com.tothefor.motorcode.core.SensitiveWord;

import com.github.houbb.sensitive.word.api.IWordAllow;
import org.springframework.stereotype.Service;

import java.util.Arrays;
import java.util.List;

@Service
public class CustomWordAllow implements IWordAllow {

  /**
   * Allowed content: the returned words are not treated as sensitive words.
   *
   * @return the whitelisted words
   */
  @Override
  public List<String> allow() {
    // Query the whitelisted words from the database
    return Arrays.asList("紅旗");
  }

}

Custom sensitive-word blacklist

The blacklist adds extra sensitive words. Implement the IWordDeny interface and override deny() to return the blacklisted words.

package com.tothefor.motorcode.core.SensitiveWord;

import com.github.houbb.sensitive.word.api.IWordDeny;
import org.springframework.stereotype.Service;

import java.util.Arrays;
import java.util.List;

/**
 * Custom sensitive words
 */
@Service
public class CustomWordDeny implements IWordDeny {

  /**
   * Denied content: the returned words are treated as sensitive words.
   *
   * @return the blacklisted words
   */
  @Override
  public List<String> deny() {
    // Query the custom sensitive words from the database
    return Arrays.asList("紅旗");
  }

}

Example

Testing the custom configuration:

@Autowired
private SensitiveWordBs sensitiveWordBs;

@Test
void testWord() {
  String text = "紅旗迎風飄揚，主席的畫像屹立在天安門前。";
  System.out.println(sensitiveWordBs.contains(text));

  System.out.println(sensitiveWordBs.replace(text));
  System.out.println(sensitiveWordBs.replace(text, '0'));
  System.out.println(sensitiveWordBs.replace(text, new MySensitiveWordReplace()));

  System.out.println(sensitiveWordBs.findFirst(text));
  System.out.println(sensitiveWordBs.findFirst(text, WordResultHandlers.word()));
  System.out.println(sensitiveWordBs.findFirst(text, WordResultHandlers.raw()));

  System.out.println(sensitiveWordBs.findAll(text));
  System.out.println(sensitiveWordBs.findAll(text, WordResultHandlers.word()));
  System.out.println(sensitiveWordBs.findAll(text, WordResultHandlers.raw()));

}

Output:

true
紅旗迎風飄揚，***的畫像屹立在***前。
紅旗迎風飄揚，000的畫像屹立在000前。
紅旗迎風飄揚，教員的畫像屹立在門前。
主席
主席
WordResult{word='主席', startIndex=9, endIndex=12}
[主席, 天安門]
[主席, 天安門]
[WordResult{word='主席', startIndex=9, endIndex=12}, WordResult{word='天安門', startIndex=18, endIndex=21}]

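The result differs slightly from the earlier one: '紅旗' is no longer filtered. The reason is that '紅旗' was added to the custom whitelist. But the same word is also in the blacklist, so why is it not filtered? The rule is this: if a word appears in both the blacklist and the whitelist, it is not filtered.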
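
The behaviour observed above is consistent with treating the effective dictionary as the blacklist minus the whitelist. The following plain-Java sketch models that rule; it is an assumption drawn from the output, not the library's confirmed internals, and WordListSketch and effectiveWords are hypothetical names.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class WordListSketch {
    // Effective dictionary: every denied word that is not also allowed.
    // LinkedHashSet keeps the deny-list order and drops duplicates.
    static Set<String> effectiveWords(List<String> deny, List<String> allow) {
        Set<String> effective = new LinkedHashSet<>(deny);
        effective.removeAll(allow);
        return effective;
    }
}
```

Under this model, a word returned by both deny() and allow() never reaches the matcher, so the whitelist always wins.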

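Reloading the word dictionary

Because initializing the sensitive-word dictionary is relatively expensive, it is best to call init once when the application starts. To let dictionary changes take effect promptly while keeping the API as simple as possible, you can trigger a re-initialization with sensitiveWordBs.init() whenever the words stored in the database change. Calling sensitiveWordBs.init() rebuilds the dictionary from the configured IWordDeny and IWordAllow. Since initialization can take a while (on the order of seconds), it is designed so that the old dictionary keeps working until init completes, after which the new one takes over.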

@Autowired
private SensitiveWordBs sensitiveWordBs;
sensitiveWordBs.init();

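Whenever the data in the database changes, first run the method that updates the sensitive words in the database, then call init(). It is not recommended to call init() automatically right after every database modification, though. A better approach is to expose a separate endpoint and trigger it manually.

Summary

All operations are performed on SensitiveWordBs.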
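
The "old dictionary keeps working until the new one is ready" behaviour described above can be pictured as building a fresh snapshot off to the side and publishing it in one step. The plain-Java sketch below shows that pattern with an AtomicReference. It only illustrates the idea and makes no claim about how the library implements init(); ReloadableDictionary, reload and isSensitive are hypothetical names.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

final class ReloadableDictionary {
    // Supplies the current word list, for example by querying a database.
    private final Supplier<Set<String>> loader;
    // Readers always see a complete, immutable snapshot.
    private final AtomicReference<Set<String>> current =
            new AtomicReference<>(Collections.<String>emptySet());

    ReloadableDictionary(Supplier<Set<String>> loader) {
        this.loader = loader;
        reload();
    }

    // Builds the new snapshot fully before publishing it, so lookups that run
    // during a reload keep using the previous snapshot.
    void reload() {
        Set<String> fresh = Collections.unmodifiableSet(new HashSet<>(loader.get()));
        current.set(fresh);
    }

    boolean isSensitive(String word) {
        return current.get().contains(word);
    }
}
```

A manually triggered endpoint, as recommended above, would simply call reload() (or sensitiveWordBs.init() in the real setup) after the database has been updated.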

