Implementing full-text search in Spring Boot with Lucene: a walkthrough
Lucene is a mature, free, open-source tool in the Java ecosystem that provides a simple yet powerful application programming interface (API) for full-text indexing and search.
Full-text search with Lucene means tokenizing the entire content of each document and building an inverted index over all the resulting terms. In practice this comes down to using Lucene's API for the four basic index operations: create (build the index), delete (remove index entries), update (modify index entries), and query (search the data).
Suppose a directory on our machine contains many text files and we want to find out which ones contain a given keyword. To do this, we first use Lucene to build an index over the documents in that directory, and then search the index for the documents we are looking for. This example should give the reader a clear picture of how to build a search application with Lucene.
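The core idea of the inverted index can be sketched without Lucene at all: tokenize every document and map each term to the set of documents containing it. The class below is a toy illustration only (the name `InvertedIndex` and the whitespace "analyzer" are mine, not Lucene's; real analyzers do far more than lowercasing and splitting):

```java
import java.util.*;

// A toy inverted index: term -> sorted set of document IDs.
class InvertedIndex {
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // "Analyze" a document (here: lowercase + whitespace split) and index each term.
    void add(int docId, String text) {
        for (String term : text.toLowerCase().split("\\s+")) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Look up all documents containing the given term.
    SortedSet<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), Collections.emptySortedSet());
    }
}
```

Searching a term is then a single map lookup instead of a scan over every document, which is exactly what makes Lucene queries fast.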
Building the index
Document
A Document describes a document, which here can be an HTML page, an email, or a text file. A Document object is made up of multiple Field objects: you can think of a Document as a record in a database, with each Field as one column of that record.
Field
A Field object describes one attribute of a document; for example, the subject and body of an email can be described by two separate Field objects.
Analyzer
Before a document can be indexed, its content must first be tokenized, and that is the Analyzer's job. Analyzer is an abstract class with multiple implementations; choose the one suited to your language and application. The Analyzer hands the tokenized content to the IndexWriter, which builds the index.
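To make the Analyzer's role concrete, here is a minimal sketch (the helper class `AnalyzerDemo` is hypothetical; it assumes `lucene-core` and `lucene-analyzers-common` from the dependency list below are on the classpath). It feeds a string through StandardAnalyzer and collects the terms the resulting TokenStream produces:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

class AnalyzerDemo {
    // Tokenize the given text with the given analyzer and return the produced terms.
    static List<String> tokens(Analyzer analyzer, String text) throws IOException {
        List<String> result = new ArrayList<>();
        try (TokenStream ts = analyzer.tokenStream("contents", text)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();                      // required before the first incrementToken()
            while (ts.incrementToken()) {
                result.add(term.toString());
            }
            ts.end();
        }
        return result;
    }
}
```

For English text, `tokens(new StandardAnalyzer(), "Hello, Lucene World")` yields lowercased terms split on punctuation and whitespace; a Chinese-aware analyzer such as the IK analyzer used later would segment Chinese text into words instead.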
IndexWriter
IndexWriter is the core class Lucene uses to create an index; its job is to add Document objects to the index one by one.
Directory
This class represents the storage location of a Lucene index. It is an abstract class with two commonly used implementations: FSDirectory, which represents an index stored on the file system, and RAMDirectory, which represents an index held in memory (note that RAMDirectory is deprecated in newer Lucene releases).
Searching documents
Query
Query is an abstract class with many implementations, such as TermQuery, BooleanQuery, and PrefixQuery. Its purpose is to encapsulate the user's query string into something Lucene can execute.
Term
A Term is the basic unit of search; a Term object consists of two String fields. One can be created with a single statement: Term term = new Term("fieldName", "queryWord"); where the first argument names the document Field to search and the second is the keyword to search for.
TermQuery
TermQuery is a subclass of the abstract class Query and the most basic query type Lucene supports. One is created with a single statement: TermQuery termQuery = new TermQuery(new Term("fieldName", "queryWord")); its constructor accepts exactly one argument, a Term object.
IndexSearcher
IndexSearcher is used to search a built index. It opens the index read-only, so multiple IndexSearcher instances can operate on the same index concurrently.
Hits
Hits was the class that held search results in early Lucene versions; in the 7.x version used here it has long been replaced by TopDocs and ScoreDoc, which the example code below uses.
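Taken together, the search-side classes above fit in a handful of lines. The following is a minimal, self-contained sketch rather than the article's own code (the class name `TermQuerySketch` is mine; it assumes the Lucene 7.6.0 dependencies listed below, and uses RAMDirectory to keep the demo entirely in memory): index one document, then count the hits for a TermQuery on it.

```java
import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

class TermQuerySketch {
    // Index one document, then find it again with a TermQuery on an untokenized field.
    static long countMatches(String field, String word) throws IOException {
        Directory dir = new RAMDirectory(); // in-memory index, fine for a demo
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            Document doc = new Document();
            // StringField: indexed but not tokenized, so the exact value is searchable as one term
            doc.add(new StringField(field, word, Field.Store.YES));
            writer.addDocument(doc);
            writer.commit();
        }
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            TopDocs hits = searcher.search(new TermQuery(new Term(field, word)), 10);
            return hits.totalHits; // in Lucene 7.x this is a plain long
        }
    }
}
```

The full example below follows the same shape, but with an on-disk FSDirectory and a Chinese-aware analyzer.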
Example
1. Maven (pom) dependencies
<!-- Lucene core -->
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-core</artifactId>
    <version>7.6.0</version>
</dependency>
<!-- Lucene query parser -->
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-queryparser</artifactId>
    <version>7.6.0</version>
</dependency>
<!-- Lucene's default analyzers, suitable for English text -->
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-analyzers-common</artifactId>
    <version>7.6.0</version>
</dependency>
<!-- Lucene search-result highlighting -->
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-highlighter</artifactId>
    <version>7.6.0</version>
</dependency>
<!-- smartcn Chinese analyzer -->
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-analyzers-smartcn</artifactId>
    <version>7.6.0</version>
</dependency>
<!-- IK analyzer -->
<dependency>
    <groupId>com.janeluo</groupId>
    <artifactId>ikanalyzer</artifactId>
    <version>2012_u6</version>
</dependency>
2. A custom IK analyzer
The stock IKAnalyzer (2012_u6) targets an older Lucene API, so we wrap its IKSegmenter in our own Analyzer/Tokenizer subclasses to make it work with Lucene 7.
public class MyIKAnalyzer extends Analyzer {
private boolean useSmart;
public MyIKAnalyzer() {
this(false);
}
public MyIKAnalyzer(boolean useSmart) {
this.useSmart = useSmart;
}
@Override
protected TokenStreamComponents createComponents(String s) {
Tokenizer _MyIKTokenizer = new MyIKTokenizer(this.useSmart());
return new TokenStreamComponents(_MyIKTokenizer);
}
public boolean useSmart() {
return this.useSmart;
}
public void setUseSmart(boolean useSmart) {
this.useSmart = useSmart;
}
}
public class MyIKTokenizer extends Tokenizer {
private IKSegmenter _IKImplement;
private final CharTermAttribute termAtt = (CharTermAttribute)this.addAttribute(CharTermAttribute.class);
private final OffsetAttribute offsetAtt = (OffsetAttribute)this.addAttribute(OffsetAttribute.class);
private final TypeAttribute typeAtt = (TypeAttribute)this.addAttribute(TypeAttribute.class);
private int endPosition;
//useSmart: whether to use IK's "smart" segmentation. Defaults to false, i.e. fine-grained segmentation; switching this to true can greatly reduce the number of search hits
public MyIKTokenizer(boolean useSmart) {
this._IKImplement = new IKSegmenter(this.input, useSmart);
}
@Override
public boolean incrementToken() throws IOException {
this.clearAttributes();
Lexeme nextLexeme = this._IKImplement.next();
if (nextLexeme != null) {
this.termAtt.append(nextLexeme.getLexemeText());
this.termAtt.setLength(nextLexeme.getLength());
this.offsetAtt.setOffset(nextLexeme.getBeginPosition(), nextLexeme.getEndPosition());
this.endPosition = nextLexeme.getEndPosition();
this.typeAtt.setType(nextLexeme.getLexemeTypeString());
return true;
} else {
return false;
}
}
@Override
public void reset() throws IOException {
super.reset();
this._IKImplement.reset(this.input);
}
@Override
public final void end() {
int finalOffset = this.correctOffset(this.endPosition);
this.offsetAtt.setOffset(finalOffset, finalOffset);
}
}
Test 1
@RequestMapping("/createIndex")
public String createIndex() throws IOException {
List<Content> list1 = new ArrayList<>();
list1.add(new Content(null, "Java面向?qū)ο?, "10", null, "Java面向?qū)ο髲娜腴T到精通,簡(jiǎn)單上手"));
list1.add(new Content(null, "Java面向?qū)ο骿ava", "10", null, "Java面向?qū)ο髲娜腴T到精通,簡(jiǎn)單上手"));
list1.add(new Content(null, "Java面向編程", "15", null, "Java面向?qū)ο缶幊虝?));
list1.add(new Content(null, "JavaScript入門", "18", null, "JavaScript入門編程書籍"));
list1.add(new Content(null, "深入理解Java編程", "13", null, "十三四天掌握J(rèn)ava基礎(chǔ)"));
list1.add(new Content(null, "從入門到放棄_Java", "20", null, "一門從入門到放棄的書籍"));
list1.add(new Content(null, "Head First Java", "30", null, "《Head First Java》是一本完整地面向?qū)ο?object-oriented,OO)程序設(shè)計(jì)和Java的學(xué)習(xí)指導(dǎo)用書"));
list1.add(new Content(null, "Java 核心技術(shù):卷1 基礎(chǔ)知識(shí)", "22", null, "全書共14章,包括Java基本的程序結(jié)構(gòu)、對(duì)象與類、繼承、接口與內(nèi)部類、圖形程序設(shè)計(jì)、事件處理、Swing用戶界面組件"));
list1.add(new Content(null, "Java 編程思想", "12", null, "本書贏得了全球程序員的廣泛贊譽(yù),即使是最晦澀的概念,在Bruce Eckel的文字親和力和小而直接的編程示例面前也會(huì)化解于無形"));
list1.add(new Content(null, "Java開發(fā)實(shí)戰(zhàn)經(jīng)典", "51", null, "本書是一本綜合講解Java核心技術(shù)的書籍,在書中使用大量的代碼及案例進(jìn)行知識(shí)點(diǎn)的分析與運(yùn)用"));
list1.add(new Content(null, "Effective Java", "10", null, "本書介紹了在Java編程中57條極具實(shí)用價(jià)值的經(jīng)驗(yàn)規(guī)則,這些經(jīng)驗(yàn)規(guī)則涵蓋了大多數(shù)開發(fā)人員每天所面臨的問題的解決方案"));
list1.add(new Content(null, "分布式 Java 應(yīng)用:基礎(chǔ)與實(shí)踐", "14", null, "本書介紹了編寫分布式Java應(yīng)用涉及的眾多知識(shí)點(diǎn),分為了基于Java實(shí)現(xiàn)網(wǎng)絡(luò)通信、RPC;基于SOA實(shí)現(xiàn)大型分布式Java應(yīng)用"));
list1.add(new Content(null, "http權(quán)威指南", "11", null, "超文本傳輸協(xié)議(Hypertext Transfer Protocol,HTTP)是在萬維網(wǎng)上進(jìn)行通信時(shí)所使用的協(xié)議方案"));
list1.add(new Content(null, "Spring", "15", null, "這是啥,還需要學(xué)習(xí)嗎?Java程序員必備書籍"));
list1.add(new Content(null, "深入理解 Java 虛擬機(jī)", "18", null, "作為一位Java程序員,你是否也曾經(jīng)想深入理解Java虛擬機(jī),但是卻被它的復(fù)雜和深?yuàn)W拒之門外"));
list1.add(new Content(null, "springboot實(shí)戰(zhàn)", "11", null, "完成對(duì)于springboot的理解,是每個(gè)Java程序員必備的姿勢(shì)"));
list1.add(new Content(null, "springmvc學(xué)習(xí)", "72", null, "springmvc學(xué)習(xí)指南"));
list1.add(new Content(null, "vue入門到放棄", "20", null, "vue入門到放棄書籍信息"));
list1.add(new Content(null, "vue入門到精通", "20", null, "vue入門到精通相關(guān)書籍信息"));
list1.add(new Content(null, "vue之旅", "20", null, "由淺入深地全面介紹vue技術(shù),包含大量案例與代碼"));
list1.add(new Content(null, "vue實(shí)戰(zhàn)", "20", null, "以實(shí)戰(zhàn)為導(dǎo)向,系統(tǒng)講解如何使用 "));
list1.add(new Content(null, "vue入門與實(shí)踐", "20", null, "現(xiàn)已得到蘋果、微軟、谷歌等主流廠商全面支持"));
list1.add(new Content(null, "Vue.js應(yīng)用測(cè)試", "20", null, "Vue.js創(chuàng)始人尤雨溪鼎力推薦!Vue官方測(cè)試工具作者親筆撰寫,Vue.js應(yīng)用測(cè)試完全學(xué)習(xí)指南"));
list1.add(new Content(null, "PHP和MySQL Web開發(fā)", "20", null, "本書是利用PHP和MySQL構(gòu)建數(shù)據(jù)庫(kù)驅(qū)動(dòng)的Web應(yīng)用程序的權(quán)威指南"));
list1.add(new Content(null, "Web高效編程與優(yōu)化實(shí)踐", "20", null, "從思想提升和內(nèi)容修煉兩個(gè)維度,圍繞前端工程師必備的前端技術(shù)和編程基礎(chǔ)"));
list1.add(new Content(null, "Vue.js 2.x實(shí)踐指南", "20", null, "本書旨在讓初學(xué)者能夠快速上手vue技術(shù)棧,并能夠利用所學(xué)知識(shí)獨(dú)立動(dòng)手進(jìn)行項(xiàng)目開發(fā)"));
list1.add(new Content(null, "初始vue", "20", null, "解開vue的面紗"));
list1.add(new Content(null, "什么是vue", "20", null, "一步一步的了解vue相關(guān)信息"));
list1.add(new Content(null, "深入淺出vue", "20", null, "深入淺出vue,慢慢掌握"));
list1.add(new Content(null, "三天vue實(shí)戰(zhàn)", "20", null, "三天掌握vue開發(fā)"));
list1.add(new Content(null, "不知火舞", "20", null, "不知名的vue"));
list1.add(new Content(null, "娜可露露", "20", null, "一招秒人"));
list1.add(new Content(null, "宮本武藏", "20", null, "我就是一個(gè)超級(jí)兵"));
list1.add(new Content(null, "vue宮本vue", "20", null, "我就是一個(gè)超級(jí)兵"));
// Collection of documents to index
Collection<Document> docs = new ArrayList<>();
for (int i = 0; i < list1.size(); i++) {
//contentMapper.insertSelective(list1.get(i));
// Create a document object
Document document = new Document();
//A StringField is indexed but not tokenized; a TextField is both indexed and tokenized.
document.add(new StringField("id", (i + 1) + "", Field.Store.YES));
document.add(new TextField("title", list1.get(i).getTitle(), Field.Store.YES));
document.add(new TextField("price", list1.get(i).getPrice(), Field.Store.YES));
document.add(new TextField("descs", list1.get(i).getDescs(), Field.Store.YES));
docs.add(document);
}
// Directory class pointing at the index location on disk; here it is D:\Lucene\indexDir
Directory directory = FSDirectory.open(FileSystems.getDefault().getPath("D:\\Lucene\\indexDir"));
// Use the custom IK analyzer
Analyzer analyzer = new MyIKAnalyzer();
// Configuration object for the IndexWriter
IndexWriterConfig conf = new IndexWriterConfig(analyzer);
// Open mode: OpenMode.APPEND appends to an existing index; OpenMode.CREATE clears existing data first, then writes the new index
conf.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
// Create the IndexWriter from the index directory and the configuration
IndexWriter indexWriter = new IndexWriter(directory, conf);
// Hand the document collection to the IndexWriter
indexWriter.addDocuments(docs);
// Commit
indexWriter.commit();
// Close
indexWriter.close();
return "success";
}
@RequestMapping("/updateIndex")
public String update(String age) throws IOException {
// Open the index directory
Directory directory = FSDirectory.open(FileSystems.getDefault().getPath("D:\\Lucene\\indexDir"));
// Create the writer configuration
IndexWriterConfig conf = new IndexWriterConfig(new MyIKAnalyzer());
// Create the IndexWriter
IndexWriter writer = new IndexWriter(directory, conf);
// Build the replacement document
Document doc = new Document();
doc.add(new StringField("id", "34", Field.Store.YES));
//Content content = contentMapper.selectByPrimaryKey("34");
//content.setTitle("宮本武藏超級(jí)兵");
//contentMapper.updateByPrimaryKeySelective(content);
Content content = new Content(34, "宮本武藏超級(jí)兵", "", "", "");
doc.add(new TextField("title", content.getTitle(), Field.Store.YES));
doc.add(new TextField("price", content.getPrice(), Field.Store.YES));
doc.add(new TextField("descs", content.getDescs(), Field.Store.YES));
writer.updateDocument(new Term("id", "34"), doc);
// Commit
writer.commit();
// Close
writer.close();
return "success";
}
@RequestMapping("/deleteIndex")
public String deleteIndex() throws IOException {
// Open the index directory
Directory directory = FSDirectory.open(FileSystems.getDefault().getPath("D:\\Lucene\\indexDir"));
// Create the writer configuration (MyIKAnalyzer, consistent with the other examples)
IndexWriterConfig conf = new IndexWriterConfig(new MyIKAnalyzer());
// Create the IndexWriter
IndexWriter writer = new IndexWriter(directory, conf);
// Delete by term
writer.deleteDocuments(new Term("id", "34"));
// Commit
writer.commit();
// Close
writer.close();
return "success";
}
@RequestMapping("/searchText")
public Object searchText(String text, HttpServletRequest request) throws IOException, ParseException {
Directory directory = FSDirectory.open(FileSystems.getDefault().getPath("D:\\Lucene\\indexDir"));
// Index reader
IndexReader reader = DirectoryReader.open(directory);
// Index searcher
IndexSearcher searcher = new IndexSearcher(reader);
// Query parser; its two arguments are the default field to query and the analyzer
QueryParser parser = new QueryParser("descs", new MyIKAnalyzer());
// Build the query
Query query = parser.parse(text);
// Fetch the top 10 hits
TopDocs topDocs = searcher.search(query, 10);
// Total hit count
System.out.println("Search found " + topDocs.totalHits + " matching documents");
// ScoreDoc array; each ScoreDoc holds a document ID and its score
ScoreDoc[] scoreDocs = topDocs.scoreDocs;
List<Content> list = new ArrayList<>();
for (ScoreDoc scoreDoc : scoreDocs) {
// Document ID
int docID = scoreDoc.doc;
// Fetch the document by ID
Document doc = reader.document(docID);
//Content content = contentMapper.selectByPrimaryKey(doc.get("id"));
Content content = new Content();
content.setId(Integer.valueOf(doc.get("id")));
content.setTitle(doc.get("title"));
content.setDescs(doc.get("descs"));
list.add(content);
}
return list;
}
@RequestMapping("/searchText1")
public Object searchText1(String text, HttpServletRequest request) throws IOException, ParseException {
String[] str = {"title", "descs"};
Directory directory = FSDirectory.open(FileSystems.getDefault().getPath("D:\\Lucene\\indexDir"));
// Index reader
IndexReader reader = DirectoryReader.open(directory);
// Index searcher
IndexSearcher searcher = new IndexSearcher(reader);
// Query parser; its two arguments are the default fields to query and the analyzer
MultiFieldQueryParser parser = new MultiFieldQueryParser(str, new MyIKAnalyzer());
// Build the query
Query query = parser.parse(text);
// Fetch the top 100 hits
TopDocs topDocs = searcher.search(query, 100);
// Total hit count
System.out.println("Search found " + topDocs.totalHits + " matching documents");
// ScoreDoc array; each ScoreDoc holds a document ID and its score
ScoreDoc[] scoreDocs = topDocs.scoreDocs;
List<Content> list = new ArrayList<>();
for (ScoreDoc scoreDoc : scoreDocs) {
// Document ID
int docID = scoreDoc.doc;
// Fetch the document by ID
Document doc = reader.document(docID);
//Content content = contentMapper.selectByPrimaryKey(doc.get("id"));
Content content = new Content();
content.setId(Integer.valueOf(doc.get("id")));
list.add(content);
}
return list;
}
@RequestMapping("/searchText2")
public Object searchText2(String text, HttpServletRequest request) throws IOException, ParseException, InvalidTokenOffsetsException {
String[] str = {"title", "descs"};
Directory directory = FSDirectory.open(FileSystems.getDefault().getPath("D:\\Lucene\\indexDir"));
// Index reader
IndexReader reader = DirectoryReader.open(directory);
// Index searcher
IndexSearcher searcher = new IndexSearcher(reader);
// Query parser; its two arguments are the default fields to query and the analyzer
MultiFieldQueryParser parser = new MultiFieldQueryParser(str, new MyIKAnalyzer());
// Build the query
Query query = parser.parse(text);
// Fetch the top 100 hits
TopDocs topDocs = searcher.search(query, 100);
// Total hit count
System.out.println("Search found " + topDocs.totalHits + " matching documents");
// Highlighting
SimpleHTMLFormatter simpleHTMLFormatter = new SimpleHTMLFormatter("<span style='color:red'>", "</span>");
Highlighter highlighter = new Highlighter(simpleHTMLFormatter, new QueryScorer(query));
Fragmenter fragmenter = new SimpleFragmenter(100); // limit highlighted fragments to about 100 characters
highlighter.setTextFragmenter(fragmenter);
// ScoreDoc array; each ScoreDoc holds a document ID and its score
ScoreDoc[] scoreDocs = topDocs.scoreDocs;
List<Content> list = new ArrayList<>();
for (ScoreDoc scoreDoc : scoreDocs) {
// Document ID
int docID = scoreDoc.doc;
// Fetch the document by ID
Document doc = reader.document(docID);
//Content content = contentMapper.selectByPrimaryKey(doc.get("id"));
Content content = new Content();
// Build highlighted field values, falling back to the stored field when there is nothing to highlight
String title = highlighter.getBestFragment(new MyIKAnalyzer(), "title", doc.get("title"));
if (title == null) {
title = doc.get("title");
}
String descs = highlighter.getBestFragment(new MyIKAnalyzer(), "descs", doc.get("descs"));
if (descs == null) {
descs = doc.get("descs");
}
content.setDescs(descs);
content.setTitle(title);
list.add(content);
}
request.setAttribute("list", list);
return "index";
}
@RequestMapping("/searchText3")
public String searchText3(String text, HttpServletRequest request) throws IOException, ParseException, InvalidTokenOffsetsException {
String[] str = {"title", "descs"};
int page = 1;
int pageSize = 10;
Directory directory = FSDirectory.open(FileSystems.getDefault().getPath("D:\\Lucene\\indexDir"));
// Index reader
IndexReader reader = DirectoryReader.open(directory);
// Index searcher
IndexSearcher searcher = new IndexSearcher(reader);
// Query parser; its two arguments are the default fields to query and the analyzer
MultiFieldQueryParser parser = new MultiFieldQueryParser(str, new MyIKAnalyzer());
// Build the query
Query query = parser.parse(text);
// Fetch one page of hits
//TopDocs topDocs = searcher.search(query, 100);
TopDocs topDocs = searchByPage(page, pageSize, searcher, query);
// Total hit count
System.out.println("Search found " + topDocs.totalHits + " matching documents");
// Highlighting
SimpleHTMLFormatter simpleHTMLFormatter = new SimpleHTMLFormatter("<span style='color:red'>", "</span>");
Highlighter highlighter = new Highlighter(simpleHTMLFormatter, new QueryScorer(query));
Fragmenter fragmenter = new SimpleFragmenter(100); // limit highlighted fragments to about 100 characters
highlighter.setTextFragmenter(fragmenter);
// ScoreDoc array; each ScoreDoc holds a document ID and its score
ScoreDoc[] scoreDocs = topDocs.scoreDocs;
List<Content> list = new ArrayList<>();
for (ScoreDoc scoreDoc : scoreDocs) {
// Document ID
int docID = scoreDoc.doc;
// Fetch the document by ID
Document doc = reader.document(docID);
//Content content = contentMapper.selectByPrimaryKey(doc.get("id"));
Content content = new Content();
// Build highlighted field values, falling back to the stored field when there is nothing to highlight
String title = highlighter.getBestFragment(new MyIKAnalyzer(), "title", doc.get("title"));
if (title == null) {
title = doc.get("title");
}
String descs = highlighter.getBestFragment(new MyIKAnalyzer(), "descs", doc.get("descs"));
if (descs == null) {
descs = doc.get("descs");
}
content.setDescs(descs);
content.setTitle(title);
list.add(content);
}
System.err.println("list size: " + list.size());
request.setAttribute("page", page);
request.setAttribute("pageSize", pageSize);
request.setAttribute("list", list);
return "index";
}
private TopDocs searchByPage(int page, int perPage, IndexSearcher searcher, Query query) throws IOException {
TopDocs result = null;
if (query == null) {
System.out.println("Query is null, returning null");
return null;
}
ScoreDoc before = null;
if (page != 1) {
TopDocs docsBefore = searcher.search(query, (page - 1) * perPage);
ScoreDoc[] scoreDocs = docsBefore.scoreDocs;
if (scoreDocs.length > 0) {
before = scoreDocs[scoreDocs.length - 1];
}
}
result = searcher.searchAfter(before, query, perPage);
return result;
}
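The pagination trick in searchByPage (fetch the first (page − 1) × perPage hits, remember the last one, then continue after it with searchAfter) can be illustrated without Lucene. This toy version, under the assumption of an already-ranked hit list (the class `PagingSketch` is mine, not part of the article's code), pages through a list the same way:

```java
import java.util.List;

class PagingSketch {
    // Return one page of results by "searching after" the last hit of the previous pages,
    // mirroring IndexSearcher.searchAfter: hits before the cursor are skipped, not returned.
    static <T> List<T> searchByPage(int page, int perPage, List<T> rankedHits) {
        int skip = (page - 1) * perPage;   // everything before the cursor
        if (skip >= rankedHits.size()) {
            return List.of();              // past the end: empty page
        }
        return rankedHits.subList(skip, Math.min(skip + perPage, rankedHits.size()));
    }
}
```

In real Lucene the point of searchAfter is that the skipped hits never need to be scored into the result set, which matters when the index is large.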
@RequestMapping("/searchText4")
public String searchText4(String text, HttpServletRequest request) throws IOException, ParseException, InvalidTokenOffsetsException {
String[] str = {"title", "descs"};
int page = 1;
int pageSize = 100;
IndexSearcher searcher = getMoreSearch("d:\\indexDir");
// Query parser; its two arguments are the default fields to query and the analyzer
MultiFieldQueryParser parser = new MultiFieldQueryParser(str, new MyIKAnalyzer());
// Build the query
Query query = parser.parse(text);
TopDocs topDocs = searchByPage(page, pageSize, searcher, query);
// Total hit count
System.out.println("Search found " + topDocs.totalHits + " matching documents");
// Highlighting
SimpleHTMLFormatter simpleHTMLFormatter = new SimpleHTMLFormatter("<span style='color:red'>", "</span>");
Highlighter highlighter = new Highlighter(simpleHTMLFormatter, new QueryScorer(query));
Fragmenter fragmenter = new SimpleFragmenter(100); // limit highlighted fragments to about 100 characters
highlighter.setTextFragmenter(fragmenter);
// ScoreDoc array; each ScoreDoc holds a document ID and its score
ScoreDoc[] scoreDocs = topDocs.scoreDocs;
List<Content> list = new ArrayList<>();
for (ScoreDoc scoreDoc : scoreDocs) {
// Document ID
int docID = scoreDoc.doc;
// Fetch the document by ID
//Document doc = reader.document(docID);
Document doc = searcher.doc(docID); // with multiple indexes, fetch documents through the searcher; a single reader would fail here
//Content content = contentMapper.selectByPrimaryKey(doc.get("id"));
Content content = new Content();
// Build highlighted field values, falling back to the stored field when there is nothing to highlight
String title = highlighter.getBestFragment(new MyIKAnalyzer(), "title", doc.get("title"));
if (title == null) {
title = doc.get("title");
}
String descs = highlighter.getBestFragment(new MyIKAnalyzer(), "descs", doc.get("descs"));
if (descs == null) {
descs = doc.get("descs");
}
content.setDescs(descs);
content.setTitle(title);
list.add(content);
}
System.err.println("list size: " + list.size());
request.setAttribute("page", page);
request.setAttribute("pageSize", pageSize);
request.setAttribute("list", list);
return "index";
}
private IndexSearcher getMoreSearch(String string) {
MultiReader reader = null;
// Open a reader over every index directory under the given path
try {
File[] files = new File(string).listFiles();
IndexReader[] readers = new IndexReader[files.length];
for (int i = 0; i < files.length; i++) {
readers[i] = DirectoryReader.open(FSDirectory.open(Paths.get(files[i].getPath(), new String[0])));
}
reader = new MultiReader(readers);
} catch (IOException e) {
e.printStackTrace();
}
return new IndexSearcher(reader);
// If there are many index directories, searching can be parallelized with an executor:
/**
ExecutorService service = Executors.newCachedThreadPool();
return new IndexSearcher(reader, service);
*/
}
Test 2
public static void main(String[] args) throws IOException, ParseException {
long startTime = System.currentTimeMillis();
// indexDir is the directory that hosts Lucene's index files
File indexDir = new File("D:\\Lucene\\indexDir");
// dataDir is the directory that hosts the text files to be indexed
File dataDir = new File("D:\\Lucene\\dataDir");
Analyzer luceneAnalyzer = new StandardAnalyzer();
// Or use the custom IK analyzer
Analyzer IkAnalyzer = new MyIKAnalyzer();
Directory directory = FSDirectory.open(FileSystems.getDefault().getPath("D:\\Lucene\\indexDir"));
IndexWriterConfig indexWriterConfig = new IndexWriterConfig(IkAnalyzer);
// Open mode: OpenMode.APPEND appends to an existing index; OpenMode.CREATE clears existing data first, then writes the new index
indexWriterConfig.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
IndexWriter indexWriter = new IndexWriter(directory, indexWriterConfig);
File[] dataFiles = dataDir.listFiles();
for (int i = 0; i < dataFiles.length; i++) {
if (dataFiles[i].isFile() && dataFiles[i].getName().endsWith(".txt")) {
System.out.println("Indexing file " + dataFiles[i].getCanonicalPath());
Document document = new Document();
Reader txtReader = new FileReader(dataFiles[i]);
document.add(new TextField("path", dataFiles[i].getCanonicalPath(), Field.Store.YES));
document.add(new TextField("contents", txtReader));
indexWriter.addDocument(document);
}
}
indexWriter.commit();
indexWriter.close();
long endTime = System.currentTimeMillis();
System.out.println("It takes "
+ (endTime - startTime)
+ " milliseconds to create index for the files in directory "
+ dataDir.getPath());
String queryStr = "hello";
// Index reader
IndexReader indexReader = DirectoryReader.open(directory);
// Index searcher
IndexSearcher indexSearcher = new IndexSearcher(indexReader);
// Query parser; its two arguments are the default field to query and the analyzer
QueryParser parser = new QueryParser("contents", IkAnalyzer);
// Build the query
Query query = parser.parse(queryStr);
// Fetch the top 10 hits
TopDocs topDocs = indexSearcher.search(query, 10);
// Total hit count
System.out.println("Search found " + topDocs.totalHits + " matching documents");
// ScoreDoc array; each ScoreDoc holds a document ID and its score
ScoreDoc[] scoreDocs = topDocs.scoreDocs;
for (ScoreDoc scoreDoc : scoreDocs) {
// Document ID
int docID = scoreDoc.doc;
// Fetch the document by ID
Document doc = indexReader.document(docID);
System.out.println(doc.get("path"));
}
}
The IndexWriter builds an index over all the .txt files under dataDir, writing the index files into indexDir. Each Document object corresponds to one file to be searched, which could equally be a text file or a web page, and is given its fields; here each text file gets two fields, path and contents. After running the first part of the code, the index files appear in the specified directory, as shown below.

The IndexReader reads the index files; the QueryParser specifies the analyzer and which document field to query; the Query object carries the search keyword; and the IndexSearcher performs the search, returning a TopDocs result set. After running the second part of the code, the paths of the text files containing the keyword are printed, as shown below.

This concludes the walkthrough of implementing full-text search in Spring Boot with Lucene.