Implementing full-text search in Spring Boot with Lucene: a walkthrough
Lucene is a mature, free, open-source tool in the Java ecosystem that provides a simple yet powerful application programming interface (API) for full-text indexing and search.
Full-text search with Lucene means tokenizing the entire content of each document and building an inverted index over all the resulting terms. In practice this comes down to using Lucene's API for the four basic index operations: create (build the index), delete (remove index entries), update (modify index entries), and query (search the data).
Suppose a directory on our machine contains many text files and we want to find out which ones contain a given keyword. To do this, we first use Lucene to build an index over the documents in that directory, and then search the index for the documents we are looking for. This example should give the reader a clear picture of how to build a search application with Lucene.
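The core idea of the inverted index can be sketched without Lucene at all: tokenize every document and map each term to the set of documents containing it. The class below is a toy illustration only (the name `InvertedIndex` and the whitespace "analyzer" are mine, not Lucene's; real analyzers do far more than lowercasing and splitting):

```java
import java.util.*;

// A toy inverted index: term -> sorted set of document IDs.
class InvertedIndex {
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // "Analyze" a document (here: lowercase + whitespace split) and index each term.
    void add(int docId, String text) {
        for (String term : text.toLowerCase().split("\\s+")) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Look up all documents containing the given term.
    SortedSet<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), Collections.emptySortedSet());
    }
}
```

Searching a term is then a single map lookup instead of a scan over every document, which is exactly what makes Lucene queries fast.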
Building the index
Document
A Document describes a document, which here can be an HTML page, an email, or a text file. A Document object is made up of multiple Field objects: you can think of a Document as a record in a database, with each Field as one column of that record.
Field
A Field object describes one attribute of a document; for example, the subject and body of an email can be described by two separate Field objects.
Analyzer
Before a document can be indexed, its content must first be tokenized, and that is the Analyzer's job. Analyzer is an abstract class with multiple implementations; choose the one suited to your language and application. The Analyzer hands the tokenized content to the IndexWriter, which builds the index.
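To make the Analyzer's role concrete, here is a minimal sketch (the helper class `AnalyzerDemo` is hypothetical; it assumes `lucene-core` and `lucene-analyzers-common` from the dependency list below are on the classpath). It feeds a string through StandardAnalyzer and collects the terms the resulting TokenStream produces:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

class AnalyzerDemo {
    // Tokenize the given text with the given analyzer and return the produced terms.
    static List<String> tokens(Analyzer analyzer, String text) throws IOException {
        List<String> result = new ArrayList<>();
        try (TokenStream ts = analyzer.tokenStream("contents", text)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();                      // required before the first incrementToken()
            while (ts.incrementToken()) {
                result.add(term.toString());
            }
            ts.end();
        }
        return result;
    }
}
```

For English text, `tokens(new StandardAnalyzer(), "Hello, Lucene World")` yields lowercased terms split on punctuation and whitespace; a Chinese-aware analyzer such as the IK analyzer used later would segment Chinese text into words instead.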
IndexWriter
IndexWriter is the core class Lucene uses to create an index; its job is to add Document objects to the index one by one.
Directory
This class represents the storage location of a Lucene index. It is an abstract class with two commonly used implementations: FSDirectory, which represents an index stored on the file system, and RAMDirectory, which represents an index held in memory (note that RAMDirectory is deprecated in newer Lucene releases).
Searching documents
Query
Query is an abstract class with many implementations, such as TermQuery, BooleanQuery, and PrefixQuery. Its purpose is to encapsulate the user's query string into something Lucene can execute.
Term
A Term is the basic unit of search; a Term object consists of two String fields. One can be created with a single statement: Term term = new Term("fieldName", "queryWord"); where the first argument names the document Field to search and the second is the keyword to search for.
TermQuery
TermQuery is a subclass of the abstract class Query and the most basic query type Lucene supports. One is created with a single statement: TermQuery termQuery = new TermQuery(new Term("fieldName", "queryWord")); its constructor accepts exactly one argument, a Term object.
IndexSearcher
IndexSearcher is used to search a built index. It opens the index read-only, so multiple IndexSearcher instances can operate on the same index concurrently.
Hits
Hits was the class that held search results in early Lucene versions; in the 7.x version used here it has long been replaced by TopDocs and ScoreDoc, which the example code below uses.
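Taken together, the search-side classes above fit in a handful of lines. The following is a minimal, self-contained sketch rather than the article's own code (the class name `TermQuerySketch` is mine; it assumes the Lucene 7.6.0 dependencies listed below, and uses RAMDirectory to keep the demo entirely in memory): index one document, then count the hits for a TermQuery on it.

```java
import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

class TermQuerySketch {
    // Index one document, then find it again with a TermQuery on an untokenized field.
    static long countMatches(String field, String word) throws IOException {
        Directory dir = new RAMDirectory(); // in-memory index, fine for a demo
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            Document doc = new Document();
            // StringField: indexed but not tokenized, so the exact value is searchable as one term
            doc.add(new StringField(field, word, Field.Store.YES));
            writer.addDocument(doc);
            writer.commit();
        }
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            TopDocs hits = searcher.search(new TermQuery(new Term(field, word)), 10);
            return hits.totalHits; // in Lucene 7.x this is a plain long
        }
    }
}
```

The full example below follows the same shape, but with an on-disk FSDirectory and a Chinese-aware analyzer.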
Example
1. Maven (pom) dependencies
<!-- Lucene core -->
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-core</artifactId>
    <version>7.6.0</version>
</dependency>
<!-- Lucene query parser -->
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-queryparser</artifactId>
    <version>7.6.0</version>
</dependency>
<!-- Lucene's default analyzers, suitable for English text -->
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-analyzers-common</artifactId>
    <version>7.6.0</version>
</dependency>
<!-- Lucene search-result highlighting -->
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-highlighter</artifactId>
    <version>7.6.0</version>
</dependency>
<!-- smartcn Chinese analyzer -->
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-analyzers-smartcn</artifactId>
    <version>7.6.0</version>
</dependency>
<!-- IK analyzer -->
<dependency>
    <groupId>com.janeluo</groupId>
    <artifactId>ikanalyzer</artifactId>
    <version>2012_u6</version>
</dependency>
2. A custom IK analyzer
The stock IKAnalyzer (2012_u6) targets an older Lucene API, so we wrap its IKSegmenter in our own Analyzer/Tokenizer subclasses to make it work with Lucene 7.
public class MyIKAnalyzer extends Analyzer {
private boolean useSmart;
public MyIKAnalyzer() {
this(false);
}
public MyIKAnalyzer(boolean useSmart) {
this.useSmart = useSmart;
}
@Override
protected TokenStreamComponents createComponents(String s) {
Tokenizer _MyIKTokenizer = new MyIKTokenizer(this.useSmart());
return new TokenStreamComponents(_MyIKTokenizer);
}
public boolean useSmart() {
return this.useSmart;
}
public void setUseSmart(boolean useSmart) {
this.useSmart = useSmart;
}
}
public class MyIKTokenizer extends Tokenizer {
private IKSegmenter _IKImplement;
private final CharTermAttribute termAtt = (CharTermAttribute)this.addAttribute(CharTermAttribute.class);
private final OffsetAttribute offsetAtt = (OffsetAttribute)this.addAttribute(OffsetAttribute.class);
private final TypeAttribute typeAtt = (TypeAttribute)this.addAttribute(TypeAttribute.class);
private int endPosition;
//useSmart: whether to use IK's "smart" segmentation. Defaults to false, i.e. fine-grained segmentation; switching this to true can greatly reduce the number of search hits
public MyIKTokenizer(boolean useSmart) {
this._IKImplement = new IKSegmenter(this.input, useSmart);
}
@Override
public boolean incrementToken() throws IOException {
this.clearAttributes();
Lexeme nextLexeme = this._IKImplement.next();
if (nextLexeme != null) {
this.termAtt.append(nextLexeme.getLexemeText());
this.termAtt.setLength(nextLexeme.getLength());
this.offsetAtt.setOffset(nextLexeme.getBeginPosition(), nextLexeme.getEndPosition());
this.endPosition = nextLexeme.getEndPosition();
this.typeAtt.setType(nextLexeme.getLexemeTypeString());
return true;
} else {
return false;
}
}
@Override
public void reset() throws IOException {
super.reset();
this._IKImplement.reset(this.input);
}
@Override
public final void end() {
int finalOffset = this.correctOffset(this.endPosition);
this.offsetAtt.setOffset(finalOffset, finalOffset);
}
}
Test 1
@RequestMapping("/createIndex")
public String createIndex() throws IOException {
List<Content> list1 = new ArrayList<>();
list1.add(new Content(null, "Java面向?qū)ο?, "10", null, "Java面向?qū)ο髲娜腴T到精通,簡(jiǎn)單上手"));
list1.add(new Content(null, "Java面向?qū)ο骿ava", "10", null, "Java面向?qū)ο髲娜腴T到精通,簡(jiǎn)單上手"));
list1.add(new Content(null, "Java面向編程", "15", null, "Java面向?qū)ο缶幊虝?));
list1.add(new Content(null, "JavaScript入門", "18", null, "JavaScript入門編程書籍"));
list1.add(new Content(null, "深入理解Java編程", "13", null, "十三四天掌握J(rèn)ava基礎(chǔ)"));
list1.add(new Content(null, "從入門到放棄_Java", "20", null, "一門從入門到放棄的書籍"));
list1.add(new Content(null, "Head First Java", "30", null, "《Head First Java》是一本完整地面向?qū)ο?object-oriented,OO)程序設(shè)計(jì)和Java的學(xué)習(xí)指導(dǎo)用書"));
list1.add(new Content(null, "Java 核心技術(shù):卷1 基礎(chǔ)知識(shí)", "22", null, "全書共14章,包括Java基本的程序結(jié)構(gòu)、對(duì)象與類、繼承、接口與內(nèi)部類、圖形程序設(shè)計(jì)、事件處理、Swing用戶界面組件"));
list1.add(new Content(null, "Java 編程思想", "12", null, "本書贏得了全球程序員的廣泛贊譽(yù),即使是最晦澀的概念,在Bruce Eckel的文字親和力和小而直接的編程示例面前也會(huì)化解于無形"));
list1.add(new Content(null, "Java開發(fā)實(shí)戰(zhàn)經(jīng)典", "51", null, "本書是一本綜合講解Java核心技術(shù)的書籍,在書中使用大量的代碼及案例進(jìn)行知識(shí)點(diǎn)的分析與運(yùn)用"));
list1.add(new Content(null, "Effective Java", "10", null, "本書介紹了在Java編程中57條極具實(shí)用價(jià)值的經(jīng)驗(yàn)規(guī)則,這些經(jīng)驗(yàn)規(guī)則涵蓋了大多數(shù)開發(fā)人員每天所面臨的問題的解決方案"));
list1.add(new Content(null, "分布式 Java 應(yīng)用:基礎(chǔ)與實(shí)踐", "14", null, "本書介紹了編寫分布式Java應(yīng)用涉及的眾多知識(shí)點(diǎn),分為了基于Java實(shí)現(xiàn)網(wǎng)絡(luò)通信、RPC;基于SOA實(shí)現(xiàn)大型分布式Java應(yīng)用"));
list1.add(new Content(null, "http權(quán)威指南", "11", null, "超文本傳輸協(xié)議(Hypertext Transfer Protocol,HTTP)是在萬維網(wǎng)上進(jìn)行通信時(shí)所使用的協(xié)議方案"));
list1.add(new Content(null, "Spring", "15", null, "這是啥,還需要學(xué)習(xí)嗎?Java程序員必備書籍"));
list1.add(new Content(null, "深入理解 Java 虛擬機(jī)", "18", null, "作為一位Java程序員,你是否也曾經(jīng)想深入理解Java虛擬機(jī),但是卻被它的復(fù)雜和深?yuàn)W拒之門外"));
list1.add(new Content(null, "springboot實(shí)戰(zhàn)", "11", null, "完成對(duì)于springboot的理解,是每個(gè)Java程序員必備的姿勢(shì)"));
list1.add(new Content(null, "springmvc學(xué)習(xí)", "72", null, "springmvc學(xué)習(xí)指南"));
list1.add(new Content(null, "vue入門到放棄", "20", null, "vue入門到放棄書籍信息"));
list1.add(new Content(null, "vue入門到精通", "20", null, "vue入門到精通相關(guān)書籍信息"));
list1.add(new Content(null, "vue之旅", "20", null, "由淺入深地全面介紹vue技術(shù),包含大量案例與代碼"));
list1.add(new Content(null, "vue實(shí)戰(zhàn)", "20", null, "以實(shí)戰(zhàn)為導(dǎo)向,系統(tǒng)講解如何使用 "));
list1.add(new Content(null, "vue入門與實(shí)踐", "20", null, "現(xiàn)已得到蘋果、微軟、谷歌等主流廠商全面支持"));
list1.add(new Content(null, "Vue.js應(yīng)用測(cè)試", "20", null, "Vue.js創(chuàng)始人尤雨溪鼎力推薦!Vue官方測(cè)試工具作者親筆撰寫,Vue.js應(yīng)用測(cè)試完全學(xué)習(xí)指南"));
list1.add(new Content(null, "PHP和MySQL Web開發(fā)", "20", null, "本書是利用PHP和MySQL構(gòu)建數(shù)據(jù)庫(kù)驅(qū)動(dòng)的Web應(yīng)用程序的權(quán)威指南"));
list1.add(new Content(null, "Web高效編程與優(yōu)化實(shí)踐", "20", null, "從思想提升和內(nèi)容修煉兩個(gè)維度,圍繞前端工程師必備的前端技術(shù)和編程基礎(chǔ)"));
list1.add(new Content(null, "Vue.js 2.x實(shí)踐指南", "20", null, "本書旨在讓初學(xué)者能夠快速上手vue技術(shù)棧,并能夠利用所學(xué)知識(shí)獨(dú)立動(dòng)手進(jìn)行項(xiàng)目開發(fā)"));
list1.add(new Content(null, "初始vue", "20", null, "解開vue的面紗"));
list1.add(new Content(null, "什么是vue", "20", null, "一步一步的了解vue相關(guān)信息"));
list1.add(new Content(null, "深入淺出vue", "20", null, "深入淺出vue,慢慢掌握"));
list1.add(new Content(null, "三天vue實(shí)戰(zhàn)", "20", null, "三天掌握vue開發(fā)"));
list1.add(new Content(null, "不知火舞", "20", null, "不知名的vue"));
list1.add(new Content(null, "娜可露露", "20", null, "一招秒人"));
list1.add(new Content(null, "宮本武藏", "20", null, "我就是一個(gè)超級(jí)兵"));
list1.add(new Content(null, "vue宮本vue", "20", null, "我就是一個(gè)超級(jí)兵"));
// Collection of documents to index
Collection<Document> docs = new ArrayList<>();
for (int i = 0; i < list1.size(); i++) {
//contentMapper.insertSelective(list1.get(i));
// Create a document object
Document document = new Document();
//A StringField is indexed but not tokenized; a TextField is both indexed and tokenized.
document.add(new StringField("id", (i + 1) + "", Field.Store.YES));
document.add(new TextField("title", list1.get(i).getTitle(), Field.Store.YES));
document.add(new TextField("price", list1.get(i).getPrice(), Field.Store.YES));
document.add(new TextField("descs", list1.get(i).getDescs(), Field.Store.YES));
docs.add(document);
}
// Directory class pointing at the index location on disk; here it is D:\Lucene\indexDir
Directory directory = FSDirectory.open(FileSystems.getDefault().getPath("D:\\Lucene\\indexDir"));
// Use the custom IK analyzer
Analyzer analyzer = new MyIKAnalyzer();
// Configuration object for the IndexWriter
IndexWriterConfig conf = new IndexWriterConfig(analyzer);
// Open mode: OpenMode.APPEND appends to an existing index; OpenMode.CREATE clears existing data first, then writes the new index
conf.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
// Create the IndexWriter from the index directory and the configuration
IndexWriter indexWriter = new IndexWriter(directory, conf);
// Hand the document collection to the IndexWriter
indexWriter.addDocuments(docs);
// Commit
indexWriter.commit();
// Close
indexWriter.close();
return "success";
}
@RequestMapping("/updateIndex")
public String update(String age) throws IOException {
// Open the index directory
Directory directory = FSDirectory.open(FileSystems.getDefault().getPath("D:\\Lucene\\indexDir"));
// Create the writer configuration
IndexWriterConfig conf = new IndexWriterConfig(new MyIKAnalyzer());
// Create the IndexWriter
IndexWriter writer = new IndexWriter(directory, conf);
// Build the replacement document
Document doc = new Document();
doc.add(new StringField("id", "34", Field.Store.YES));
//Content content = contentMapper.selectByPrimaryKey("34");
//content.setTitle("宮本武藏超級(jí)兵");
//contentMapper.updateByPrimaryKeySelective(content);
Content content = new Content(34, "宮本武藏超級(jí)兵", "", "", "");
doc.add(new TextField("title", content.getTitle(), Field.Store.YES));
doc.add(new TextField("price", content.getPrice(), Field.Store.YES));
doc.add(new TextField("descs", content.getDescs(), Field.Store.YES));
writer.updateDocument(new Term("id", "34"), doc);
// Commit
writer.commit();
// Close
writer.close();
return "success";
}
@RequestMapping("/deleteIndex")
public String deleteIndex() throws IOException {
// Open the index directory
Directory directory = FSDirectory.open(FileSystems.getDefault().getPath("D:\\Lucene\\indexDir"));
// Create the writer configuration (MyIKAnalyzer, consistent with the other examples)
IndexWriterConfig conf = new IndexWriterConfig(new MyIKAnalyzer());
// Create the IndexWriter
IndexWriter writer = new IndexWriter(directory, conf);
// Delete by term
writer.deleteDocuments(new Term("id", "34"));
// Commit
writer.commit();
// Close
writer.close();
return "success";
}
@RequestMapping("/searchText")
public Object searchText(String text, HttpServletRequest request) throws IOException, ParseException {
Directory directory = FSDirectory.open(FileSystems.getDefault().getPath("D:\\Lucene\\indexDir"));
// Index reader
IndexReader reader = DirectoryReader.open(directory);
// Index searcher
IndexSearcher searcher = new IndexSearcher(reader);
// Query parser; its two arguments are the default field to query and the analyzer
QueryParser parser = new QueryParser("descs", new MyIKAnalyzer());
// Build the query
Query query = parser.parse(text);
// Fetch the top 10 hits
TopDocs topDocs = searcher.search(query, 10);
// Total hit count
System.out.println("Search found " + topDocs.totalHits + " matching documents");
// ScoreDoc array; each ScoreDoc holds a document ID and its score
ScoreDoc[] scoreDocs = topDocs.scoreDocs;
List<Content> list = new ArrayList<>();
for (ScoreDoc scoreDoc : scoreDocs) {
// Document ID
int docID = scoreDoc.doc;
// Fetch the document by ID
Document doc = reader.document(docID);
//Content content = contentMapper.selectByPrimaryKey(doc.get("id"));
Content content = new Content();
content.setId(Integer.valueOf(doc.get("id")));
content.setTitle(doc.get("title"));
content.setDescs(doc.get("descs"));
list.add(content);
}
return list;
}
@RequestMapping("/searchText1")
public Object searchText1(String text, HttpServletRequest request) throws IOException, ParseException {
String[] str = {"title", "descs"};
Directory directory = FSDirectory.open(FileSystems.getDefault().getPath("D:\\Lucene\\indexDir"));
// Index reader
IndexReader reader = DirectoryReader.open(directory);
// Index searcher
IndexSearcher searcher = new IndexSearcher(reader);
// Query parser; its two arguments are the default fields to query and the analyzer
MultiFieldQueryParser parser = new MultiFieldQueryParser(str, new MyIKAnalyzer());
// Build the query
Query query = parser.parse(text);
// Fetch the top 100 hits
TopDocs topDocs = searcher.search(query, 100);
// Total hit count
System.out.println("Search found " + topDocs.totalHits + " matching documents");
// ScoreDoc array; each ScoreDoc holds a document ID and its score
ScoreDoc[] scoreDocs = topDocs.scoreDocs;
List<Content> list = new ArrayList<>();
for (ScoreDoc scoreDoc : scoreDocs) {
// Document ID
int docID = scoreDoc.doc;
// Fetch the document by ID
Document doc = reader.document(docID);
//Content content = contentMapper.selectByPrimaryKey(doc.get("id"));
Content content = new Content();
content.setId(Integer.valueOf(doc.get("id")));
list.add(content);
}
return list;
}
@RequestMapping("/searchText2")
public Object searchText2(String text, HttpServletRequest request) throws IOException, ParseException, InvalidTokenOffsetsException {
String[] str = {"title", "descs"};
Directory directory = FSDirectory.open(FileSystems.getDefault().getPath("D:\\Lucene\\indexDir"));
// Index reader
IndexReader reader = DirectoryReader.open(directory);
// Index searcher
IndexSearcher searcher = new IndexSearcher(reader);
// Query parser; its two arguments are the default fields to query and the analyzer
MultiFieldQueryParser parser = new MultiFieldQueryParser(str, new MyIKAnalyzer());
// Build the query
Query query = parser.parse(text);
// Fetch the top 100 hits
TopDocs topDocs = searcher.search(query, 100);
// Total hit count
System.out.println("Search found " + topDocs.totalHits + " matching documents");
// Highlighting
SimpleHTMLFormatter simpleHTMLFormatter = new SimpleHTMLFormatter("<span style='color:red'>", "</span>");
Highlighter highlighter = new Highlighter(simpleHTMLFormatter, new QueryScorer(query));
Fragmenter fragmenter = new SimpleFragmenter(100); // limit highlighted fragments to about 100 characters
highlighter.setTextFragmenter(fragmenter);
// ScoreDoc array; each ScoreDoc holds a document ID and its score
ScoreDoc[] scoreDocs = topDocs.scoreDocs;
List<Content> list = new ArrayList<>();
for (ScoreDoc scoreDoc : scoreDocs) {
// Document ID
int docID = scoreDoc.doc;
// Fetch the document by ID
Document doc = reader.document(docID);
//Content content = contentMapper.selectByPrimaryKey(doc.get("id"));
Content content = new Content();
// Build highlighted field values, falling back to the stored field when there is nothing to highlight
String title = highlighter.getBestFragment(new MyIKAnalyzer(), "title", doc.get("title"));
if (title == null) {
title = doc.get("title");
}
String descs = highlighter.getBestFragment(new MyIKAnalyzer(), "descs", doc.get("descs"));
if (descs == null) {
descs = doc.get("descs");
}
content.setDescs(descs);
content.setTitle(title);
list.add(content);
}
request.setAttribute("list", list);
return "index";
}
@RequestMapping("/searchText3")
public String searchText3(String text, HttpServletRequest request) throws IOException, ParseException, InvalidTokenOffsetsException {
String[] str = {"title", "descs"};
int page = 1;
int pageSize = 10;
Directory directory = FSDirectory.open(FileSystems.getDefault().getPath("D:\\Lucene\\indexDir"));
// Index reader
IndexReader reader = DirectoryReader.open(directory);
// Index searcher
IndexSearcher searcher = new IndexSearcher(reader);
// Query parser; its two arguments are the default fields to query and the analyzer
MultiFieldQueryParser parser = new MultiFieldQueryParser(str, new MyIKAnalyzer());
// Build the query
Query query = parser.parse(text);
// Fetch one page of hits
//TopDocs topDocs = searcher.search(query, 100);
TopDocs topDocs = searchByPage(page, pageSize, searcher, query);
// Total hit count
System.out.println("Search found " + topDocs.totalHits + " matching documents");
// Highlighting
SimpleHTMLFormatter simpleHTMLFormatter = new SimpleHTMLFormatter("<span style='color:red'>", "</span>");
Highlighter highlighter = new Highlighter(simpleHTMLFormatter, new QueryScorer(query));
Fragmenter fragmenter = new SimpleFragmenter(100); // limit highlighted fragments to about 100 characters
highlighter.setTextFragmenter(fragmenter);
// ScoreDoc array; each ScoreDoc holds a document ID and its score
ScoreDoc[] scoreDocs = topDocs.scoreDocs;
List<Content> list = new ArrayList<>();
for (ScoreDoc scoreDoc : scoreDocs) {
// Document ID
int docID = scoreDoc.doc;
// Fetch the document by ID
Document doc = reader.document(docID);
//Content content = contentMapper.selectByPrimaryKey(doc.get("id"));
Content content = new Content();
// Build highlighted field values, falling back to the stored field when there is nothing to highlight
String title = highlighter.getBestFragment(new MyIKAnalyzer(), "title", doc.get("title"));
if (title == null) {
title = doc.get("title");
}
String descs = highlighter.getBestFragment(new MyIKAnalyzer(), "descs", doc.get("descs"));
if (descs == null) {
descs = doc.get("descs");
}
content.setDescs(descs);
content.setTitle(title);
list.add(content);
}
System.err.println("list size: " + list.size());
request.setAttribute("page", page);
request.setAttribute("pageSize", pageSize);
request.setAttribute("list", list);
return "index";
}
private TopDocs searchByPage(int page, int perPage, IndexSearcher searcher, Query query) throws IOException {
TopDocs result = null;
if (query == null) {
System.out.println("Query is null, returning null");
return null;
}
ScoreDoc before = null;
if (page != 1) {
TopDocs docsBefore = searcher.search(query, (page - 1) * perPage);
ScoreDoc[] scoreDocs = docsBefore.scoreDocs;
if (scoreDocs.length > 0) {
before = scoreDocs[scoreDocs.length - 1];
}
}
result = searcher.searchAfter(before, query, perPage);
return result;
}
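The pagination trick in searchByPage (fetch the first (page − 1) × perPage hits, remember the last one, then continue after it with searchAfter) can be illustrated without Lucene. This toy version, under the assumption of an already-ranked hit list (the class `PagingSketch` is mine, not part of the article's code), pages through a list the same way:

```java
import java.util.List;

class PagingSketch {
    // Return one page of results by "searching after" the last hit of the previous pages,
    // mirroring IndexSearcher.searchAfter: hits before the cursor are skipped, not returned.
    static <T> List<T> searchByPage(int page, int perPage, List<T> rankedHits) {
        int skip = (page - 1) * perPage;   // everything before the cursor
        if (skip >= rankedHits.size()) {
            return List.of();              // past the end: empty page
        }
        return rankedHits.subList(skip, Math.min(skip + perPage, rankedHits.size()));
    }
}
```

In real Lucene the point of searchAfter is that the skipped hits never need to be scored into the result set, which matters when the index is large.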
@RequestMapping("/searchText4")
public String searchText4(String text, HttpServletRequest request) throws IOException, ParseException, InvalidTokenOffsetsException {
String[] str = {"title", "descs"};
int page = 1;
int pageSize = 100;
IndexSearcher searcher = getMoreSearch("d:\\indexDir");
// Query parser; its two arguments are the default fields to query and the analyzer
MultiFieldQueryParser parser = new MultiFieldQueryParser(str, new MyIKAnalyzer());
// Build the query
Query query = parser.parse(text);
TopDocs topDocs = searchByPage(page, pageSize, searcher, query);
// Total hit count
System.out.println("Search found " + topDocs.totalHits + " matching documents");
// Highlighting
SimpleHTMLFormatter simpleHTMLFormatter = new SimpleHTMLFormatter("<span style='color:red'>", "</span>");
Highlighter highlighter = new Highlighter(simpleHTMLFormatter, new QueryScorer(query));
Fragmenter fragmenter = new SimpleFragmenter(100); // limit highlighted fragments to about 100 characters
highlighter.setTextFragmenter(fragmenter);
// ScoreDoc array; each ScoreDoc holds a document ID and its score
ScoreDoc[] scoreDocs = topDocs.scoreDocs;
List<Content> list = new ArrayList<>();
for (ScoreDoc scoreDoc : scoreDocs) {
// Document ID
int docID = scoreDoc.doc;
// Fetch the document by ID
//Document doc = reader.document(docID);
Document doc = searcher.doc(docID); // with multiple indexes, fetch documents through the searcher; a single reader would fail here
//Content content = contentMapper.selectByPrimaryKey(doc.get("id"));
Content content = new Content();
// Build highlighted field values, falling back to the stored field when there is nothing to highlight
String title = highlighter.getBestFragment(new MyIKAnalyzer(), "title", doc.get("title"));
if (title == null) {
title = doc.get("title");
}
String descs = highlighter.getBestFragment(new MyIKAnalyzer(), "descs", doc.get("descs"));
if (descs == null) {
descs = doc.get("descs");
}
content.setDescs(descs);
content.setTitle(title);
list.add(content);
}
System.err.println("list size: " + list.size());
request.setAttribute("page", page);
request.setAttribute("pageSize", pageSize);
request.setAttribute("list", list);
return "index";
}
private IndexSearcher getMoreSearch(String string) {
MultiReader reader = null;
// Open a reader over every index directory under the given path
try {
File[] files = new File(string).listFiles();
IndexReader[] readers = new IndexReader[files.length];
for (int i = 0; i < files.length; i++) {
readers[i] = DirectoryReader.open(FSDirectory.open(Paths.get(files[i].getPath(), new String[0])));
}
reader = new MultiReader(readers);
} catch (IOException e) {
e.printStackTrace();
}
return new IndexSearcher(reader);
// If there are many index directories, searching can be parallelized with an executor:
/**
ExecutorService service = Executors.newCachedThreadPool();
return new IndexSearcher(reader, service);
*/
}
Test 2
public static void main(String[] args) throws IOException, ParseException {
long startTime = System.currentTimeMillis();
// indexDir is the directory that hosts Lucene's index files
File indexDir = new File("D:\\Lucene\\indexDir");
// dataDir is the directory that hosts the text files to be indexed
File dataDir = new File("D:\\Lucene\\dataDir");
Analyzer luceneAnalyzer = new StandardAnalyzer();
// Or use the custom IK analyzer
Analyzer IkAnalyzer = new MyIKAnalyzer();
Directory directory = FSDirectory.open(FileSystems.getDefault().getPath("D:\\Lucene\\indexDir"));
IndexWriterConfig indexWriterConfig = new IndexWriterConfig(IkAnalyzer);
// Open mode: OpenMode.APPEND appends to an existing index; OpenMode.CREATE clears existing data first, then writes the new index
indexWriterConfig.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
IndexWriter indexWriter = new IndexWriter(directory, indexWriterConfig);
File[] dataFiles = dataDir.listFiles();
for (int i = 0; i < dataFiles.length; i++) {
if (dataFiles[i].isFile() && dataFiles[i].getName().endsWith(".txt")) {
System.out.println("Indexing file " + dataFiles[i].getCanonicalPath());
Document document = new Document();
Reader txtReader = new FileReader(dataFiles[i]);
document.add(new TextField("path", dataFiles[i].getCanonicalPath(), Field.Store.YES));
document.add(new TextField("contents", txtReader));
indexWriter.addDocument(document);
}
}
indexWriter.commit();
indexWriter.close();
long endTime = System.currentTimeMillis();
System.out.println("It takes "
+ (endTime - startTime)
+ " milliseconds to create index for the files in directory "
+ dataDir.getPath());
String queryStr = "hello";
// Index reader
IndexReader indexReader = DirectoryReader.open(directory);
// Index searcher
IndexSearcher indexSearcher = new IndexSearcher(indexReader);
// Query parser; its two arguments are the default field to query and the analyzer
QueryParser parser = new QueryParser("contents", IkAnalyzer);
// Build the query
Query query = parser.parse(queryStr);
// Fetch the top 10 hits
TopDocs topDocs = indexSearcher.search(query, 10);
// Total hit count
System.out.println("Search found " + topDocs.totalHits + " matching documents");
// ScoreDoc array; each ScoreDoc holds a document ID and its score
ScoreDoc[] scoreDocs = topDocs.scoreDocs;
for (ScoreDoc scoreDoc : scoreDocs) {
// Document ID
int docID = scoreDoc.doc;
// Fetch the document by ID
Document doc = indexReader.document(docID);
System.out.println(doc.get("path"));
}
}
The IndexWriter builds an index over all the .txt files under dataDir, writing the index files into indexDir. Each Document object corresponds to one file to be searched, which could equally be a text file or a web page, and is given its fields; here each text file gets two fields, path and contents. After running the first part of the code, the index files appear in the specified directory, as shown below.

The IndexReader reads the index files; the QueryParser specifies the analyzer and which document field to query; the Query object carries the search keyword; and the IndexSearcher performs the search, returning a TopDocs result set. After running the second part of the code, the paths of the text files containing the keyword are printed, as shown below.

This concludes the walkthrough of implementing full-text search in Spring Boot with Lucene.