ElasticSearch Python Usage Examples Explained

Updated: 2025-04-14 11:55:43 Author: TMesh

This article walks through using ElasticSearch from Python with worked code examples. It should be a useful reference for study or work.

Installing the dependency

pip install elasticsearch
# Douban mirror
pip install -i https://pypi.doubanio.com/simple/ elasticsearch
# The examples below pass doc_type, which matches the 6.x client (deprecated in 7.x, removed in 8.x); to run them as written, pin the client:
pip install "elasticsearch>=6.0.0,<7.0.0"

Connecting to Elasticsearch

There are several ways to connect to Elasticsearch:

from elasticsearch import Elasticsearch
# es = Elasticsearch()    # connect to the local Elasticsearch by default
# es = Elasticsearch(['127.0.0.1:9200'])  # connect to local port 9200
es = Elasticsearch(
    ["192.168.1.10", "192.168.1.11", "192.168.1.12"], # connect to a cluster; the node IP addresses are given as a list
    sniff_on_start=True,    # sniff the cluster's nodes before the first request
    sniff_on_connection_fail=True,  # refresh the node list when a node stops responding
    sniff_timeout=60    # timeout for each sniff request, in seconds
)
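Host entries can also be written as dictionaries with host and port keys, which the 6.x and 7.x clients accept in place of 'ip:port' strings. The sketch below converts one form to the other; to_host_dicts is a hypothetical helper name, not part of the library, and the second address and its port are made up for illustration.

```python
def to_host_dicts(addresses, default_port=9200):
    """Turn 'ip' or 'ip:port' strings into {'host': ..., 'port': ...} dicts."""
    hosts = []
    for address in addresses:
        host, sep, port = address.partition(":")
        # Fall back to the default port when the address carries none.
        hosts.append({"host": host, "port": int(port) if sep else default_port})
    return hosts


hosts = to_host_dicts(["192.168.1.10", "192.168.1.11:9201"])
print(hosts)
# With the client installed, the list can be passed as the first argument:
# es = Elasticsearch(hosts, sniff_on_start=True)
```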

Ignoring response status codes

es = Elasticsearch(['127.0.0.1:9200'],ignore=400)  # ignore a returned 400 status code
es = Elasticsearch(['127.0.0.1:9200'],ignore=[400, 405, 502])  # ignore several status codes, given as a list

Example

from elasticsearch import Elasticsearch
es = Elasticsearch()    # connect to the local Elasticsearch by default
print(es.index(index='py2', doc_type='doc', id=1, body={'name': "張開", "age": 18}))
print(es.get(index='py2', doc_type='doc', id=1))

The first print creates the py2 index and inserts one document; the second print retrieves that document.

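Elasticsearch for Python: operations

Operations on Elasticsearch from Python fall mainly into the following areas:

Result filtering: filters the returned results, mainly to trim the response content.
ElasticSearch (es for short): operates directly on the Elasticsearch object to handle simple index tasks. All the areas below build on the es object.
Indices: detailed index operations, such as creating custom mappings.
Cluster: cluster-related operations.
Nodes: node-related operations.
Cat API: an alternative way to query. Responses are normally JSON; cat returns compact output instead.
Snapshot: snapshot-related operations. A snapshot is a backup taken from a running Elasticsearch cluster. You can snapshot a single index or the whole cluster and store it in a repository on a shared file system; plugins add support for remote repositories on S3, HDFS, Azure, Google Cloud Storage and others.
Task Management API: the task management API is new and should still be considered a beta feature. The API may change in ways that are not backward compatible.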

Result filtering

print(es.search(index='py2', filter_path=['hits.total', 'hits.hits._source']))    # the doc type can be omitted
print(es.search(index='w2', doc_type='doc'))        # the doc type can be specified
print(es.search(index='w2', doc_type='doc', filter_path=['hits.total']))

The filter_path parameter reduces the response returned by Elasticsearch, for example to only hits.total and hits.hits._source.
filter_path also supports the * wildcard, which matches a whole field name or part of one:

print(es.search(index='py2', filter_path=['hits.*']))
print(es.search(index='py2', filter_path=['hits.hits._*']))
print(es.search(index='py2', filter_path=['hits.to*']))  # returns only total from the response
print(es.search(index='w2', doc_type='doc', filter_path=['hits.hits._*']))        # the optional doc type can be added
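To make the effect of these patterns concrete, here is a rough pure-Python model of how dotted filter_path patterns trim a response. It is only an approximation for illustration, not Elasticsearch's implementation: it handles plain names and the * wildcard, and ignores other features such as ** and exclusions. The function name apply_filter_path and the sample response are assumptions.

```python
from fnmatch import fnmatchcase

_MISSING = object()  # marks "nothing below this point matched"


def apply_filter_path(response, patterns):
    """Keep only the parts of `response` matched by patterns such as 'hits.hits._*'."""
    result = _filter(response, [p.split(".") for p in patterns])
    return {} if result is _MISSING else result


def _filter(node, patterns):
    # A fully consumed pattern means: keep everything below this point.
    if any(not p for p in patterns):
        return node
    if isinstance(node, list):
        kept = [k for k in (_filter(item, patterns) for item in node) if k is not _MISSING]
        return kept if kept else _MISSING
    if isinstance(node, dict):
        result = {}
        for key, value in node.items():
            remaining = [p[1:] for p in patterns if fnmatchcase(key, p[0])]
            if remaining:
                sub = _filter(value, remaining)
                if sub is not _MISSING:
                    result[key] = sub
        return result if result else _MISSING
    return _MISSING  # a scalar cannot satisfy a pattern that still has segments left


# Made-up response in the 6.x shape, where hits.total is a plain integer.
response = {
    "took": 3,
    "hits": {
        "total": 1,
        "max_score": 1.0,
        "hits": [{"_index": "py2", "_id": "1", "_source": {"name": "張開", "age": 18}}],
    },
}
print(apply_filter_path(response, ["hits.total", "hits.hits._source"]))
print(apply_filter_path(response, ["hits.to*"]))  # {'hits': {'total': 1}}
```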

ElasticSearch (the es object)

es.index: adds or updates a document in the specified index. If the index does not exist, it is created first, and then the add or update is performed.

# print(es.index(index='w2', doc_type='doc', id='4', body={"name":"可可", "age": 18}))    # OK
# print(es.index(index='w2', doc_type='doc', id=5, body={"name":"卡卡西", "age":22}))     # OK
# print(es.index(index='w2', id=6, body={"name": "鳴人", "age": 22}))  # raises TypeError: index() missing 1 required positional argument: 'doc_type'
print(es.index(index='w2', doc_type='doc', body={"name": "鳴人", "age": 22}))  # the id can be omitted; one is generated automatically

es.get: retrieves the specified document from an index.

print(es.get(index='w2', doc_type='doc', id=5))  # OK
# print(es.get(index='w2', doc_type='doc'))  # TypeError: get() missing 1 required positional argument: 'id'
# print(es.get(index='w2',  id=5))  # TypeError: get() missing 1 required positional argument: 'doc_type'

es.search: executes a search query and returns the hits that match it. This is the most used method and accepts complex query conditions.

index: comma-separated list of index names to search; use _all or an empty string to operate on all indices.
doc_type: comma-separated list of document types to search; leave empty to operate on all types.
body: the search definition, written in the Query DSL (Query Domain Specific Language).
_source: true or false to return the _source field or not, or a list of fields to return.
_source_exclude: list of fields to exclude from the returned _source field.
_source_include: list of fields to extract and return from the _source field; similar to _source.

print(es.search(index='py3', doc_type='doc', body={"query": {"match":{"age": 20}}}))  # plain query
print(es.search(index='py3', doc_type='doc', body={"query": {"match":{"age": 19}}},_source=['name', 'age']))  # filter the returned fields
print(es.search(index='py3', doc_type='doc', body={"query": {"match":{"age": 19}}},_source_exclude=['age']))
print(es.search(index='py3', doc_type='doc', body={"query": {"match":{"age": 19}}},_source_include=['age']))
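The search bodies above are ordinary dictionaries, so they can be assembled by a small function instead of being written inline each time. The following is a minimal sketch; build_match_body is a hypothetical helper, and putting _source and size inside the body is one option among several for limiting what comes back.

```python
def build_match_body(field, value, source_fields=None, size=10):
    """Build a search body containing a single match query."""
    body = {"query": {"match": {field: value}}, "size": size}
    if source_fields is not None:
        # Source filtering can also be requested inside the body itself.
        body["_source"] = list(source_fields)
    return body


body = build_match_body("age", 19, source_fields=["name", "age"])
print(body)
# Against a live cluster this would be used as: es.search(index='py3', body=body)
```

The same dictionary can be passed to es.count when only the number of matches is needed, provided the size and _source keys are left out.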

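es.get_source: fetches a document's source by index, type and ID. In effect, it returns just the dictionary you want.

print(es.get_source(index='py3', doc_type='doc', id='1'))  # {'name': '王五', 'age': 19}

es.count: executes a query and returns the number of matches, for example the number of documents whose age is 18.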

body = {
    "query": {
        "match": {
            "age": 18
        }
    }
}
print(es.count(index='py2', doc_type='doc', body=body))  # {'count': 1, '_shards': {'total': 5, 'successful': 5, 'skipped': 0, 'failed': 0}}
print(es.count(index='py2', doc_type='doc', body=body)['count'])  # 1
print(es.count(index='w2'))  # {'count': 6, '_shards': {'total': 5, 'successful': 5, 'skipped': 0, 'failed': 0}}
print(es.count(index='w2', doc_type='doc'))  # {'count': 6, '_shards': {'total': 5, 'successful': 5, 'skipped': 0, 'failed': 0}}

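es.create: creates the index (if it does not exist) and adds one document; if the index exists, it only adds the document. It can only add: running it again with the same id raises an error.

print(es.create(index='py3', doc_type='doc', id='1', body={"name": '王五', "age": 20}))
print(es.get(index='py3', doc_type='doc', id='3'))

Behind the scenes it calls index with op_type='create'. Apart from refusing to overwrite an existing document, it behaves like:

print(es.index(index='py3', doc_type='doc', id='4', body={"name": "麻子", "age": 21}))

Personally, though, I find it less convenient than index.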

es.delete: deletes the specified document, for example the document whose id is 4. It cannot delete an index; to delete an index, use es.indices.delete.

print(es.delete(index='py3', doc_type='doc', id='4'))

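es.delete_by_query: deletes all documents that match a query.

index: comma-separated list of index names to search; use _all or an empty string to operate on all indices.
doc_type: comma-separated list of document types to search; leave empty to operate on all types.
body: the search definition, written in the Query DSL.

print(es.delete_by_query(index='py3', doc_type='doc', body={"query": {"match":{"age": 20}}}))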
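Because delete_by_query removes every match, it can help to count the matches with the same body first. The sketch below combines es.count and es.delete_by_query; delete_matching is a hypothetical helper, es is assumed to be a connected client, and reading the 'deleted' key assumes the usual delete-by-query response shape.

```python
def delete_matching(es, index, body):
    """Count the documents matching `body`, delete them, and return (matched, deleted)."""
    matched = es.count(index=index, body=body)["count"]
    if matched == 0:
        return 0, 0  # nothing to delete, skip the second request
    result = es.delete_by_query(index=index, body=body)
    return matched, result.get("deleted", 0)


# Usage against a live cluster:
# matched, deleted = delete_matching(es, 'py3', {"query": {"match": {"age": 20}}})
```

The two numbers can differ if documents are added or removed between the two requests.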

es.exists: checks whether the specified document exists in Elasticsearch and returns a boolean.

print(es.exists(index='py3', doc_type='doc', id='1'))
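One common use of es.exists is to guard a lookup so that a missing document yields None rather than an exception. A minimal sketch, assuming es is a connected 6.x-style client; fetch_source_if_exists is a hypothetical helper name.

```python
def fetch_source_if_exists(es, index, doc_id, doc_type="doc"):
    """Return the document's _source dict, or None when the document is absent."""
    if not es.exists(index=index, doc_type=doc_type, id=doc_id):
        return None
    return es.get_source(index=index, doc_type=doc_type, id=doc_id)


# Usage against a live cluster:
# print(fetch_source_if_exists(es, 'py3', '1'))
```

This costs two round trips, and the document could still disappear between them, so it is a convenience rather than a guarantee.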

es.info: returns basic information about the current cluster.

print(es.info())

es.ping: returns True if the cluster is up, otherwise False.

print(es.ping())

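Indices (es.indices)

es.indices.create: creates an index in Elasticsearch; this is the most used call. The mappings below, for example, use strict mode, define four fields, and set the ik_max_word analyzer (provided by the IK analysis plugin) on the title field; they are then applied to the py4 index. This is also the usual way to create a custom index.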

body = {
    "mappings": {
        "doc": {
            "dynamic": "strict",
            "properties": {
                "title": {
                    "type": "text",
                    "analyzer": "ik_max_word"
                },
                "url": {
                    "type": "text"
                },
                "action_type": {
                    "type": "text"
                },
                "content": {
                    "type": "text"
                }
            }
        }
    }
}
es.indices.create(index='py4', body=body)
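Creating an index that already exists raises an error, so setup scripts often check first. The sketch below pairs es.indices.exists with es.indices.create; ensure_index is a hypothetical helper, and es is assumed to be a connected client.

```python
def ensure_index(es, name, body):
    """Create index `name` with `body` unless it already exists; return True if created."""
    if es.indices.exists(index=name):
        return False
    es.indices.create(index=name, body=body)
    return True


# Usage against a live cluster, reusing the mappings body defined above:
# created = ensure_index(es, 'py4', body)
```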

es.indices.analyze: returns the result of analyzing (tokenizing) a text.

es.indices.analyze(body={'analyzer': "ik_max_word", "text": "皮特和茱麗當選“年度模范情侶”Brad Pitt and Angelina Jolie"})

es.indices.delete: deletes an index in Elasticsearch.

print(es.indices.delete(index='py4'))
print(es.indices.delete(index='w3'))    # {'acknowledged': True}

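es.indices.put_alias: creates an alias for one or more indices; the alias can then be used to query several indices at once.

index: comma-separated list of index names the alias should point to (wildcards supported); use _all to operate on all indices.
name: the name of the alias to create or update.
body: settings for the alias, such as routing or filters.

print(es.indices.put_alias(index='py4', name='py4_alias'))  # create an alias for a single index
print(es.indices.put_alias(index=['py3', 'py2'], name='py23_alias'))  # create one alias for several indices, for searching them together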

es.indices.delete_alias: deletes one or more aliases. Both the index and the alias name are required.

print(es.indices.delete_alias(index='py4', name='py4_alias'))
print(es.indices.delete_alias(index=['py3', 'py2'], name='py23_alias'))

es.indices.get_mapping: retrieves the mapping definition of an index or of an index/type.

print(es.indices.get_mapping(index='py4'))

es.indices.get_settings: retrieves the settings of one or more (or all) indices.

print(es.indices.get_settings(index='py4'))

es.indices.get: retrieves information about one or more indices.

print(es.indices.get(index='py2'))    # returns the index's details (raises NotFoundError if it does not exist)
print(es.indices.get(index=['py2', 'py3']))

es.indices.get_alias: retrieves one or more aliases.

print(es.indices.get_alias(index='py2'))
print(es.indices.get_alias(index=['py2', 'py3']))

es.indices.get_field_mapping: retrieves the mapping information of specific fields.

print(es.indices.get_field_mapping(fields='url', index='py4', doc_type='doc'))
print(es.indices.get_field_mapping(fields=['url', 'title'], index='py4', doc_type='doc'))

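es.indices.delete_alias: deletes a specific alias.
es.indices.exists: returns a boolean indicating whether the given index exists.
es.indices.exists_type: checks whether a type or types exist in an index or indices.
es.indices.flush: explicitly flushes one or more indices.
es.indices.get_field_mapping: retrieves the mapping of specific fields.
es.indices.get_template: retrieves an index template by name.
es.indices.open: opens a closed index to make it available for search.
es.indices.close: closes an index to remove its overhead from the cluster. A closed index is blocked for read/write operations.
es.indices.clear_cache: clears all caches, or specific caches, associated with one or more indices.
es.indices.put_alias: creates an alias for a specific index or indices.
es.indices.get_upgrade: monitors how far one or more indices have been upgraded.
es.indices.put_mapping: registers a specific mapping definition for a specific type.
es.indices.put_settings: changes specific index-level settings in real time.
es.indices.put_template: creates an index template that is applied automatically to newly created indices.
es.indices.rollover: when the existing index is considered too large or too old, the rollover index API rolls an alias over to a new index. The API accepts a single alias name and a list of conditions. The alias must point to a single index only. If the index satisfies the specified conditions, a new index is created and the alias is switched to point to the new index.
es.indices.segments: provides low-level segment information about the Lucene indices (shard level) that an index is built on.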

Cluster (cluster operations)

es.cluster.get_settings: returns the cluster settings.

print(es.cluster.get_settings())

es.cluster.health: returns a very simple status of the cluster's health.

print(es.cluster.health())
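The health response is a dictionary whose status field is green, yellow or red, so a script can branch on it. A small sketch that works on the returned dictionary; cluster_is_usable is a hypothetical helper, and treating yellow as acceptable is an assumption that suits single-node development setups more than production.

```python
def cluster_is_usable(health, allow_yellow=True):
    """Return True when the cluster-health dict reports an acceptable status."""
    accepted = {"green", "yellow"} if allow_yellow else {"green"}
    return health.get("status") in accepted


print(cluster_is_usable({"status": "yellow", "number_of_nodes": 1}))  # True
print(cluster_is_usable({"status": "yellow"}, allow_yellow=False))    # False
# Against a live cluster: cluster_is_usable(es.cluster.health())
```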

es.cluster.state: returns comprehensive state information about the whole cluster.

print(es.cluster.state())

es.cluster.stats: returns statistics about the cluster and its current nodes.

print(es.cluster.stats())

Node (node operations)

es.nodes.info: returns information about the nodes in the cluster.

print(es.nodes.info())  # all nodes
print(es.nodes.info(node_id='node1'))   # a single node
print(es.nodes.info(node_id=['node1', 'node2']))   # a list of several nodes

es.nodes.stats: returns statistics about the nodes in the cluster.

print(es.nodes.stats())
print(es.nodes.stats(node_id='node1'))
print(es.nodes.stats(node_id=['node1', 'node2']))

es.nodes.hot_threads: returns thread information for the specified nodes.

print(es.nodes.hot_threads())
print(es.nodes.hot_threads(node_id='node1'))
print(es.nodes.hot_threads(node_id=['node1', 'node2']))

es.nodes.usage: returns feature usage information for the nodes in the cluster.

print(es.nodes.usage())
print(es.nodes.usage(node_id='node1'))
print(es.nodes.usage(node_id=['node1', 'node2']))

Cat (an alternative way to query)

es.cat.aliases: returns alias information.
- name: comma-separated list of aliases to return.
- format: a short version of the Accept header, e.g. json, yaml

print(es.cat.aliases(name='py23_alias'))
print(es.cat.aliases(name='py23_alias', format='json'))

es.cat.allocation: returns shard allocation and disk usage for each node.

print(es.cat.allocation())
print(es.cat.allocation(node_id=['node1']))
print(es.cat.allocation(node_id=['node1', 'node2'], format='json'))

es.cat.count: provides quick access to the document count of the whole cluster or of individual indices.

print(es.cat.count())
print(es.cat.count(index='py3'))
print(es.cat.count(index='py3', format='json'))

es.cat.fielddata: shows, per node, information about the currently loaded fielddata. Some data is kept in memory for query efficiency; fielddata controls which data is placed in memory, and es.cat.fielddata reports which data is currently in memory and how large it is.

print(es.cat.fielddata())
print(es.cat.fielddata(format='json', bytes='b'))

bytes: the unit in which byte values are displayed; valid options are 'b', 'k', 'kb', 'm', 'mb', 'g', 'gb', 't', 'tb', 'p', 'pb'
format: a short version of the Accept header, e.g. json, yaml

es.cat.health: returns a compact view of the cluster's health, filtered from the cluster health information.

print(es.cat.health())
print(es.cat.health(format='json'))

es.cat.help: returns help information for es.cat.

print(es.cat.help())

es.cat.indices: returns information about indices; it can also be used to find out how many indices the cluster has.

print(es.cat.indices())
print(es.cat.indices(index='py3'))
print(es.cat.indices(index='py3', format='json'))
print(len(es.cat.indices(format='json')))  # number of indices in the cluster
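With format='json', cat.indices returns a list of dictionaries, one per index, whose values are strings. The sketch below turns such rows into a name-to-document-count mapping; docs_per_index is a hypothetical helper and the sample rows are made up for illustration.

```python
def docs_per_index(rows):
    """Map index name -> document count from cat.indices(format='json') rows."""
    counts = {}
    for row in rows:
        raw = row.get("docs.count")
        # Counts arrive as strings; an index may also report no count at all.
        counts[row["index"]] = int(raw) if raw not in (None, "") else None
    return counts


sample = [
    {"health": "yellow", "status": "open", "index": "py2", "docs.count": "1"},
    {"health": "yellow", "status": "open", "index": "w2", "docs.count": "6"},
]
print(docs_per_index(sample))  # {'py2': 1, 'w2': 6}
# Against a live cluster: docs_per_index(es.cat.indices(format='json'))
```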

es.cat.master: returns the master node's ID, bound IP address and node name.

print(es.cat.master())
print(es.cat.master(format='json'))

es.cat.nodeattrs: returns the custom attributes of nodes.

print(es.cat.nodeattrs())
print(es.cat.nodeattrs(format='json'))

es.cat.nodes: returns the node topology. This is often useful when looking at the whole cluster, especially a large one, for example to see how many eligible nodes there are.

print(es.cat.nodes())
print(es.cat.nodes(format='json'))

es.cat.plugins: returns plugin information for each node.

print(es.cat.plugins())
print(es.cat.plugins(format='json'))

es.cat.segments: returns Lucene segment information for each index.

print(es.cat.segments())
print(es.cat.segments(index='py3'))
print(es.cat.segments(index='py3', format='json'))

es.cat.shards: returns which nodes hold which shards.

print(es.cat.shards())
print(es.cat.shards(index='py3'))
print(es.cat.shards(index='py3', format='json'))

es.cat.thread_pool: returns information about thread pools.

print(es.cat.thread_pool())

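Snapshot (snapshot operations)

es.snapshot.create: creates a snapshot in a repository.
- repository: the repository name.
- snapshot: the snapshot name.
- body: the snapshot definition.
es.snapshot.delete: deletes a snapshot from a repository.
es.snapshot.create_repository: registers a shared file system repository.
es.snapshot.delete_repository: deletes a shared file system repository.
es.snapshot.get: retrieves information about a snapshot.
es.snapshot.get_repository: returns information about registered repositories.
es.snapshot.restore: restores a snapshot.
es.snapshot.status: returns information about all currently running snapshots. Specifying a repository name limits the results to that repository.
es.snapshot.verify_repository: returns the list of nodes on which the repository was verified successfully, or an error message if verification failed.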
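The snapshot calls above have no example, so here is a rough sketch of how registering a repository and taking a snapshot might fit together. backup_indices is a hypothetical helper; the repository name, snapshot name and location in the usage comment are placeholders, and the location is assumed to be listed under path.repo in the Elasticsearch configuration.

```python
def backup_indices(es, repository, location, snapshot, indices):
    """Register a shared-file-system repository, then snapshot the given indices."""
    es.snapshot.create_repository(
        repository=repository,
        body={"type": "fs", "settings": {"location": location}},
    )
    return es.snapshot.create(
        repository=repository,
        snapshot=snapshot,
        body={"indices": ",".join(indices)},
        wait_for_completion=True,  # block until the snapshot has finished
    )


# Usage against a live cluster (all names are placeholders):
# backup_indices(es, 'my_backup', '/mount/backups', 'snapshot_1', ['py2', 'py3'])
```

Registering a repository that already exists with the same settings is harmless, so the helper can be called repeatedly with new snapshot names.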