Finding all unique values of a field in Elasticsearch from Python

Question

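I have been searching the web for good Python documentation for Elasticsearch. I have a query that I know returns the information I need, but I am struggling to convert the raw string into something Python can interpret.

This returns a list of all unique "VALUE" entries in the dataset: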

{"find": "terms", "field": "hierarchy1.hierarchy2.VALUE"}

I got this from the dashboard tool that accesses this data, but I can't seem to convert it into correct Python.

I tried this:

body_test = {"find": "terms", "field": "hierarchy1.hierarchy2.VALUE"}
es = Elasticsearch(SETUP CONNECTION)
es.search(
    index="INDEX_NAME",
    body = body_test
)

But it doesn't like the find key, and I can't find anything about find in the documentation.

RequestError: RequestError(400, 'parsing_exception', 'Unknown key for a VALUE_STRING in [find].')

The only way I have gotten it to work, somewhat, is:

es_search = (
        Search(
            using=es,
            index=db_index
        ).source(['hierarchy1.hierarchy2.VALUE'])
    )

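But I think this pulls the entire dataset and then filters it, which I obviously don't want to do every time I run this code. This needs to be done through Python, so I can't simply post the query that I know works.

I'm completely new to ES, so this is a bit confusing. Thanks in advance!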

Answer 1

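So it turns out that find in this case is specific to Grafana (the dashboard tool I got the query from). In the end I used this website and the code from it. It is more complicated than I thought it would be, but it runs very quickly and doesn't put strain on the database (which my alternative approach was doing).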
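As a side note on the Grafana point above: for a field with a modest number of distinct values, the closest native Elasticsearch request is probably a plain `terms` aggregation. The following is only a minimal sketch under some assumptions: the field is aggregatable (for example a `keyword` field), the helper names `build_terms_body` and `extract_keys` and the aggregation name `unique_values` are made up for illustration, and the `max_values` default of 1000 is arbitrary. A `terms` aggregation returns at most `size` buckets, which is why a paginated composite aggregation is the safer choice when the number of distinct values is unknown.

```python
def build_terms_body(fieldname, max_values=1000):
    """Build a search body that asks only for the distinct values of a field."""
    return {
        "size": 0,  # skip the document hits; only the aggregation is needed
        "aggs": {
            "unique_values": {  # hypothetical aggregation name
                "terms": {"field": fieldname, "size": max_values}
            }
        },
    }


def extract_keys(response):
    """Pull the bucket keys out of a terms-aggregation response."""
    buckets = response["aggregations"]["unique_values"]["buckets"]
    return [bucket["key"] for bucket in buckets]


# Intended use (needs a live cluster, so it is left commented out):
# response = es.search(index="INDEX_NAME",
#                      body=build_terms_body("hierarchy1.hierarchy2.VALUE"))
# print(extract_keys(response))
```

If the result contains exactly `max_values` keys, the list has very likely been truncated and should not be treated as complete.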

In case the link dies in the coming years, here is the code I used:

from elasticsearch import Elasticsearch

es = Elasticsearch()

def iterate_distinct_field(es, fieldname, pagesize=250, **kwargs):
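    """
    Helper to get all distinct values of a field from Elasticsearch.
    Yields one bucket per distinct value, fetched page by page with a
    composite aggregation (buckets come back sorted by value, not by
    number of occurrences).
    """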
    compositeQuery = {
        "size": pagesize,
        "sources": [{
                fieldname: {
                    "terms": {
                        "field": fieldname
                    }
                }
            }
        ]
    }
    # Iterate over pages
    while True:
        result = es.search(**kwargs, body={
            "size": 0,  # only the aggregation is needed, so skip the hits
            "aggs": {
                "values": {
                    "composite": compositeQuery
                }
            }
        })
        # Yield each bucket
        for aggregation in result["aggregations"]["values"]["buckets"]:
            yield aggregation
        # Set "after" field
        if "after_key" in result["aggregations"]["values"]:
            compositeQuery["after"] = \
                result["aggregations"]["values"]["after_key"]
        else: # Finished!
            break

# Usage example
for result in iterate_distinct_field(es, fieldname="pattern.keyword", index="strings"):
    print(result) # e.g. {'key': {'pattern.keyword': 'mypattern'}, 'doc_count': 315}
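Each yielded bucket nests the value under the source name, which in the code above is the field name itself. If only a flat list of values is wanted, something like the following sketch could be used. `distinct_values` is a hypothetical helper, and the sample buckets are invented data shaped like composite-aggregation output, not real results.

```python
def distinct_values(buckets, fieldname):
    """Return the plain field values from composite-aggregation buckets."""
    # Each bucket looks like {"key": {fieldname: value}, "doc_count": n}
    return [bucket["key"][fieldname] for bucket in buckets]


# Invented sample data, standing in for iterate_distinct_field(...) output
sample_buckets = [
    {"key": {"pattern.keyword": "alpha"}, "doc_count": 315},
    {"key": {"pattern.keyword": "beta"}, "doc_count": 12},
]
print(distinct_values(sample_buckets, "pattern.keyword"))  # ['alpha', 'beta']
```

With a live connection, the generator can be passed in directly, for example `distinct_values(iterate_distinct_field(es, fieldname="pattern.keyword", index="strings"), "pattern.keyword")`.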


Solution 1
0 2022-02-09 20:03:25
