Querying all unique values of a field with Elasticsearch

Question

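How do I use Elasticsearch to retrieve all unique values of a given field?

I have a query along the lines of select full_name from authors, so that I can display the list to users on a form.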

Answer 1

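You can create a terms facet on the "full_name" field. To do this properly, though, you need to make sure the field is not tokenized at index time; otherwise each entry in the facet will be an individual term from the field content rather than the whole name. Most likely you need to configure it as "not_analyzed" in the mapping. If you also search on the field and still want it tokenized, you can use a multi field to index it in two different ways.

Also keep in mind that, depending on the number of unique terms in the full_name field, this operation can be expensive and require quite a bit of memory.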
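
The mapping advice above can be sketched as a Python dict. This is only a minimal sketch, assuming the legacy `string` type used by Elasticsearch 1.x/2.x; the sub-field name `raw` and the helper name `full_name_mapping` are illustrative and do not come from the answer.

```python
import json


def full_name_mapping(subfield="raw"):
    """Build a legacy (pre-5.x) mapping for full_name.

    The main field stays analyzed (tokenized) for full-text search, while the
    sub-field is indexed as a single untokenized term for facets/aggregations.
    `subfield` is a hypothetical name chosen for this example.
    """
    return {
        "properties": {
            "full_name": {
                "type": "string",  # analyzed by default
                "fields": {
                    subfield: {"type": "string", "index": "not_analyzed"}
                },
            }
        }
    }


if __name__ == "__main__":
    # Print the mapping body that would be sent when creating the index.
    print(json.dumps(full_name_mapping(), indent=2))
```

With a mapping like this, the facet or aggregation would target `full_name.raw`, while ordinary searches keep using `full_name`.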

Answer 2

For Elasticsearch 1.0 and later, you can use the terms aggregation to do this.

Query DSL:

{
  "aggs": {
    "NAME": {
      "terms": {
        "field": "",
        "size": 10
      }
    }
  }
}

A real example:

{
  "aggs": {
    "full_name": {
      "terms": {
        "field": "authors",
        "size": 0
      }
    }
  }
}

You then get all unique values of the authors field. size = 0 means the number of terms is not limited (this requires Elasticsearch 1.1.0 or later).

Response:

{
    ...

    "aggregations" : {
        "full_name" : {
            "buckets" : [
                {
                    "key" : "Ken",
                    "doc_count" : 10
                },
                {
                    "key" : "Jim Gray",
                    "doc_count" : 10
                }
            ]
        }
    }
}

See Elasticsearch terms aggregations.
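
To turn a response like the one above into a plain list for a form, the bucket keys can be pulled out on the client side. The following is a small sketch; the helper name `unique_values` is made up for illustration, and the sample dict simply mirrors the response shown in this answer.

```python
def unique_values(response, agg_name):
    """Return the bucket keys of a terms aggregation, in response order."""
    buckets = response.get("aggregations", {}).get(agg_name, {}).get("buckets", [])
    return [bucket["key"] for bucket in buckets]


if __name__ == "__main__":
    # Sample response copied from the answer (other top-level keys omitted).
    sample = {
        "aggregations": {
            "full_name": {
                "buckets": [
                    {"key": "Ken", "doc_count": 10},
                    {"key": "Jim Gray", "doc_count": 10},
                ]
            }
        }
    }
    print(unique_values(sample, "full_name"))  # ['Ken', 'Jim Gray']
```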

Answer 3

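The existing answers did not work for me in Elasticsearch 5.X, for the following reasons:

I need to tokenize the input at index time.
"size": 0 fails to parse, because "[size] must be greater than 0."
"Fielddata is disabled on text fields by default." This means that by default you cannot aggregate on the full_name field. However, an unanalyzed keyword field can be used for aggregations.

Solution 1: Use the Scroll API. It works by keeping a search context and making multiple requests, each returning a subsequent batch of results. If you are using Python, the elasticsearch module has the scan() helper function to handle scrolling for you and return all results.

Solution 2: Use the Search After API. It is similar to Scroll, but provides a live cursor instead of keeping a search context, which makes it more efficient for real-time requests.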
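
The Search After approach can be sketched in Python roughly as follows. This is an illustration under assumptions, not a definitive implementation: `search_fn` is a hypothetical callable that sends a request body and returns the parsed response (for example a thin wrapper around the client's search call), `sort_field` must be a field that is unique per document, and the field being collected is assumed to be a top-level key of `_source`.

```python
def collect_unique(search_fn, field, sort_field, page_size=1000):
    """Page through all documents with search_after and collect distinct values.

    search_fn: hypothetical callable(body: dict) -> response dict.
    sort_field: a per-document unique field used as the cursor (assumption).
    """
    seen = set()
    body = {
        "size": page_size,
        "_source": [field],                 # fetch only the field we need
        "sort": [{sort_field: "asc"}],      # search_after requires a sort
    }
    while True:
        hits = search_fn(body)["hits"]["hits"]
        if not hits:
            break  # no more pages
        for hit in hits:
            value = hit["_source"].get(field)
            if value is not None:
                seen.add(value)
        # Continue after the sort values of the last hit on this page.
        body = dict(body, search_after=hits[-1]["sort"])
    return sorted(seen)
```

Deduplication happens on the client, so every document is still transferred once; this trades network traffic for not needing an aggregatable field.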

Answer 4

Works for Elasticsearch 5.2.2:

curl -XGET 'http://localhost:9200/articles/_search?pretty' -d '
{
    "aggs" : {
        "whatever" : {
            "terms" : { "field" : "yourfield", "size":10000 }
        }
    },
    "size" : 0
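}'

"size":10000 means fetch (at most) 10000 unique values. Without it, only 10 values are returned even if you have more than 10 unique values.

"size":0 means the "hits" in the result will not contain any documents. By default 10 documents are returned, which we do not need here.

Reference: bucket terms aggregation

Also note that, according to this page, facets were replaced in Elasticsearch 1.0 by aggregations, which are a superset of facets.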
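
The two `size` settings described in this answer can be wrapped in a small Python helper that builds the request body. This is a sketch only; the function name `unique_values_query` and the aggregation name `unique_values` are invented for the example.

```python
def unique_values_query(field, max_values=10000):
    """Build a search body that returns up to max_values distinct terms.

    The top-level "size": 0 suppresses document hits; the aggregation-level
    "size" caps how many buckets (unique values) come back.
    """
    if max_values < 1:
        # Elasticsearch 5.x rejects a terms size of 0 ("[size] must be greater than 0").
        raise ValueError("max_values must be greater than 0")
    return {
        "size": 0,
        "aggs": {
            "unique_values": {
                "terms": {"field": field, "size": max_values}
            }
        },
    }


if __name__ == "__main__":
    print(unique_values_query("yourfield"))
```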

Answer 5

Intuition: in SQL terms,

Select distinct full_name from authors;

is equivalent to

Select full_name from authors group by full_name;

So we can use the grouping/aggregation syntax in Elasticsearch to find the distinct entries.

Assume the following is the structure stored in Elasticsearch:

[{
    "author": "Brian Kernighan"
  },
  {
    "author": "Charles Dickens"
  }]

What does not work: a plain aggregation

{
  "aggs": {
    "full_name": {
      "terms": {
        "field": "author"
      }
    }
  }
}

I get the following error:

{
  "error": {
    "root_cause": [
      {
        "reason": "Fielddata is disabled on text fields by default...",
        "type": "illegal_argument_exception"
      }
    ]
  }
}

What works like a charm: appending .keyword to the field

{
  "aggs": {
    "full_name": {
      "terms": {
        "field": "author.keyword"
      }
    }
  }
}

Sample output could be:

{
  "aggregations": {
    "full_name": {
      "buckets": [
        {
          "doc_count": 372,
          "key": "Charles Dickens"
        },
        {
          "doc_count": 283,
          "key": "Brian Kernighan"
        }
      ],
      "doc_count": 1000
    }
  }
}
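
Whether `.keyword` needs to be appended depends on the field's mapping. The sketch below picks the aggregatable name from a mapping's `properties` dict; it assumes the default dynamic mapping of Elasticsearch 5+ (a `text` field with a `keyword` sub-field), and the helper name `aggregatable_field` is made up for illustration.

```python
def aggregatable_field(properties, field):
    """Return the field name to use in a terms aggregation.

    For a text field, fall back to its first keyword sub-field
    (e.g. "author" -> "author.keyword"); other types are returned unchanged.
    """
    spec = properties.get(field, {})
    if spec.get("type") != "text":
        return field
    for name, sub_spec in spec.get("fields", {}).items():
        if sub_spec.get("type") == "keyword":
            return f"{field}.{name}"
    raise ValueError(f"text field {field!r} has no keyword sub-field to aggregate on")


if __name__ == "__main__":
    # Shape of the default dynamic mapping for a string value (assumption: ES 5+).
    props = {
        "author": {
            "type": "text",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
        }
    }
    print(aggregatable_field(props, "author"))  # author.keyword
```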

Bonus tip:

Suppose the field in question is nested as follows:

[{
    "authors": [{
        "details": [{
            "name": "Brian Kernighan"
          }]
      }]
  },
  {
    "authors": [{
        "details": [{
            "name": "Charles Dickens"
          }]
      }]
  }
]

The correct query now becomes:

{
  "aggregations": {
    "full_name": {
      "aggregations": {
        "author_details": {
          "terms": {
            "field": "authors.details.name"
          }
        }
      },
      "nested": {
        "path": "authors.details"
      }
    }
  },
  "size": 0
}
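
With a nested aggregation, the buckets sit one level deeper in the response, under the inner aggregation name. The following Python sketch extracts them; the helper name is invented, and the sample response is an assumed shape for the query above, not output copied from a real cluster.

```python
def nested_unique_values(response, outer_agg, inner_agg):
    """Return bucket keys of a terms aggregation wrapped in a nested aggregation."""
    inner = response["aggregations"][outer_agg][inner_agg]
    return [bucket["key"] for bucket in inner["buckets"]]


if __name__ == "__main__":
    # Assumed response shape: the nested aggregation reports its own doc_count
    # and carries the inner terms aggregation under its name.
    sample = {
        "aggregations": {
            "full_name": {
                "doc_count": 2,
                "author_details": {
                    "buckets": [
                        {"key": "Brian Kernighan", "doc_count": 1},
                        {"key": "Charles Dickens", "doc_count": 1},
                    ]
                },
            }
        }
    }
    print(nested_unique_values(sample, "full_name", "author_details"))
```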

5 answers

Answer 1
18 2013-01-23 12:04:26

Answer 2
12 2014-10-30 07:28:21

Answer 3
4 2017-02-20 19:11:07

Answer 4
2 2017-12-01 22:31:04

Answer 5
2 2018-02-28 13:12:05
