Elasticsearch - Any way to find all documents where a field's value is text

Question

In an Elasticsearch cluster, I accidentally indexed some text into a field that should be a Number. Later, I fixed that and started pushing Number values. Now I want to replace all the old text values with some Number, and for that I need to find all the documents that have this field as text.

Is there an Elasticsearch query I can use to get this information?

Answer 1

I think this is possible using nested aggregations.

At the top level, use a terms aggregation to find the text values; at the sub-level, use a top_hits aggregation to get the documents that contain those values.

For instance:

GET example_index/_search
{
  "size": 0,
  "aggs": {
    "NAME": {
      "terms": {
        "field": "example_field.keyword",
        "size": 10
      },
      "aggs": {
        "documents": {
          "top_hits": {
            "size": 10
          }
        }
      }
    }
  }
}

This query returns the distinct values of the field, with the related documents at the sub-level, something like:

{
  "aggregations": {
    "NAME": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "mistake",
          "doc_count": 2,
          "documents": {
            "hits": {
              "total": 2,
              "max_score": 1,
              "hits": [
                {
                  "_index": "example_index",
                  "_type": "example_index",
                  "_id": "2QoDoXEBOCkJkkpwq5P0",
                  "_score": 1,
                  "_source": {
                    "example_field": "mistake"
                  }
                },
                {
                  "_index": "example_index",
                  "_type": "example_index",
                  "_id": "qAoDoXEBOCkJkkpwq5T0",
                  "_score": 1,
                  "_source": {
                    "example_field": "mistake"
                  }
                }
              ]
            }
          }
        },
        {
          "key": "1",
          "doc_count": 1,
          "documents": {
            "hits": {
              "total": 1,
              "max_score": 1,
              "hits": [
                {
                  "_index": "example_index",
                  "_type": "example_index",
                  "_id": "5goDoXEBOCkJkkpwq5P0",
                  "_score": 1,
                  "_source": {
                    "example_field": "1"
                  }
                }
              ]
            }
          }
        }
      ]
    }
  }
}

In the example above, we need to delete the documents with the value mistake; you can simply delete them by id.
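When there are more than a handful of ids, the deletes could be batched through the `_bulk` API instead of being issued one at a time. The following is a minimal sketch, not part of the original answer: `build_bulk_delete_body` is a hypothetical helper name, and the code only builds the newline-delimited JSON body. Sending it (for example with `POST _bulk`) is left out.

```python
import json


def build_bulk_delete_body(hits):
    """Build a newline-delimited JSON body with one delete action per hit.

    `hits` is a list of hit objects as returned inside a top_hits
    aggregation; each needs at least "_index" and "_id".
    """
    lines = []
    for hit in hits:
        meta = {"_index": hit["_index"], "_id": hit["_id"]}
        # Older clusters (like the one in the sample response) also report
        # a "_type"; it is passed through only when the hit carries one.
        if "_type" in hit:
            meta["_type"] = hit["_type"]
        lines.append(json.dumps({"delete": meta}))
    # Every action line, including the last one, must end with a newline.
    return "".join(line + "\n" for line in lines)


# Usage with the two "mistake" hits from the sample response.
hits = [
    {"_index": "example_index", "_type": "example_index", "_id": "2QoDoXEBOCkJkkpwq5P0"},
    {"_index": "example_index", "_type": "example_index", "_id": "qAoDoXEBOCkJkkpwq5T0"},
]
print(build_bulk_delete_body(hits), end="")
```

Since the original goal was to replace the bad values rather than drop the documents, the same ids could feed update actions instead; that variant is not shown here.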

NOTE: if you have a big index, it is better to write a function in your code that builds the aggregation, gets the response, keeps only the values that cannot be parsed as a number, and then removes those documents by id.
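The filtering step of that note might look roughly like the sketch below. It assumes the response has the shape shown earlier (a terms aggregation called `NAME` with a `documents` top_hits sub-aggregation) and treats a value as numeric when Python's `float()` accepts it and the result is finite. The function names are hypothetical, and running the search and deleting the documents are left to whatever client you use.

```python
import math


def is_number(text):
    """Return True if `text` parses as a finite number, e.g. "520" or "1.5"."""
    try:
        return math.isfinite(float(text))
    except (TypeError, ValueError):
        return False


def ids_with_text_values(response, agg_name="NAME", hits_name="documents"):
    """Collect (_index, _id) pairs from buckets whose key is not numeric."""
    bad = []
    for bucket in response["aggregations"][agg_name]["buckets"]:
        if is_number(bucket["key"]):
            continue  # numeric value: these documents are fine
        for hit in bucket[hits_name]["hits"]["hits"]:
            bad.append((hit["_index"], hit["_id"]))
    return bad


# Usage with a trimmed-down response of the same shape as the sample.
response = {
    "aggregations": {
        "NAME": {
            "buckets": [
                {"key": "mistake", "doc_count": 2, "documents": {"hits": {"hits": [
                    {"_index": "example_index", "_id": "2QoDoXEBOCkJkkpwq5P0"},
                    {"_index": "example_index", "_id": "qAoDoXEBOCkJkkpwq5T0"},
                ]}}},
                {"key": "520", "doc_count": 1, "documents": {"hits": {"hits": [
                    {"_index": "example_index", "_id": "5goDoXEBOCkJkkpwq5P0"},
                ]}}},
            ]
        }
    }
}
print(ids_with_text_values(response))
```

Note that both `size` settings in the query cap what comes back: the terms aggregation returns at most that many distinct values, and top_hits returns at most that many documents per value. A bucket whose `doc_count` exceeds the number of returned hits needs another pass.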

Answer 1
0 2020-06-02 06:15:26
