Elasticsearch - Any way to find out all the documents with field value as text
In an Elasticsearch cluster, I accidentally pushed some text into a field that should be a number. Later, I fixed that and pushed values of the Number type. Now I want to replace all the old values with some number, and for that I need to find all the documents that have text in this field.
Is there any Elasticsearch query that I can use to get this information?
I think this is possible using nested aggregations.
At the top level, use a terms aggregation to find the text values; at the sub-level, use a top_hits aggregation to get the documents that contain these values.
For instance:
GET example_index/_search
{
  "size": 0,
  "aggs": {
    "NAME": {
      "terms": {
        "field": "example_field.keyword",
        "size": 10
      },
      "aggs": {
        "documents": {
          "top_hits": {
            "size": 10
          }
        }
      }
    }
  }
}
This query will return the distinct values of the field, with the related documents at the sub-level, something like:
{
  "aggregations": {
    "NAME": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "mistake",
          "doc_count": 2,
          "documents": {
            "hits": {
              "total": 2,
              "max_score": 1,
              "hits": [
                {
                  "_index": "example_index",
                  "_type": "example_index",
                  "_id": "2QoDoXEBOCkJkkpwq5P0",
                  "_score": 1,
                  "_source": {
                    "example_field": "mistake"
                  }
                },
                {
                  "_index": "example_index",
                  "_type": "example_index",
                  "_id": "qAoDoXEBOCkJkkpwq5T0",
                  "_score": 1,
                  "_source": {
                    "example_field": "mistake"
                  }
                }
              ]
            }
          }
        },
        {
          "key": "520",
          "doc_count": 2,
          "documents": {
            "hits": {
              "total": 1,
              "max_score": 1,
              "hits": [
                {
                  "_index": "example_index",
                  "_type": "example_index",
                  "_id": "5goDoXEBOCkJkkpwq5P0",
                  "_score": 1,
                  "_source": {
                    "example_field": "1"
                  }
                }
              ]
            }
          }
        }
      ]
    }
  }
}
In the example above, we need to delete the documents with the "mistake" value; you can simply delete them by id.
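As a rough illustration of the delete-by-id step, the ids collected from the top_hits results could be turned into a request body for the bulk API, which takes one JSON action per line. This is only a sketch: the helper name build_bulk_delete_body and its doc_type parameter are made up for illustration, and the index name and ids are taken from the example response. The body would still have to be sent to the _bulk endpoint with whatever client you use.

```python
import json


def build_bulk_delete_body(index, doc_ids, doc_type=None):
    """Return a newline-delimited JSON body with one delete action per id.

    Hypothetical helper: doc_type is only needed on older clusters that
    still use mapping types (the example response contains "_type").
    """
    lines = []
    for doc_id in doc_ids:
        meta = {"_index": index, "_id": doc_id}
        if doc_type is not None:
            meta["_type"] = doc_type
        lines.append(json.dumps({"delete": meta}))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""


# Ids of the two documents in the "mistake" bucket of the example response.
body = build_bulk_delete_body(
    "example_index",
    ["2QoDoXEBOCkJkkpwq5P0", "qAoDoXEBOCkJkkpwq5T0"],
)
print(body)
```

For a couple of documents, deleting each one individually by id is just as good; a bulk body mainly pays off when many ids have to be removed.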
NOTE: if you have a big index, it is better to write a function in your code that builds the aggregation, gets the response, keeps only the values that cannot be parsed as a number, and then removes those documents by id.
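The filtering part of such a function might look roughly like the sketch below. It works on the already-parsed search response, so it does not depend on any particular client library. The function names are hypothetical, and the aggregation names "NAME" and "documents" are the ones used in the example query. Treating anything that float() accepts as a number is an assumption; adjust that check if values such as "nan" or "inf" should count as text.

```python
def is_number(value):
    """Return True if the bucket key can be parsed as a number."""
    try:
        float(value)  # assumption: float() defines what counts as numeric
        return True
    except (TypeError, ValueError):
        return False


def ids_with_text_values(response, agg_name="NAME", hits_name="documents"):
    """Collect the ids of documents whose bucket key is not numeric."""
    doc_ids = []
    for bucket in response["aggregations"][agg_name]["buckets"]:
        if is_number(bucket["key"]):
            continue  # a valid numeric value, keep these documents
        for hit in bucket[hits_name]["hits"]["hits"]:
            doc_ids.append(hit["_id"])
    return doc_ids


# Trimmed-down copy of the example response, keeping only the fields used here.
response = {
    "aggregations": {
        "NAME": {
            "buckets": [
                {
                    "key": "mistake",
                    "documents": {"hits": {"hits": [
                        {"_id": "2QoDoXEBOCkJkkpwq5P0"},
                        {"_id": "qAoDoXEBOCkJkkpwq5T0"},
                    ]}},
                },
                {
                    "key": "520",
                    "documents": {"hits": {"hits": [
                        {"_id": "5goDoXEBOCkJkkpwq5P0"},
                    ]}},
                },
            ]
        }
    }
}

print(ids_with_text_values(response))
```

Keep in mind that the example query asks for at most 10 buckets and at most 10 hits per bucket, so on a big index the query and cleanup would probably have to be repeated (or the sizes raised) until no non-numeric buckets remain.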