
Make elasticsearch only return certain fields?

I'm using Elasticsearch to index my documents.

Is it possible to instruct it to return only particular fields instead of the entire JSON document it has stored?

Yes. The better option is source filtering. If you're searching with JSON, it looks something like this:

{
    "_source": ["user", "message", ...],
    "query": ...,
    "size": ...
}
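As a minimal sketch using only the Python standard library, the same shorthand body can be built and wrapped in an HTTP request. The base URL http://localhost:9200, the index name my_index, the match query and both helper names are assumptions for illustration. The request is only constructed here, not sent.

```python
import json
import urllib.request

def build_search_body(fields, query, size=10):
    """Return a search body whose _source is limited to `fields` (list shorthand)."""
    return {"_source": list(fields), "query": query, "size": size}

def build_search_request(base_url, index, body):
    """Wrap the body in a POST /<index>/_search request without sending it."""
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Hypothetical index and query, for illustration only.
body = build_search_body(["user", "message"], {"match": {"message": "hello"}}, size=5)
req = build_search_request("http://localhost:9200", "my_index", body)
# urllib.request.urlopen(req) would send it to a running cluster.
```

Each hit in the response would then carry a _source containing only user and message.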

In ES 2.4 and earlier, you could also use the fields option of the search API:

{
    "fields": ["user", "message", ...],
    "query": ...,
    "size": ...
}

This is deprecated in ES 5+, and source filters are more powerful anyway!

I found the docs for the get API helpful, especially the two sections Source filtering and Fields: https://www.elastic.co/guide/en/elasticsearch/reference/7.3/docs-get.html#get-source-filtering

About source filtering, they state:

If you only need one or two fields from the complete _source, you can use the _source_include & _source_exclude parameters to include or filter out the parts you need. This can be especially helpful with large documents where partial retrieval can save on network overhead.

This fitted my use case perfectly. I ended up simply filtering the source like so (using the shorthand):

{
    "_source": ["field_x", ..., "field_y"],
    "query": {      
        ...
    }
}

FYI, about the fields parameter, the docs state:

The get operation allows specifying a set of stored fields that will be returned by passing the fields parameter.

It seems to cater for fields that have been specifically stored, and it places each field's value in an array. If the specified fields haven't been stored, it will fetch each one from the _source, which could result in 'slower' retrievals. I also had trouble getting it to return fields of type object.

So, in summary, you have two options: source filtering or [stored] fields.

For ES versions 5.x and above, you can write an ES query something like this:

    GET /.../...
    {
      "_source": {
        "includes": [ "FIELD1", "FIELD2", "FIELD3", ... ]
      },
      ...
    }
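The list shorthand and the includes object ask for the same thing. The helper below makes that equivalence concrete. It is a purely local, hypothetical function and not part of any Elasticsearch client. It maps the common _source forms (false, a single pattern string, a list, or an includes/excludes object) onto one shape.

```python
def normalize_source(source):
    """Normalise a _source option into {"includes": [...], "excludes": [...]}, or False."""
    if source is False:
        return False  # no _source returned at all
    if source is True or source is None:
        return {"includes": [], "excludes": []}  # empty includes means "everything"
    if isinstance(source, str):
        return {"includes": [source], "excludes": []}  # single pattern
    if isinstance(source, (list, tuple)):
        return {"includes": list(source), "excludes": []}  # list shorthand
    if isinstance(source, dict):
        return {
            "includes": list(source.get("includes", [])),
            "excludes": list(source.get("excludes", [])),
        }
    raise TypeError(f"unsupported _source value: {source!r}")
```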

In Elasticsearch 5.x, the fields approach mentioned above is deprecated. You can use the _source approach instead, but in certain situations it can make sense to store a field. For instance, if you have a document with a title, a date, and a very large content field, you may want to retrieve just the title and the date without having to extract those fields from a large _source field.

In this case, you'd use:

{  
   "size": $INT_NUM_OF_DOCS_TO_RETURN,
   "stored_fields":[  
      "doc.headline",
      "doc.text",
      "doc.timestamp_utc"
   ],
   "query":{  
      "bool":{  
         "must":{  
            "term":{  
               "doc.topic":"news_on_things"
            }
         },
         "filter":{  
            "range":{  
               "doc.timestamp_utc":{  
                  "gte":1451606400000,
                  "lt":1483228800000,
                  "format":"epoch_millis"
               }
            }
         }
      }
   },
   "aggs":{  

   }
}

See the documentation on how to index stored fields. Always happy for an upvote!
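A rough sketch of both halves, written as Python dicts, follows. The mapping is an assumption: a 7.x-style mapping without mapping types, sent as the body of PUT on the index when creating it. It marks the returned fields with "store": true. The helper shows that stored_fields results come back under hit["fields"] with each value wrapped in a list. The helper name and the sample hit are made up for illustration.

```python
# Hypothetical mapping for the index queried above (7.x style, no mapping types).
# "store": true keeps a separate copy of each field that stored_fields can load.
mapping = {
    "mappings": {
        "properties": {
            "doc": {
                "properties": {
                    "headline": {"type": "text", "store": True},
                    "text": {"type": "text", "store": True},
                    "timestamp_utc": {"type": "date", "store": True},
                    "topic": {"type": "keyword"},  # only queried, not returned
                }
            }
        }
    }
}

def stored_values(hit):
    """Unwrap stored_fields results: every value under hit["fields"] is a list,
    so single-element lists are replaced by their only element."""
    return {
        name: values[0] if len(values) == 1 else values
        for name, values in hit.get("fields", {}).items()
    }

# A made-up hit in the shape a stored_fields search returns.
sample_hit = {"_id": "1", "fields": {"doc.headline": ["Some headline"], "doc.text": ["first", "second"]}}
```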

Here you can specify which fields you want in your output and also which ones you don't:
  
POST index_name/_search
{
    "_source": {
        "includes": [ "field_name", "field_name" ],
        "excludes": [ "field_name" ]
    },
    "query" : {
        "match" : { "field_name" : "value" }
    }
}
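To see how includes and excludes interact, here is a local approximation in Python. It runs on an already-fetched document and is not Elasticsearch's own implementation. Includes are applied first (an empty list keeps everything), then excludes remove leaves. A pattern also selects everything below a matching object. fnmatch stands in for Elasticsearch's * wildcard, although it also treats ? and [ ] specially. Arrays of objects are treated as plain leaves, which is a simplifying assumption.

```python
from fnmatch import fnmatchcase

def _flatten(doc, prefix=""):
    """Yield (dotted_path, value) pairs for every leaf of a nested dict."""
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict) and value:
            yield from _flatten(value, path + ".")
        else:
            yield path, value

def _set_path(target, path, value):
    """Write value at a dotted path, creating intermediate dicts as needed."""
    parts = path.split(".")
    for part in parts[:-1]:
        target = target.setdefault(part, {})
    target[parts[-1]] = value

def _matches(path, pattern):
    # A pattern selects a leaf directly or any leaf below a matching object.
    return fnmatchcase(path, pattern) or fnmatchcase(path, pattern + ".*")

def filter_source(doc, includes=(), excludes=()):
    """Keep leaves matching any include (all if none given), then drop excluded ones."""
    result = {}
    for path, value in _flatten(doc):
        if includes and not any(_matches(path, p) for p in includes):
            continue
        if any(_matches(path, p) for p in excludes):
            continue
        _set_path(result, path, value)
    return result
```

For example, includes ["title", "user"] with excludes ["user.email"] keeps title and user.name only.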

Response filtering

All REST APIs accept a filter_path parameter that can be used to reduce the response returned by Elasticsearch. This parameter takes a comma-separated list of filters expressed with dot notation.

https://stackoverflow.com/a/35647027/844700
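Because filter_path is just a query-string parameter, it can be combined with _source on a search URL. The following Python sketch builds such a URL. The base URL, index name and helper name are assumptions. Commas are left unescaped for readability.

```python
from urllib.parse import urlencode

def search_url(base_url, index, filter_path, source=None):
    """Build a _search URL that trims the response with filter_path (and optionally _source)."""
    params = {"filter_path": ",".join(filter_path)}
    if source:
        params["_source"] = ",".join(source)
    return f"{base_url}/{index}/_search?{urlencode(params, safe=',')}"

# Hypothetical index; keeps only "took", each hit's _id, and each hit's (filtered) _source.
url = search_url("http://localhost:9200", "my_index",
                 ["took", "hits.hits._id", "hits.hits._source"], source=["title"])
```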

Here is another solution, this time using a match expression.

Source filtering allows you to control how the _source field is returned with every hit.

Tested with Elasticsearch version 5.5.

The includes keyword defines the specific fields.

GET /my_indice/my_indice_type/_search
{
  "_source": {
    "includes": [
      "my_especific_field"
    ]
  },
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "_id": "%my_id_here_without_percent%"
          }
        }
      ]
    }
  }
}

A REST API GET request can also be made with the '_source' parameter.

Example request

http://localhost:9200/opt_pr/_search?q=SYMBOL:ITC AND OPTION_TYPE:CE AND TRADE_DATE:2017-02-10 AND EXPIRY_DATE:2017-02-23&_source=STRIKE_PRICE
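When the URL is sent by a tool that does not encode it for you (a script, or curl without quoting), the spaces in q need percent-encoding. The Python sketch below does this, assuming the query is written in Lucene's field:value syntax. The helper name is hypothetical. Colons and commas are left readable, and spaces become %20.

```python
from urllib.parse import quote, urlencode

def uri_search_url(base_url, index, q, source_fields):
    """Build a URI search URL; spaces and other reserved characters in q are percent-encoded."""
    params = {"q": q, "_source": ",".join(source_fields)}
    return f"{base_url}/{index}/_search?{urlencode(params, safe=',:', quote_via=quote)}"

q = "SYMBOL:ITC AND OPTION_TYPE:CE AND TRADE_DATE:2017-02-10 AND EXPIRY_DATE:2017-02-23"
url = uri_search_url("http://localhost:9200", "opt_pr", q, ["STRIKE_PRICE"])
```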

Response

{
"took": 59,
"timed_out": false,
"_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
},
"hits": {
    "total": 104,
    "max_score": 7.3908954,
    "hits": [
        {
            "_index": "opt_pr",
            "_type": "opt_pr_r",
            "_id": "AV3K4QTgNHl15Mv30uLc",
            "_score": 7.3908954,
            "_source": {
                "STRIKE_PRICE": 160
            }
        },
        {
            "_index": "opt_pr",
            "_type": "opt_pr_r",
            "_id": "AV3K4QTgNHl15Mv30uLh",
            "_score": 7.3908954,
            "_source": {
                "STRIKE_PRICE": 185
            }
        },
        {
            "_index": "opt_pr",
            "_type": "opt_pr_r",
            "_id": "AV3K4QTgNHl15Mv30uLi",
            "_score": 7.3908954,
            "_source": {
                "STRIKE_PRICE": 190
            }
        },
        {
            "_index": "opt_pr",
            "_type": "opt_pr_r",
            "_id": "AV3K4QTgNHl15Mv30uLm",
            "_score": 7.3908954,
            "_source": {
                "STRIKE_PRICE": 210
            }
        },
        {
            "_index": "opt_pr",
            "_type": "opt_pr_r",
            "_id": "AV3K4QTgNHl15Mv30uLp",
            "_score": 7.3908954,
            "_source": {
                "STRIKE_PRICE": 225
            }
        },
        {
            "_index": "opt_pr",
            "_type": "opt_pr_r",
            "_id": "AV3K4QTgNHl15Mv30uLr",
            "_score": 7.3908954,
            "_source": {
                "STRIKE_PRICE": 235
            }
        },
        {
            "_index": "opt_pr",
            "_type": "opt_pr_r",
            "_id": "AV3K4QTgNHl15Mv30uLw",
            "_score": 7.3908954,
            "_source": {
                "STRIKE_PRICE": 260
            }
        },
        {
            "_index": "opt_pr",
            "_type": "opt_pr_r",
            "_id": "AV3K4QTgNHl15Mv30uL5",
            "_score": 7.3908954,
            "_source": {
                "STRIKE_PRICE": 305
            }
        },
        {
            "_index": "opt_pr",
            "_type": "opt_pr_r",
            "_id": "AV3K4QTgNHl15Mv30uLd",
            "_score": 7.381078,
            "_source": {
                "STRIKE_PRICE": 165
            }
        },
        {
            "_index": "opt_pr",
            "_type": "opt_pr_r",
            "_id": "AV3K4QTgNHl15Mv30uLy",
            "_score": 7.381078,
            "_source": {
                "STRIKE_PRICE": 270
            }
        }
    ]
}
}
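With _source trimmed like this, every hit carries only the requested field, so collecting the values is a short loop over hits.hits. A minimal Python sketch over the parsed JSON response follows. The helper name is hypothetical.

```python
def source_values(response, field):
    """Collect one field from every hit's filtered _source, skipping hits that lack it."""
    return [
        hit["_source"][field]
        for hit in response.get("hits", {}).get("hits", [])
        if field in hit.get("_source", {})
    ]
```

Applied to the response above, source_values(response, "STRIKE_PRICE") would start with 160, 185, 190.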

Yes, you can accomplish this with source filtering; see the source-filtering docs.

Example request

POST index_name/_search
 {
   "_source": ["field1", "field2", ...]
 }

The output will be:

{
  "took": 57,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "index_name",
        "_type": "index1",
        "_id": "1",
        "_score": 1,
        "_source": {
          "field1": "a",
          "field2": "b"
        }
      }
    ]
  }
}

In Java, you can use setFetchSource like this:

// includes: field1, field2; excludes: none (null)
SearchResponse response = client.prepareSearch(index).setTypes(type)
            .setFetchSource(new String[] { "field1", "field2" }, null)
            .get();

For example, say you have a document with three fields:

PUT movie/_doc/1
{
  "name":"The Lion King",
  "language":"English",
  "score":"9.3"
}

If you want to return only name and score, you can use the following command:

GET movie/_doc/1?_source_includes=name,score

If you want to get the fields that match a pattern:

GET movie/_doc/1?_source_includes=*re

Or exclude some fields:

GET movie/_doc/1?_source_excludes=score
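The three requests differ only in their query-string parameters. The Python sketch below builds them, assuming the 7.x parameter names used above (_source_includes / _source_excludes) and a cluster at http://localhost:9200. The helper name is hypothetical. Commas and * are kept unescaped so the URLs match the examples.

```python
from urllib.parse import urlencode

def get_doc_url(base_url, index, doc_id, includes=(), excludes=()):
    """Build a GET /<index>/_doc/<id> URL with optional _source_includes / _source_excludes."""
    params = {}
    if includes:
        params["_source_includes"] = ",".join(includes)
    if excludes:
        params["_source_excludes"] = ",".join(excludes)
    query = urlencode(params, safe=",*")
    return f"{base_url}/{index}/_doc/{doc_id}" + (f"?{query}" if query else "")
```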

There are several ways to get back only specific fields. One is the _source option. Another, which is useful for getting cleaner and more concise responses containing only what you're interested in, is filter_path:

Document JSON:

"hits" : [
  {
    "_index" : "xxxxxx",
    "_type" : "_doc",
    "_id" : "xxxxxx",
    "_score" : xxxxxx,
    "_source" : {
      "year" : 2020,
      "created_at" : "2020-01-29",
      "url" : "www.github.com/mbarr0987",
      "name":"github"
    }
  }
]

Query:

GET bot1/_search?filter_path=hits.hits._source.url
{
  "query": {
    "bool": {
      "must": [
        {"term": {"name.keyword":"github" }}
       ]
    }
  }
}

Output:

{
  "hits" : {
    "hits" : [
      {
        "_source" : {
          "url" : "www.github.com/mbarr0987"
        }
      }
    ]
  }
}
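To show what filter_path does to a response, here is a local Python approximation applied to an already-parsed response. It is not Elasticsearch's implementation. It supports comma-separated dotted paths with exact key names and walks through arrays element by element. The * and ** wildcards that Elasticsearch also accepts are deliberately left out. The function names are hypothetical.

```python
_MISSING = object()  # sentinel: nothing under this node survived the filter

def _path_tree(filter_path):
    """Turn "took,hits.total" into {"took": {}, "hits": {"total": {}}}."""
    tree = {}
    for path in filter_path.split(","):
        node = tree
        for part in path.strip().split("."):
            node = node.setdefault(part, {})
    return tree

def _prune(value, tree):
    if not tree:  # reached the end of a filter path: keep the value whole
        return value
    if isinstance(value, list):  # paths pass through arrays element by element
        kept = [p for p in (_prune(item, tree) for item in value) if p is not _MISSING]
        return kept if kept else _MISSING
    if isinstance(value, dict):
        out = {}
        for key, child in value.items():
            if key in tree:  # exact key match only; no * or ** wildcards here
                pruned = _prune(child, tree[key])
                if pruned is not _MISSING:
                    out[key] = pruned
        return out if out else _MISSING
    return _MISSING  # a scalar cannot satisfy the rest of the path

def filter_response(response, filter_path):
    """Locally approximate filter_path on an already-parsed response."""
    result = _prune(response, _path_tree(filter_path))
    return {} if result is _MISSING else result
```

Run on a response shaped like the document above with "hits.hits._source.url", it yields the same structure as the output shown.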

If you know SQL, you can write the query in SQL and have Elasticsearch translate it into the equivalent query, for example:

POST /_sql/translate
{
  
  "query": "select name,surname from users"
}

The result is below; look carefully at the includes key:

{
  "size" : 1000,
  "_source" : {
    "includes" : [
      "name",
      "surname"
    ],
    "excludes" : [ ]
  },
  "sort" : [
    {
      "_doc" : {
        "order" : "asc"
      }
    }
  ]
}
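The translate call is an ordinary JSON POST, and the field list can be read straight back out of _source.includes. The Python sketch below shows both steps, assuming a cluster at http://localhost:9200. The helper names are hypothetical, and the request is built but not sent.

```python
import json
import urllib.request

def sql_translate_request(base_url, sql):
    """Build (but do not send) a POST /_sql/translate request for the given SQL text."""
    return urllib.request.Request(
        url=f"{base_url}/_sql/translate",
        data=json.dumps({"query": sql}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def included_fields(translated):
    """Read the selected columns back out of a translated query's _source.includes."""
    return translated.get("_source", {}).get("includes", [])
```

Applied to the translated query above, included_fields returns name and surname, the columns from the SELECT list.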

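Statement: the technical posts on this site follow the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.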

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM