
Extract keywords from fields

I want to write a query that analyzes one or more fields.

That is, the current analyzers require text as input; instead of passing text, I want to pass a field value.

If I have a document like this

{
    "desc": "A document description",
    "name": "This name is not original",
    "amount": 3000
}

I would like to get back something like the following:

{
    "desc": ["document", "description"],
    "name": ["name", "original"],
    "amount": 3000
}

You can use Term Vectors or Multi Term Vectors to achieve what you're looking for:

https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-multi-termvectors.html

You have to specify the IDs of the documents you want, as well as the fields. The response contains the analyzed terms of each requested field for every document, along with some additional information (such as field statistics and token offsets) that you can easily disable.

GET /exampleindex/_doc/_mtermvectors
{
  "ids": [
    "1","2"
  ],
  "parameters": {
    "fields": [
      "*"
    ]
  }
}

This will return something along the lines of:

"docs": [
    {
      "_index": "exampleindex",
      "_type": "_doc",
      "_id": "1",
      "_version": 2,
      "found": true,
      "took": 0,
      "term_vectors": {
        "desc": {
          "field_statistics": {
            "sum_doc_freq": 5,
            "doc_count": 2,
            "sum_ttf": 5
          },
          "terms": {
            "amazing": {
              "term_freq": 1,
              "tokens": [
                {
                  "position": 1,
                  "start_offset": 3,
                  "end_offset": 10
                }
              ]
            },
            "an": {
              "term_freq": 1,
              "tokens": [
                {
                  "position": 0,
                  "start_offset": 0,
                  "end_offset": 2
                }
              ]
            }
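
The response above is more verbose than the shape asked for in the question, so some client-side flattening is needed. The following is a minimal Python sketch of that step, not part of the original answer. The helper names `mtermvectors_body` and `terms_by_field` are hypothetical, and the `field_statistics`, `offsets`, and `payloads` switches are assumed to be accepted under `parameters` in the same way as on the single-document term vectors API. Sending the request to the cluster is left out.

```python
from typing import Any, Dict, List


def mtermvectors_body(ids: List[str], fields: List[str]) -> Dict[str, Any]:
    """Build a _mtermvectors request body that keeps terms and positions only.

    Assumption: these flags are honoured under "parameters".
    """
    return {
        "ids": ids,
        "parameters": {
            "fields": fields,
            "field_statistics": False,  # drop sum_doc_freq, doc_count, sum_ttf
            "offsets": False,           # drop start_offset / end_offset
            "payloads": False,
        },
    }


def _first_position(term_info: Dict[str, Any]) -> int:
    """Smallest token position of a term, or 0 when positions are absent."""
    tokens = term_info.get("tokens", [])
    return min((t.get("position", 0) for t in tokens), default=0)


def terms_by_field(doc: Dict[str, Any]) -> Dict[str, List[str]]:
    """Flatten one entry of the "docs" array into {field: [term, ...]}.

    Terms are ordered by their first position in the original text.
    """
    result: Dict[str, List[str]] = {}
    for field, vector in doc.get("term_vectors", {}).items():
        terms = vector.get("terms", {})
        result[field] = sorted(terms, key=lambda t: _first_position(terms[t]))
    return result


# Trimmed copy of the response entry shown above.
sample_doc = {
    "_id": "1",
    "term_vectors": {
        "desc": {
            "terms": {
                "amazing": {"term_freq": 1, "tokens": [{"position": 1}]},
                "an": {"term_freq": 1, "tokens": [{"position": 0}]},
            }
        }
    },
}

print(terms_by_field(sample_doc))  # {'desc': ['an', 'amazing']}
```

Note that a stop word such as "an" still shows up in the result. Term vectors reflect whatever analyzer the field is mapped with, so getting output like the one in the question (without "a", "this", "is", "not") would require a field analyzer that removes stop words.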

Ah, OK, this is a different scenario. To use an analyzer on a field, you have to declare it in the mapping, as you have seen in the docs. Once an analyzer is associated with a field in the mapping, every value of that field is analyzed. The analyzer changes how your text is indexed in Lucene's inverted index, and therefore how it can be searched, but it does not change the content of the value itself. So you can declare the analyzer on the field and call the _analyze API only when you need the tokens.

If you want the text to be retrieved differently under certain conditions, the scenario changes again. In my opinion, the fastest and simplest solution for that last scenario is to duplicate the field: once with the analyzer and once without.
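
One way to read the "duplicate the field" suggestion is a multi-field mapping, where the same value is indexed once as analyzed text and once as an untouched keyword. The Python sketch below only builds the request bodies as dictionaries. The function names, the `raw` sub-field name, and the choice of the built-in `english` analyzer are illustrative assumptions, not something the answer specifies.

```python
import json
from typing import Any, Dict


def build_mapping(field: str, analyzer: str = "english") -> Dict[str, Any]:
    """Index-creation body: `field` is analyzed, `field.raw` is kept verbatim.

    Assumption: a typeless (Elasticsearch 7+) mapping layout.
    """
    return {
        "mappings": {
            "properties": {
                field: {
                    "type": "text",
                    "analyzer": analyzer,  # used at index and search time
                    "fields": {
                        "raw": {"type": "keyword"}  # unanalyzed copy
                    },
                }
            }
        }
    }


def build_analyze_request(field: str, text: str) -> Dict[str, Any]:
    """Body for `GET /<index>/_analyze`: analyze `text` with the analyzer
    that the mapping declares for `field`."""
    return {"field": field, "text": text}


if __name__ == "__main__":
    # Body for: PUT /exampleindex
    print(json.dumps(build_mapping("desc"), indent=2))
    # Body for: GET /exampleindex/_analyze
    print(json.dumps(build_analyze_request("desc", "A document description"), indent=2))
```

With such a mapping, queries against `desc` would go through the analyzer, while `desc.raw` would match only the exact original value. Note that `_analyze` still takes the text in the request body; the `field` parameter only selects which analyzer to apply.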
