
How to search in ElasticSearch the most common word of a single field in a single document?

Let's say I have a document with a field "pdf_content" of type keyword containing:

"good polite nice good polite good"

I would like the search to return:

{
    word: good,
    occurrences: 3
},
{
    word: polite,
    occurrences: 2
},
{
    word: nice,
    occurrences: 1
}

How is this possible using ElasticSearch 7.15?

I tried this in the Kibana console:

GET /pdf/_search
{
  "aggs": {
    "pdf_contents": {
      "terms": { "field": "pdf_content" }
    }
  }
}

But it only returns the list of PDFs I have indexed.

Have you tried term vectors (the term_vector mapping option)?

Basically, you can do the following.

Mappings:

{
    "mappings": {
        "properties": {
            "pdf_content": {
                "type": "text",
                "term_vector": "with_positions_offsets_payloads"
            }
        }
    }
}

with your sample document:

POST /pdf/_doc/1
{
    "pdf_content": "good polite nice good polite good"
}

Then you can do:

GET /pdf/_termvectors/1
{
  "fields" : ["pdf_content"],
  "offsets" : false,
  "payloads" : false,
  "positions" : false,
  "term_statistics" : false,
  "field_statistics" : false
}

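If you want to see the other information, set the corresponding options to true. Setting them all to false gives you what you want: each term in the field together with its term_freq, the number of times it occurs in that document.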
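
The response lists the terms as an object keyed by term, not as a list ordered by frequency, so a little client-side post-processing is still needed to get the output asked for in the question. Here is a minimal sketch in Python, assuming the response body has already been parsed into a dict. The function name top_words and the sample response are hand-written illustrations, not output captured from a cluster. The sample follows the term_vectors / field / terms / term_freq layout of a _termvectors response for the sample document.

```python
def top_words(termvectors_response, field):
    """Return [{"word": ..., "occurrences": ...}, ...], most frequent first."""
    terms = (
        termvectors_response.get("term_vectors", {})
        .get(field, {})
        .get("terms", {})
    )
    counts = [
        {"word": word, "occurrences": info["term_freq"]}
        for word, info in terms.items()
    ]
    # Sort by descending frequency; break ties alphabetically for stable output.
    counts.sort(key=lambda entry: (-entry["occurrences"], entry["word"]))
    return counts


# Hand-written sample (an assumption for illustration), shaped like the
# _termvectors response for the document indexed above.
sample_response = {
    "_index": "pdf",
    "_id": "1",
    "found": True,
    "term_vectors": {
        "pdf_content": {
            "terms": {
                "good": {"term_freq": 3},
                "nice": {"term_freq": 1},
                "polite": {"term_freq": 2},
            }
        }
    },
}

for entry in top_words(sample_response, "pdf_content"):
    print(entry)
```

If the document is missing or the field has no term vectors, the function returns an empty list instead of raising an error.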
