
Highlight whole content in Elasticsearch for multivalue fields

Using the highlight feature of Elasticsearch:

"highlight": {
  "fields": {
    "tags": { "number_of_fragments": 0 }
  }
}

With number_of_fragments: 0, no fragments are produced; instead, the whole content of the field is returned with the matches highlighted. This is useful for short texts, because documents can be displayed as normal and readers can easily scan for the highlighted parts.

How do you use this when a document contains an array with multiple values?

PUT /test/doc/1
{
  "tags": [
    "one hit tag",
    "two foo tag",
    "three hit tag",
    "four foo tag"
  ]
}

GET /test/doc/_search
{
  "query": { 
    "match": { "tags": "hit"} 
  }, 
  "highlight": {
    "fields": {
      "tags": { "number_of_fragments": 0 }
    }
  }
}

This is what I would like to show the user, with the matching terms highlighted:

1 result:

Document 1, tagged with:

"one hit tag", "two foo tag", "three hit tag", "four foo tag"

Unfortunately, this is the result of the query:

{
     "took": 1,
     "timed_out": false,
     "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
     },
     "hits": {
        "total": 1,
        "max_score": 0.10848885,
        "hits": [
           {
              "_index": "test",
              "_type": "doc",
              "_id": "1",
              "_score": 0.10848885,
              "_source": {
                 "tags": [
                    "one hit tag",
                    "two foo tag",
                    "three hit tag",
                    "four foo tag"
                 ]
              },
              "highlight": {
                 "tags": [
                    "one <em>hit</em> tag",
                    "three <em>hit</em> tag"
                 ]
              }
           }
        ]
     }
  }

How can I use this to get to:

   "tags": [
      "one <em>hit</em> tag",
      "two foo tag",
      "three <em>hit</em> tag",
      "four foo tag"
   ]

The highlight section only contains the values that matched. One possibility is to strip the <em> HTML tags from the highlighted values and then look each one up in the original field:

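# Original field values, as returned in _source
tags = [
   "one hit tag",
   "two foo tag",
   "three hit tag",
   "four foo tag"
]
# Highlighted values, as returned in the highlight section of the hit
highlighted = [
  "one <em>hit</em> tag",
  "three <em>hit</em> tag"
]

# Strip the <em> tags from each highlighted value, find the matching
# original value, and replace it with the highlighted version.
highlighted.each do |highlighted_tag|
  if (index = tags.index(highlighted_tag.gsub(/<\/?em>/, '')))
    tags[index] = highlighted_tag
  end
end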

puts tags #=> 
# one <em>hit</em> tag
# two foo tag
# three <em>hit</em> tag
# four foo tag

This won't win a prize for the most beautiful code, but I reckon it gets the job done.
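
The same idea can be wrapped in a small reusable function. The following is only a sketch: merge_highlights is a hypothetical helper name, not part of Elasticsearch or any client library, and the pre_tag and post_tag arguments are assumed to match whatever tags the highlighter was configured with. It also assumes that a highlighted value differs from the original only by those tags. Unlike the loop above, it does not modify the original array, and it replaces every occurrence of a value that appears more than once.

```ruby
# Hypothetical helper: returns a new array in which every original value
# that has a highlighted counterpart is replaced by that counterpart.
def merge_highlights(values, highlighted, pre_tag: "<em>", post_tag: "</em>")
  # Build a lookup from the plain text (tags removed) to the highlighted string.
  by_plain_text = highlighted.each_with_object({}) do |fragment, lookup|
    plain = fragment.gsub(pre_tag, "").gsub(post_tag, "")
    lookup[plain] = fragment
  end

  # Keep the original order; values without a highlight are returned unchanged.
  values.map { |value| by_plain_text.fetch(value, value) }
end

# A hit shaped like the one in the response above, already parsed from JSON.
hit = {
  "_source"   => { "tags" => ["one hit tag", "two foo tag", "three hit tag", "four foo tag"] },
  "highlight" => { "tags" => ["one <em>hit</em> tag", "three <em>hit</em> tag"] }
}

merged = merge_highlights(
  hit.dig("_source", "tags"),
  hit.fetch("highlight", {}).fetch("tags", []) # a hit may have no highlight for the field
)
puts merged
```

Building the lookup once keeps the merge to a single pass over the original values, which may matter if the field holds many values.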
