Highlight on ElasticSearch autocomplete

I have the following data to be indexed on ElasticSearch.

[image: sample documents]

I want to implement an autocomplete feature and highlight why a specific document matched a query.

These are the settings of my index:

{
    "settings": {
        "number_of_shards": 1, 
        "analysis": {
            "filter": {
                "autocomplete_filter": { 
                    "type":     "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 15
                }
            },
            "analyzer": {
                "autocomplete": {
                    "type":      "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "autocomplete_filter"
                    ]
                }
            }
        }
    }
}

Index analysis

  • Splits text on word boundaries.
  • Removes punctuation.
  • Lowercases.
  • Edge n-grams each token.
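
The four steps above can be sketched in plain Python to see which terms end up in the index. This is only an approximation: the regex `\w+` stands in for the standard tokenizer, the function names are made up for illustration, and the gram sizes are taken from the settings above.

```python
import re


def edge_ngrams(token, min_gram=1, max_gram=15):
    """Return the leading substrings of token, min_gram to max_gram characters long."""
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


def analyze_for_index(text):
    """Approximate the index-time chain: split into words, lowercase, edge n-gram."""
    tokens = re.findall(r"\w+", text)  # rough stand-in for the standard tokenizer
    terms = []
    for token in tokens:
        terms.extend(edge_ngrams(token.lower()))
    return terms


print(analyze_for_index("Software AG"))
# ['s', 'so', 'sof', 'soft', 'softw', 'softwa', 'softwar', 'software', 'a', 'ag']
```

Because every prefix of every word is stored as its own term, a search for "soft" finds any document containing a word that starts with those letters.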

So the inverted index looks like:

[image: resulting inverted index]

This is how I defined the mapping for the name field:

{
    "index_type": {
        "properties": {
            "name": {
                "type":     "string",
                "index_analyzer":  "autocomplete", 
                "search_analyzer": "standard" 
            }
        }
    }
}

When I query:

GET http://localhost:9200/index/type/_search

{
    "query": {
        "match": {
            "name": "soft"
        }
    },
    "highlight": {
        "fields" : {
            "name" : {}
        }
    }
}

Search for: soft

After the standard analyzer is applied at search time, "soft" is the single term looked up in the inverted index. The search matches documents 1, 3, 4, 5, 6 and 7, which is correct, but I would expect only "soft" to be highlighted, not the whole word:

{
  "hits": [
    {
      "_source": {
        "name": "SoftwareRocks everytime"
      },
      "highlight": {
        "name": [
          "<em>SoftwareRocks</em> everytime"
        ]
      }
    },
    {
      "_source": {
        "name": "Software AG"
      },
      "highlight": {
        "name": [
          "<em>Software</em> AG"
        ]
      }
    },
    {
      "_source": {
        "name": "Software AG2"
      },
      "highlight": {
        "name": [
          "<em>Software</em> AG2"
        ]
      }
    },
    {
      "_source": {
        "name": "Op Software AG good software better"
      },
      "highlight": {
        "name": [
          "Op <em>Software</em> AG good <em>software</em> better"
        ]
      }
    },
    {
      "_source": {
        "name": "Op Software AG"
      },
      "highlight": {
        "name": [
          "Op <em>Software</em> AG"
        ]
      }
    },
    {
      "_source": {
        "name": "is soft ware ok"
      },
      "highlight": {
        "name": [
          "is <em>soft</em> ware ok"
        ]
      }
    }
  ]
}

Search for: software ag

The standard analyzer splits "software ag" into the terms "software" and "ag", which are then looked up in the inverted index. The search matches documents 1, 3, 4, 5 and 6, which is correct, but I would expect only "software" and "ag" to be highlighted, not the whole words that contain them:

{
  "hits": [
    {
      "_source": {
        "name": "Software AG"
      },
      "highlight": {
        "name": [
          "<em>Software</em> <em>AG</em>"
        ]
      }
    },
    {
      "_source": {
        "name": "Software AG2"
      },
      "highlight": {
        "name": [
          "<em>Software</em> <em>AG2</em>"
        ]
      }
    },
    {
      "_source": {
        "name": "Op Software AG"
      },
      "highlight": {
        "name": [
          "Op <em>Software</em> <em>AG</em>"
        ]
      }
    },
    {
      "_source": {
        "name": "Op Software AG good software better"
      },
      "highlight": {
        "name": [
          "Op <em>Software</em> <em>AG</em> good <em>software</em> better"
        ]
      }
    },
    {
      "_source": {
        "name": "SoftwareRocks everytime"
      },
      "highlight": {
        "name": [
          "<em>SoftwareRocks</em> everytime"
        ]
      }
    }
  ]
}

I read the highlighting documentation for ElasticSearch, but I cannot work out how the highlighting is performed. For the two examples above, I expected only the text that matched a token in the inverted index to be highlighted, not the whole word. How can I highlight only the text that was passed in the query?
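
One possible explanation, offered as an assumption rather than something the post confirms: a highlighter marks text using the start and end offsets stored with each matching token, and grams produced by a token filter keep the offsets of the whole word they came from. The toy model below is plain Python, not Elasticsearch code, and all of its names are hypothetical, but it reproduces the responses shown above.

```python
import re


def index_tokens(text, max_gram=15):
    """Yield (term, start, end); every gram carries the offsets of its source word."""
    for match in re.finditer(r"\w+", text):
        word = match.group().lower()
        for n in range(1, min(len(word), max_gram) + 1):
            # Assumption: the token filter does not narrow offsets to the gram itself.
            yield word[:n], match.start(), match.end()


def highlight(text, query):
    """Wrap every word that produced a term equal to a query term in <em> tags."""
    query_terms = set(re.findall(r"\w+", query.lower()))
    spans = sorted({(s, e) for term, s, e in index_tokens(text) if term in query_terms})
    out, pos = [], 0
    for start, end in spans:
        out.append(text[pos:start])
        out.append("<em>" + text[start:end] + "</em>")
        pos = end
    out.append(text[pos:])
    return "".join(out)


print(highlight("SoftwareRocks everytime", "soft"))  # <em>SoftwareRocks</em> everytime
print(highlight("Software AG2", "software ag"))      # <em>Software</em> <em>AG2</em>
```

Under this model the index never records where a gram ends inside its word, so the highlighter has nothing narrower than the whole word to mark.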

Update

It seems that the autocomplete on the ElasticSearch website is implemented on the server side much like mine, but that the matched text is highlighted on the client. If that is how they do it, there is probably no proper solution on the ElasticSearch side, so I implemented the highlighting in my own code. Unlike them, I do it on the server side rather than on the client side.

My server-side implementation (in PHP) is:

public function search($term)
{
    $params = [
        'index' => $this->getIndexName(),
        'type' => $this->getIndexType(),
        'body' => [
            'query' => [
                'match' => [
                    'name' => $term
                ]
            ]
        ]
    ];

    $results = $this->client->search($params);

    $hits = $results['hits']['hits'];

    $data = [];

    $wrapBefore = '<strong>';
    $wrapAfter = '</strong>';

    foreach ($hits as $hit) {
        $data[] = [
            $hit['_source']['id'],
            $hit['_source']['name'],
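            // preg_quote() escapes regex metacharacters in the user-supplied term.
            preg_replace('/(' . preg_quote($term, '/') . ')/i', $wrapBefore . '$1' . $wrapAfter, strip_tags($hit['_source']['name']))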
        ];
    }

    return $data;
}
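
The same post-processing idea can be sketched in Python with two changes that are my own additions, not part of the code above: each query word is highlighted separately, and only at the start of a word, which mirrors what the edge n-gram index actually matched. The text is also HTML-escaped instead of tag-stripped. The function name and the `<strong>` defaults are illustrative choices, not part of any Elasticsearch client.

```python
import html
import re


def highlight_prefixes(name, query, before="<strong>", after="</strong>"):
    """Wrap each query word where it starts a word in name; escape everything else."""
    # Longest alternatives first so "software" wins over "soft" if both are typed.
    words = sorted({re.escape(w) for w in re.findall(r"\w+", query)}, key=len, reverse=True)
    if not words:
        return html.escape(name)
    pattern = re.compile(r"\b(?:" + "|".join(words) + ")", re.IGNORECASE)
    out, pos = [], 0
    for match in pattern.finditer(name):
        out.append(html.escape(name[pos:match.start()]))
        out.append(before + html.escape(match.group()) + after)
        pos = match.end()
    out.append(html.escape(name[pos:]))
    return "".join(out)


print(highlight_prefixes("Op Software AG good software better", "soft"))
# Op <strong>Soft</strong>ware AG good <strong>soft</strong>ware better
print(highlight_prefixes("Software AG2", "software ag"))
# <strong>Software</strong> <strong>AG</strong>2
```

Matching against the raw text and escaping each piece afterwards avoids accidentally highlighting inside an HTML entity such as `&amp;`.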

This outputs what I was aiming for with this question:

[image: autocomplete results with the highlighting I wanted]

I added a bounty to see if there is a solution at the ElasticSearch level to achieve what I described above.

As of the latest version of Elastic, this is not possible: the highlighting documentation does not mention any setting or query for it. I checked the Elastic autocomplete example in the browser console, under the XHR requests tab, and found the following response for the keyword "att":

url - https://search.elastic.co/suggest?q=att
    {
        "current_page": 1,
        "last_page": 4,
        "total_hits": 49,
        "hits": [
            {
                "tags": [],
                "url": "/elasticon/tour/2016/jp/not-attending",
                "section": "Elasticon",
                "title": "Not <em>Attending</em> - JP"
            },
            {
                "section": "Elasticon",
                "title": "<em>Attending</em> from Training - JP",
                "tags": [],
                "url": "/elasticon/tour/2016/jp/attending-training"
            },
            {
                "tags": [],
                "url": "/elasticon/tour/2016/jp/attending-keynote",
                "title": "<em>Attending</em> from Keynote - JP",
                "section": "Elasticon"
            },
            {
                "tags": [],
                "url": "/elasticon/tour/2016/not-attending",
                "section": "Elasticon",
                "title": "Thank You - Not <em>Attending</em>"
            },
            {
                "tags": [],
                "url": "/elasticon/tour/2016/attending",
                "section": "Elasticon",
                "title": "Thank You - <em>Attending</em>"
            },
            {
                "section": "Blog",
                "title": "What It's Like to <em>Attend</em> Elastic Training",
                "tags": [],
                "url": "/blog/what-its-like-to-attend-elastic-training"
            },
            {
                "tags": "Elasticsearch",
                "url": "/guide/en/elasticsearch/plugins/5.0/mapper-attachments-highlighting.html",
                "section": "Docs/",
                "title": "Highlighting <em>attachments</em>"
            },
            {
                "title": "<em>attachments</em> » email",
                "section": "Docs/",
                "tags": "Logstash",
                "url": "/guide/en/logstash/5.0/plugins-outputs-email.html#plugins-outputs-email-attachments"
            },
            {
                "section": "Docs/",
                "title": "Configuring Email <em>Attachments</em> » Actions",
                "tags": "Watcher",
                "url": "/guide/en/watcher/2.4/actions.html#configuring-email-attachments"
            },
            {
                "url": "/guide/en/watcher/2.4/actions.html#hipchat-action-attributes",
                "tags": "Watcher",
                "title": "HipChat Action <em>Attributes</em> » Actions",
                "section": "Docs/"
            },
            {
                "title": "Slack Action <em>Attributes</em> » Actions",
                "section": "Docs/",
                "tags": "Watcher",
                "url": "/guide/en/watcher/2.4/actions.html#slack-action-attributes"
            }
        ],
        "aggs": {
            "sections": [
                {
                    "Elasticon": 5
                },
                {
                    "Blog": 1
                },
                {
                    "Docs/": 43
                }
            ],
            "top_tags": [
                {
                    "XPack": 14
                },
                {
                    "Elasticsearch": 12
                },
                {
                    "Watcher": 9
                },
                {
                    "Logstash": 4
                },
                {
                    "Clients": 3
                },
                {
                    "Shield": 1
                }
            ]
        }
    }

On the front end, however, only "att" is shown highlighted in the autosuggest results. Hence they are handling the highlighting in the browser layer.
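
What the browser does with such a response might look like the following sketch: take each `<em>…</em>` span returned by the server and shrink it to the characters the user actually typed. This is a guess at the approach, written in Python for brevity rather than the JavaScript a browser would run, and `narrow_highlight` is a hypothetical helper.

```python
import re

EM_SPAN = re.compile(r"<em>(.*?)</em>")


def narrow_highlight(fragment, typed):
    """Shrink each <em>word</em> span to the typed prefix when the word starts with it."""

    def shrink(match):
        word = match.group(1)
        if typed and word.lower().startswith(typed.lower()):
            n = len(typed)
            return "<em>" + word[:n] + "</em>" + word[n:]
        return match.group(0)  # not a prefix match: keep the server's span unchanged

    return EM_SPAN.sub(shrink, fragment)


print(narrow_highlight("Not <em>Attending</em> - JP", "att"))
# Not <em>Att</em>ending - JP
```

The server response stays untouched; only the presentation changes, which matches the observation that the XHR payload still wraps whole words.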
