
One-way synonym search in Elasticsearch

I want to implement one-way synonym search in Elasticsearch. One-way means that if I define a => x,y,z and search for 'a', the results should include all documents containing any of the words x, y, z or a; this already works. But if I search for 'x', the results should include only documents that contain 'x', not documents that contain only 'a'.

Is this possible in Elasticsearch?

You cannot do this with a synonym relation, because the behaviour you describe is a hypernym/hyponym relation.

You can achieve this behaviour at index time, though.

So for each occurrence of x, y or z you also index a. Using an additional field for this is a good idea, so that the extra terms do not distort the scores.

Sadly, this behaviour is not part of Elasticsearch and has to be implemented by hand while feeding the data.
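
As a rough illustration of feeding the extra term by hand, and reading the question's rule a => x,y,z as "a is the broader term", a document that mentions x could be sent with a in a separate field. The field names body and body_broader are hypothetical and do not come from the answer.

```json
{
  "body": "some text mentioning x",
  "body_broader": "a"
}
```

A query for a that searches both fields would match this document through body_broader, while a query for x matches it only through body and never returns documents that contain only a.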

I've implemented one-way synonyms by inverting the synonym expression:

e.g.:

Robert => Bob, Rob
Bob => Robert

But I had to use the analyzer with the synonyms in a different way. In the mapping, the synonyms are hooked to a new field:

"FirstName": {
  "type": "string",
  "analyzer": "standard",
  "search_analyzer": "standard",
  "fields": {
    "raw": {
      "type": "string",
      "analyzer": "standard"
    },
    "synonym": {
      "type": "string",
      "analyzer": "firstname_synonym_analyzer"
    }
  }
},
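
The mapping refers to firstname_synonym_analyzer, but the answer does not show how it is defined. A minimal sketch of index settings that could back it, assuming a standard tokenizer, a lowercase filter and the two rules listed above (the filter name firstname_synonym_filter is made up for this example):

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "firstname_synonym_filter": {
          "type": "synonym",
          "synonyms": [
            "robert => bob, rob",
            "bob => robert"
          ]
        }
      },
      "analyzer": {
        "firstname_synonym_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "firstname_synonym_filter"]
        }
      }
    }
  }
}
```

The rules are written in lowercase because the lowercase filter runs before the synonym filter in this chain.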

And the search looks like this:

 "bool": {
    "should": [
       {
          "match": {
             "FirstName": {
                "query": "Jo"
             }
          }
       },
       {
          "match": {
             "FirstName.synonym": {
                "query": "Jo"
             }
          }
       }
    ],
    "minimum_should_match": 1
 }

This way the first field contains the normal value and the second just the possible synonyms. So looking for Bob finds Robert, but not Rob.

I would implement it with synonyms using generic expansion (also known as genre expansion) and different analyzers for index time and query time.

Synonyms at index time:

Bob => Bob, Robert
Rob => Rob, Robert

The format is:

word => the same word, more generic word, even more generic, etc

Query time: no synonyms applied.

A query for "Bob" will return only documents that contained "Bob".

A query for "Rob" will return only documents that contained "Rob".

A query for "Robert" will return documents that contained "Bob", "Rob" or "Robert".
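
The index-time expansion described here could be configured roughly as follows. This is only a sketch: the names name_generic_expansion and name_index_analyzer are invented for the example, the FirstName field is borrowed from the mapping shown earlier on this page, and the mapping syntax assumes Elasticsearch 7 or later, where the field type is text rather than the older string type.

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "name_generic_expansion": {
          "type": "synonym",
          "synonyms": [
            "bob => bob, robert",
            "rob => rob, robert"
          ]
        }
      },
      "analyzer": {
        "name_index_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "name_generic_expansion"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "FirstName": {
        "type": "text",
        "analyzer": "name_index_analyzer",
        "search_analyzer": "standard"
      }
    }
  }
}
```

Because search_analyzer is the plain standard analyzer, query terms are not expanded; only the indexed tokens carry the more generic name.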
