How to convert a Lucene query string to an Elasticsearch Match/Match_Prefix etc. equivalent

I am currently migrating from Solr v3 to Elasticsearch v5.11. My question is: how would I convert the query string below to an equivalent built from Elasticsearch match/match_phrase (etc.) queries? Is this even possible?

(entityName:(john AND lewis OR "john lewis") 
OR entityNameText:(john AND lewis OR "john lewis")) 
AND (status( "A" OR "I" status))

I have tried this, so far only for the first set of brackets, but the result doesn't seem correct:

{
"bool": {
    "should": [
        [{
            "bool": {
                "should": [
                    [{
                        "match_phrase": {
                            "entityName": "john lewis"
                        }
                    }]
                ],
                "must": [
                    [{
                        "match": {
                            "entityName": {
                                "query": "john lewis",
                                "operator": "and"
                            }
                        }
                    }]
                ]
            }
        }, {
            "bool": {
                "should": [
                    [{
                        "match_phrase": {
                            "entityNameText": "john lewis"
                        }
                    }]
                ],
                "must": [
                    [{
                        "match": {
                            "entityNameText": {
                                "query": "john lewis",
                                "operator": "and"
                            }
                        }
                    }]
                ]
            }
        }]
    ]
}
}

Thanks

Updated:

entityName and entityNameText are both mapped as text types, with custom analyzers for both indexing and searching. Status is mapped as a keyword type.

Posting the answer for anyone who is interested in this in the future. I'm not entirely sure why, but I wrote two alternative queries using the ES Query DSL and found them to be equivalent to the original Lucene query: both return exactly the same results. Not sure if that's a pro or con of the ES Query DSL.

Original Lucene query:

{
    "query": {
        "query_string": {
            "query": "entityName:(john AND Lewis OR \"john Lewis\") OR entityNameText:(john AND Lewis OR \"john Lewis\")"
        }
    }
}
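If the query string has to be assembled for different fields or names, the escaping of the inner quotes is easy to get wrong by hand. The following is a minimal sketch in Python, not part of the original post; `field_group` and `query_string_body` are hypothetical helper names, and it assumes the words contain no Lucene special characters that would need escaping.

```python
import json
from typing import Any, Dict, List


def field_group(field: str, words: List[str]) -> str:
    """Build field:(w1 AND w2 OR "w1 w2") for one field."""
    all_words = " AND ".join(words)
    phrase = '"' + " ".join(words) + '"'
    return f"{field}:({all_words} OR {phrase})"


def query_string_body(fields: List[str], words: List[str]) -> Dict[str, Any]:
    """OR the per-field groups together inside a query_string query."""
    expression = " OR ".join(field_group(f, words) for f in fields)
    return {"query": {"query_string": {"query": expression}}}


body = query_string_body(["entityName", "entityNameText"], ["john", "Lewis"])

# json.dumps takes care of escaping the double quotes in the phrase.
print(json.dumps(body, indent=4))
```

Serializing the dict with `json.dumps` should give a request body of the same shape as the one above.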

Query alternative 1:

{
    "bool": {
        "should": [
            {
                "bool": {
                    "should": [
                        {
                            "match": {
                                "entityName": {
                                    "query": "john Lewis",
                                    "operator": "and"
                                }
                            }
                        },
                        {
                            "match_phrase": {
                                "entityName": "john Lewis"
                            }
                        }
                    ]
                }
            },
            {
                "bool": {
                    "should": [
                        {
                            "match": {
                                "entityNameText": {
                                    "query": "john Lewis",
                                    "operator": "and"
                                }
                            }
                        },
                        {
                            "match_phrase": {
                                "entityNameText": "john Lewis"
                            }
                        }
                    ]
                }
            }
        ]
    }
}
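Since the two inner bool clauses differ only in the field name, the same structure can be generated rather than written out twice. This is a rough sketch in Python under that assumption; `name_clause` and `any_field` are hypothetical names introduced for illustration, and the result is a plain dict to be placed under "query" in the request body.

```python
from typing import Any, Dict, List

Query = Dict[str, Any]


def name_clause(field: str, text: str) -> Query:
    """Per-field clause: all words present (any order) OR the exact phrase."""
    return {
        "bool": {
            "should": [
                {"match": {field: {"query": text, "operator": "and"}}},
                {"match_phrase": {field: text}},
            ]
        }
    }


def any_field(fields: List[str], text: str) -> Query:
    """OR the per-field clauses together, one clause per field."""
    return {"bool": {"should": [name_clause(f, text) for f in fields]}}


query = any_field(["entityName", "entityNameText"], "john Lewis")
```

Adding a third field then only means extending the list passed to `any_field`.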

Query alternative 2:

{
    "bool": {
        "should": [
            {
                "multi_match": {
                    "query": "john Lewis",
                    "type": "most_fields",
                    "fields": ["entityName", "entityNameText"],
                    "operator": "and"
                }
            },
            {
                "multi_match": {
                    "query": "john Lewis",
                    "type": "phrase",
                    "fields": ["entityName", "entityNameText"]
                }
            }
        ]
    }
}

With this mapping:

{
"entity": {
    "dynamic_templates": [{
        "catch_all": {
            "match_mapping_type": "*",
            "mapping": {
                "type": "text",
                "store": true,
                "analyzer": "phonetic_index",
                "search_analyzer": "phonetic_query"
            }
        }
    }],
    "_all": {
        "enabled": false
    },
    "properties": {
        "entityName": {
            "type": "text",
            "store": true,
            "analyzer": "indexed_index",
            "search_analyzer": "indexed_query",
            "fields": {
                "entityNameLower": {
                    "type": "text",
                    "analyzer": "lowercase"
                },
                "entityNameText": {
                    "type": "text",
                    "store": true,
                    "analyzer": "text_index",
                    "search_analyzer": "text_query"
                },
                "entityNameNgram": {
                    "type": "text",
                    "analyzer": "ngram_index",
                    "search_analyzer": "ngram_query"
                },
                "entityNamePhonetic": {
                    "type": "text",
                    "analyzer": "ngram_index",
                    "search_analyzer": "ngram_query"
                }
            }
        },
        "status": {
            "type": "keyword",
            "norms": false,
            "store": true
        }
    }
}
}

The answer will depend on how you've specified your mapping, but I'll assume that you did no custom mapping.

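Let's break down the different parts first, then we'll put them all back together.

status( "A" OR "I" status)

This is a "terms" query; think of it as a SQL "IN" clause. The values are lowercase because, with the default mapping, the standard analyzer lowercases terms at index time and a terms query does not analyze its input. If status is mapped as a keyword field, use the values exactly as stored ("A", "I").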

  "terms": {
    "status": [
      "a",
      "i"
    ]
  }

entityName:(john AND lewis OR "john lewis")

Elasticsearch breaks text fields down into individual terms at index time. We can use this to our advantage here with another "terms" query. We don't need to specify it as 3 different parts; ES will handle that under the hood.

  "terms": {
    "entityName": [
      "john",
      "lewis"
    ]
  }

entityNameText:(john AND lewis OR "john lewis")

Exactly the same logic as above, just searching on a different field.

  "terms": {
    "entityNameText": [
      "john",
      "lewis"
    ]
  }

AND vs OR

In an ES bool query, AND = "must" and OR = "should".

Put it all together

GET test1/type1/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "terms": {
            "status": [
              "a",
              "i"
            ]
          }
        },
        {
          "bool": {
            "should": [
              {
                "terms": {
                  "entityName": [
                    "john",
                    "lewis"
                  ]
                }
              },
              {
                "terms": {
                  "entityNameText": [
                    "john",
                    "lewis"
                  ]
                }
              }
            ]
          }
        }
      ]
    }
  }
}
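The must/should nesting above can also be generated from two small combinators. This is a minimal sketch in Python, not part of the original answer; `terms`, `all_of`, and `any_of` are hypothetical helper names, and the output is a plain dict that you would serialize as the request body. One thing worth keeping in mind: a "terms" query matches a document that contains any of the listed terms, so the name clauses here behave like john OR lewis. If you need john AND lewis, a "match" query with "operator": "and" is one possible substitute.

```python
from typing import Any, Dict, List

Query = Dict[str, Any]


def terms(field: str, values: List[str]) -> Query:
    """Exact-term lookup; values are not analyzed, so pass them as indexed."""
    return {"terms": {field: values}}


def all_of(*clauses: Query) -> Query:
    """AND: every clause must match."""
    return {"bool": {"must": list(clauses)}}


def any_of(*clauses: Query) -> Query:
    """OR: with no must/filter alongside, at least one should clause has to match."""
    return {"bool": {"should": list(clauses)}}


name_terms = ["john", "lewis"]

# status IN (a, i) AND (entityName matches OR entityNameText matches)
body = {
    "query": all_of(
        terms("status", ["a", "i"]),
        any_of(
            terms("entityName", name_terms),
            terms("entityNameText", name_terms),
        ),
    )
}
```

Note that a "should" clause placed next to a "must" in the same bool becomes optional by default, which is why the OR group is nested in its own bool here.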

Below is a link to the full setup I used to test the query.

https://gist.github.com/jayhilden/cf251cd751ef8dce7a57df1d03396778
