
ElasticSearch Keyword usage with a prefix search

I need to be able to search for a sentence either in full or by prefix. The UI library I am using (ReactiveSearch) generates the query like this:

"simple_query_string": {
  "query": "\"Louis George Maurice Adolphe\"",
  "fields": [
    "field1",
    "field2",    
    "field3"
  ],
  "default_operator": "or"
}

I expect it to return results such as Louis George Maurice Adolphe (Roche), but NOT records that merely contain individual terms like Louis or George.

Currently I have the index settings and mapping below, but they only return the record when I search for the complete value Louis George Maurice Adolphe (Roche), not when I search for the prefix Louis George Maurice Adolphe.

{
  "settings": {
    "analysis": {
      "char_filter": {
        "space_remover": {
          "type": "mapping",
          "mappings": [
            "\\u0020=>"
          ]
        }
      },
      "normalizer": {
        "lower_case_normalizer": {
          "type": "custom",
          "char_filter": [
            "space_remover"
          ],
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "field3": {
          "type": "keyword",
          "normalizer": "lower_case_normalizer"
        }
      }
    }
  }
}

Any guidance on the above is appreciated. Thanks.

You are not using a prefix query, which is why prefix search terms return no results. I used the same mapping and a sample doc but changed the search query, and it gives the expected results.

Index mapping

{
    "settings": {
        "analysis": {
            "char_filter": {
                "space_remover": {
                    "type": "mapping",
                    "mappings": [
                        "\\u0020=>"
                    ]
                }
            },
            "normalizer": {
                "lower_case_normalizer": {
                    "type": "custom",
                    "char_filter": [
                        "space_remover"
                    ],
                    "filter": [
                        "lowercase"
                    ]
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "field3": {
                "type": "keyword",
                "normalizer": "lower_case_normalizer"
            }
        }
    }
}

Indexed sample doc

{
   "field3" : "Louis George Maurice Adolphe (Roche)"
}

Search query

{
  "query": {
    "prefix": {
     "field3": {
        "value": "Louis George Maurice Adolphe"
      }
    }
  }
}
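Since the original question searches three fields, the same idea could be extended with one prefix clause per field inside a bool/should. The following is only a minimal sketch: the helper name build_prefix_query is hypothetical, and it assumes field1 and field2 are mapped the same way as field3 (keyword with the same normalizer), which the question does not show.

```python
import json


def build_prefix_query(text, fields):
    """Build a request body with one prefix clause per field (hypothetical helper).

    Assumes every field is a keyword field using the same normalizer as field3.
    """
    clauses = [{"prefix": {field: {"value": text}}} for field in fields]
    return {
        "query": {
            "bool": {
                # A document matches if at least one field starts with the text.
                "should": clauses,
                "minimum_should_match": 1,
            }
        }
    }


body = build_prefix_query(
    "Louis George Maurice Adolphe",
    ["field1", "field2", "field3"],
)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a _search request in place of the single-field query above.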

Search result

"hits": [
            {
                "_index": "normal",
                "_type": "_doc",
                "_id": "1",
                "_score": 1.0,
                "_source": {
                    "field3": "Louis George Maurice Adolphe (Roche)"
                }
            }
        ]

The underlying issue stems from the fact that you're applying a whitespace remover. What this practically means is that when you ingest your docs:

GET your_index_name/_analyze
{
  "text": "Louis George Maurice Adolphe (Roche)",
  "field": "field3"
}

they're indexed as

{
  "tokens" : [
    {
      "token" : "louisgeorgemauriceadolphe(roche)",
      "start_offset" : 0,
      "end_offset" : 36,
      "type" : "word",
      "position" : 0
    }
  ]
}

So if you intend to use simple_query_string, you may want to rethink your normalizers.

@Ninja's answer fails when you search for George Maurice Adolphe, i.e. when the search text is not a prefix of the stored value.
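To make the behaviour concrete, here is a rough Python simulation of what the normalizer does to the stored value and to the search text. It is a sketch, not Elasticsearch itself: normalize and prefix_matches are hypothetical helpers, and it assumes the prefix query normalizes its input the same way as the indexed value, which the search result in the earlier answer suggests.

```python
def normalize(text):
    # Mimics lower_case_normalizer: the space_remover char filter strips
    # U+0020 spaces, then the lowercase filter is applied.
    return text.replace(" ", "").lower()


def prefix_matches(stored, query):
    # A prefix query matches when the normalized stored value starts with
    # the normalized search text (assumption: both sides are normalized).
    return normalize(stored).startswith(normalize(query))


stored = "Louis George Maurice Adolphe (Roche)"

print(normalize(stored))                                       # louisgeorgemauriceadolphe(roche)
print(prefix_matches(stored, "Louis George Maurice Adolphe"))  # True
print(prefix_matches(stored, "George Maurice Adolphe"))        # False
```

The last call returns False because the stored value is a single token that begins with louis, so text starting at george can never be a prefix of it.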


 