
Elasticsearch proper way to escape spaces, ? doesn't work in all scenarios

I'm trying to get searching with spaces to work properly in elasticsearch but having a ton of trouble getting it to behave the same way as it does on another field.

I have two fields, Name and Addresses.First().Line1, that I want to search while preserving spaces in the search term. For instance, searching for Bob Smi* should return Bob Smith, but not a record that only matches Bob.

This works for my Name field: I run a query string search with each space replaced by ?. I also wrap the term in wildcards, so my final query is *bob?smi*.

However, when I try to search by line1 as well, I get no results. For example, *4800* returns a record whose line1 is 4800 Street, but when I apply the same transformation to 4800 street to get *4800?street*, I get no results.

Below is my query

{
  "from": 0,
  "size": 50,
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "*4800?Street*",
            "fields": [
              "name",
              "addresses.line1"
            ]
          }
        }
      ]
    }
  }
}

This returns no results.

Why would *bob?smi* return the record with the name Bob Smith, while *4800?street* does not return the record whose line1 is 4800 street?

Below is how both fields are set up in the index:

.Text(smd => smd.Name(c => c.Name).Analyzer(ElasticIndexCreator.SortAnalyzer).Fielddata())

.Nested<Address>(nomd => nomd.Name(p => p.PrimaryAddress).Properties(MapAddressProperties))

//from MapAddressProperties()

.Text(smd2 => smd2.Name(x => x.Line1).Analyzer(ElasticIndexCreator.SortAnalyzer).Fielddata())

Mappings in elastic:

"name": {
    "type": "text",
    "fields": {
        "keyword": {
            "type": "keyword",
            "ignore_above": 256
        }
    }
}
"addresses": {
    "line1": {
        "type": "text",
        "fields": {
            "keyword": {
                "type": "keyword",
                "ignore_above": 256
            }
        }
    },
 }

Is there some other, better way to escape a space in an elasticsearch querystring? I've also tried \\\\ and \\\\\\\\ (in C# evaluates to \\\\ ) instead of the ? to no avail.

Try querying addresses.line1.keyword (that is, the keyword multi-field that you defined for addresses.line1) with a term-level wildcard query:

{
    "query": {
        "wildcard": {
            "addresses.line1.keyword": {
                "wildcard": "*4800 street*"
            }
        }
    }
}

Per the Elasticsearch documentation on full-text wildcard searches, if you search against addresses.line1 (whose type is text, so full-text search rules apply), the wildcard is evaluated against each term the analyzer produced from the field: once against 4800 and once against street. Neither term matches your *4800?street* pattern. The addresses.line1.keyword multi-field holds the original 4800 street value as a single term, so it should match the pattern in a term-level wildcard query.
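To make the per-term behaviour concrete, here is a toy simulation in Python. It is not Elasticsearch code: the `tokens` helper is a rough stand-in for an analyzer that lowercases and splits on whitespace (an assumption, since your custom analyzer may tokenize differently), and `fnmatchcase` is used only because it treats `*` and `?` the same way a wildcard pattern does.

```python
from fnmatch import fnmatchcase


def tokens(text):
    # Rough stand-in for an analyzer: lowercase, then split on whitespace.
    return text.lower().split()


def matches_text_field(pattern, value):
    # On an analyzed text field the wildcard is tested against each term separately.
    return any(fnmatchcase(term, pattern) for term in tokens(value))


def matches_keyword_field(pattern, value):
    # A keyword sub-field keeps the whole value as one term.
    return fnmatchcase(value, pattern)


value = "4800 street"
print(matches_text_field("*4800*", value))             # True: the term "4800" matches
print(matches_text_field("*4800?street*", value))      # False: no single term spans both words
print(matches_keyword_field("*4800?street*", value))   # True: "?" matches the space
print(matches_keyword_field("*4800 street*", value))   # True: a literal space also matches
```

Note that a keyword field is matched as stored, so the case of the pattern would have to agree with the case of the indexed value.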


By the way, a minor nit: the mapping type definition itself seems incomplete for the addresses field. You said it is:

"addresses": {
    "line1": {
        "type": "text",
        "fields": {
            "keyword": {
                "type": "keyword",
                "ignore_above": 256
            }
        }
    },
}

But IMHO it should instead be:

"addresses": {
    "properties": {
        "line1": {
            "type": "text",
            "fields": {
                "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                }
            }
        }
    }
}

Finally found the correct setup after tons of time experimenting. The configuration that worked for me was as follows:

  1. Use Text with field data for the columns.
  2. Search using QueryString with wildcards, replacing each space with ?. For example, when bob smith is entered, query Elasticsearch with *bob?smith*.
  3. Use Nested queries for child objects. Oddly, addresses.line1 returns data when searching for, say, 4800, but not for *4800?street*. Wrapping the search in a nested query makes this work.
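The transformation in step 2 can be sketched in a few lines of Python. The function name `to_wildcard_query` is made up for illustration, and returning a bare `*` for empty input is my own assumption rather than something my application does. The sketch does not escape other characters that are reserved in query string syntax.

```python
def to_wildcard_query(user_input):
    # Split on any run of whitespace so repeated spaces collapse to one "?".
    words = user_input.split()
    if not words:
        # Assumption: treat empty input as "match anything".
        return "*"
    # Join the words with "?" (any single character) and wrap in "*".
    return "*" + "?".join(words) + "*"


print(to_wildcard_query("bob smith"))    # *bob?smith*
print(to_wildcard_query("4800 street"))  # *4800?street*
```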

From what I hear, having to use field data is very memory intensive, and having to use wildcards is very time intensive, so this is probably not an optimal solution but it's the only one I've found. If there is another better way to do this, please enlighten me.

Example queries in C# using Nest:

var query = Query<Student>.QueryString(qs => qs
    .Fields(f => f
        .Field(c => c.Name)
        //.Field(c => c.PrimaryAddress.Line1) // this doesn't work
    )
    .Query(testCase.Term)
);

query |= Query<Student>.Nested(n => n
    .Path(p => p.Addresses)
    .Query(q => q
        .QueryString(qs => qs
            .Fields(f => f.Field(c => c.Addresses.First().Line1))
            .Query(testCase.Term)
        )
    )
);
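For anyone not using NEST, the request body these two queries roughly correspond to can be sketched in Python as a plain dictionary. This is my reading of what the combined query amounts to (a bool query with two should clauses), not output captured from NEST. The field names name and addresses.line1 are taken from the question, and `build_search_body` is a hypothetical helper name.

```python
import json


def build_search_body(pattern):
    # query_string against the top-level name field.
    name_query = {"query_string": {"query": pattern, "fields": ["name"]}}
    # The same query_string wrapped in a nested query for the addresses objects.
    address_query = {
        "nested": {
            "path": "addresses",
            "query": {
                "query_string": {"query": pattern, "fields": ["addresses.line1"]}
            },
        }
    }
    # Either clause may match, mirroring the |= combination above.
    return {"query": {"bool": {"should": [name_query, address_query]}}}


print(json.dumps(build_search_body("*4800?street*"), indent=2))
```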

Example Mapping:

.Map<Student>(s => s.Properties(p => p
    .Text(t => t.Name(z => z.Name).Fielddata())
    .Nested<StudentAddress>(n => n
        .Name(ap => ap.Addresses)
        .Properties(ap => ap.Text(t => t.Name(z => z.Line1).Fielddata()))
    )
))
