
Elasticsearch Java API search with regex

I would like to fetch some data from my local Elasticsearch server using "regexpQuery", and for this I wrote the following method:

public void getProductsStartingWithString() throws ParseException {

    Client client = getClient();

    // Match documents whose ProductCode matches the regular expression "FA.*".
    BoolQueryBuilder boolQuery = QueryBuilders.boolQuery()
            .must(QueryBuilders.regexpQuery("ProductCode", "FA.*"));

    // Run the search against the index and type, returning at most 100 hits.
    SearchResponse response = client.prepareSearch("warehouse-550")
            .setTypes("core2")
            .setQuery(boolQuery)
            .setSize(100)
            .execute()
            .actionGet();

    // The hits returned by the query.
    SearchHits hits = response.getHits();

}

The documents in Elasticsearch look like this:

{
  "_index": "warehouse-550",
  "_type": "core2",
  "_id": "AVOKD0Pq8h4KkDGZwBom",
  "_score": null,
  "_source": {
    "message": "3,550,162.06,FALK0011927540Y,2016-03-16;08:00:00.000\r",
    "@version": "1",
    "@timestamp": "2016-03-16T07:00:00.000Z",
    "path": "D:/Programs/Logstash/x_testingLocally/processed-stocklevels-550-42190516032016.csv",
    "host": "EVO385",
    "type": "core2",
    "Quantity": 3,
    "Warehouse": "550",
    "Price": 162.06,
    "ProductCode": "FALK0011927540Y",
    "Timestamp": "2016-03-16;08:00:00.000"
  },
  "fields": {
    "@timestamp": [
      1458111600000
    ]
  },
  "sort": [
    1458111600000
  ]
}

But the response always contains 0 hits.

The output of curl -XGET "172.22.130.189:9200/warehouse-550/_mapping/core2?pretty" is:

{
  "warehouse-550" : {
    "mappings" : {
      "core2" : {
        "properties" : {
          "@timestamp" : {
            "type" : "date",
            "format" : "strict_date_optional_time||epoch_millis"
          },
          "@version" : {
            "type" : "string"
          },
          "Price" : {
            "type" : "double"
          },
          "ProductCode" : {
            "type" : "string"
          },
          "Quantity" : {
            "type" : "long"
          },
          "Timestamp" : {
            "type" : "string"
          },
          "Warehouse" : {
            "type" : "string"
          },
          "host" : {
            "type" : "string"
          },
          "message" : {
            "type" : "string"
          },
          "path" : {
            "type" : "string"
          },
          "type" : {
            "type" : "string"
          }
        }
      }
    }
  }
}

What am I doing wrong?

The mapping does not specify an analyzer for ProductCode, so the field uses the default one, the standard analyzer.

If we were to reimplement the standard analyzer as a custom analyzer, it would be defined as follows:

{
    "type":      "custom",
    "tokenizer": "standard",
    "filter":  [ "lowercase", "stop" ]
}

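Note the "lowercase" filter: it converts the text to lowercase at index time.

FALK0011927540Y is therefore indexed as the term falk0011927540y.

A regexp query is not analyzed; its pattern is compared as-is against the indexed terms. Hence, when you search for "FA.*", there is no match.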
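
The mismatch can be reproduced without a cluster. The following is only a rough sketch: it uses java.util.regex as a stand-in for Lucene's regular expression engine (the syntaxes differ in details, but both are case-sensitive and match the whole term), and the class and method names are made up for illustration.

```java
import java.util.Locale;
import java.util.regex.Pattern;

public class RegexpCaseDemo {

    // Approximates the "lowercase" filter of the standard analyzer:
    // the value is lowercased before it is written to the index.
    static String indexedTerm(String fieldValue) {
        return fieldValue.toLowerCase(Locale.ROOT);
    }

    // A regexp query has to match the entire indexed term;
    // Pattern.matches is anchored in the same way.
    static boolean regexpMatches(String pattern, String term) {
        return Pattern.matches(pattern, term);
    }

    public static void main(String[] args) {
        String term = indexedTerm("FALK0011927540Y");
        System.out.println(term);                        // falk0011927540y
        System.out.println(regexpMatches("FA.*", term)); // false
        System.out.println(regexpMatches("fa.*", term)); // true
    }
}
```

The uppercase pattern never sees an uppercase term, which is why the query returns 0 hits.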

Solutions:

  • Lowercase the pattern on the client side before searching, e.g. "fa.*".

  • Map ProductCode as not_analyzed, so the value is indexed exactly as it is.
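
For the second option, a minimal mapping sketch, assuming the "string" field type that appears in the mapping output of the question (Elasticsearch 1.x/2.x). The mapping of an existing field cannot be switched from analyzed to not_analyzed in place, so a body like this would be used when creating a new index, and the documents would then have to be indexed again.

```json
{
  "mappings": {
    "core2": {
      "properties": {
        "ProductCode": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}
```

With this mapping the term stays FALK0011927540Y, so the original pattern "FA.*" should match.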
