Elasticsearch : Retrieve long text field from a document

I have a document indexed in ES. This document has 3 text fields: F1, F2 and F3.

When I search for this document using the Java API, I only get values for fields F1 and F2; field F3 comes back empty.

QueryBuilder query =  //Some query

SearchResponse response = client.prepareSearch(index)
                .addDocValueField("F1.keyword")
                .addDocValueField("F2.keyword")
                .addDocValueField("F3.keyword")
                .setQuery(query)
                .execute()
                .actionGet();

SearchHit hit = response.getHits().getAt(0);

System.out.println("F1 : "+hit.getField("F1.keyword").getValue());
System.out.println("F2 : "+hit.getField("F2.keyword").getValue());
System.out.println("F3 : "+hit.getField("F3.keyword").getValue()); // empty

My field F3 can be very long. In the document I use for testing, it is more than 300 characters, and it can be much longer.

My index mapping is:

"mappings": {
      "MyIndex": {
        "properties": {
          "F1": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "F2": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "F3": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }

So I raised the ignore_above setting in the mapping for F3 to 20000 (might be a bad idea?), but I still see the same behavior.

What is the problem, and what is the correct way to do this?

Notes:

  • Using ES 5.6.3
  • I don't need to do any analysis / term search on the field F3, only retrieve its value when a query matches F1 or F2.
  • I will have a small number of documents of this kind, so efficiency should not be a big issue.

EDIT:

The strange thing is that I get the expected result when I query Elasticsearch from my browser with:

http://localhost:9200/MyIndex/_search?pretty=true?{"query": {"match_all": {}}}

In Elasticsearch, the default dynamic mapping turns a string into two different field types: text and keyword. They are used for different purposes: text is a full-text search field, while keyword holds a structured constant value. Read more in the docs.

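In your case, the default keyword sub-field doesn't look helpful. With "ignore_above": 256, a value longer than 256 characters is not indexed in the keyword sub-field at all, so there is no doc value to return for a long F3. In your queries, you should just grab the "regular" F3 field, and likewise the regular F1 and F2 fields.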

Last, I'm not super familiar with the ES Java client, but if you want to do source filtering (i.e. get only a subset of values back from your request), I don't think addDocValueField() is right. Check out: https://www.elastic.co/guide/en/elasticsearch/client/java-rest/5.6/java-rest-high-search.html#_source_filtering
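
To make the difference concrete, here is a toy model in plain Java. It is not Elasticsearch code: the class IgnoreAboveSketch and its two maps are hypothetical stand-ins for the stored _source and for the doc values of the ".keyword" sub-fields. It only sketches the documented rule that a keyword value longer than ignore_above is skipped, while the original document is still kept in full. With the real transport client, the equivalent fix is presumably to call setFetchSource(...) on the search request builder instead of addDocValueField(...) and read the values from the hit's source, but treat that as an assumption to check against the 5.6 docs.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model (NOT Elasticsearch code) of a keyword sub-field with ignore_above. */
public class IgnoreAboveSketch {
    // Same limit as "ignore_above": 256 in the mapping from the question.
    static final int IGNORE_ABOVE = 256;

    // Stand-in for _source: the document as it was sent, always kept in full.
    final Map<String, String> source = new HashMap<>();
    // Stand-in for doc values of the ".keyword" sub-fields.
    final Map<String, String> keywordDocValues = new HashMap<>();

    /** "Indexes" one field: _source keeps everything, the keyword sub-field skips long values. */
    void index(String field, String value) {
        source.put(field, value);
        if (value.length() <= IGNORE_ABOVE) {
            keywordDocValues.put(field + ".keyword", value);
        }
    }

    /** Builds a string of n copies of c (helper for the demo only). */
    static String repeat(char c, int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        IgnoreAboveSketch doc = new IgnoreAboveSketch();
        doc.index("F1", "short title");
        doc.index("F2", "short tag");
        doc.index("F3", repeat('x', 300)); // longer than 256 characters

        // What addDocValueField("F3.keyword") would have to read from:
        System.out.println("F3.keyword present: " + doc.keywordDocValues.containsKey("F3.keyword"));
        // What a source-based lookup reads from:
        System.out.println("F3 length in source: " + doc.source.get("F3").length());
    }
}
```

In this model the keyword lookup for F3 finds nothing while the source lookup returns all 300 characters, which matches what you see: empty through the Java doc-value request, complete through the browser, where hits are returned with their _source.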
