
Parse ElasticSearch output with Jackson

I'd like to parse the _source field of an ElasticSearch output. Here's an example of mine (it only contains one list of values):

"_source":
{    
   "key1": "value1",    
   "key2": "value2"
},
{    
   "key1": "value1",    
   "key2": "value2"
},
etc.

I know how to get to _source, but I don't know how to parse it. It seems to be a single node, doesn't it?

EDIT:

I tried to 'reach' the _source field but it doesn't seem to be working:

final ArrayNode _source = (ArrayNode) jsonNode.path(ES_HITS).path(ES_HITS).path(ES_SOURCE);
for (JsonNode value : _source)
{
    try
    {
        lov.add(mapper.treeToValue(value, Lov.class));
    }
    catch (JsonProcessingException e)
    {
        logger.error("GetLibelles : add : error : JsonProcessingException", e);
    }
}

Lov class

@JsonIgnoreProperties(ignoreUnknown = true)
public class Lov extends ParentModel implements Serializable
{   
    private String key1;
    private String key2;
    private String key3;
    private String key4;

    // getters and setters
}

The error I'm getting:

com.fasterxml.jackson.databind.node.MissingNode incompatible with com.fasterxml.jackson.databind.node.ArrayNode

The ElasticSearch output:

{
 "took":0,
 "timed_out":false,
 "_shards":
 {
    "total":1,
    "successful":1,
    "failed":0
 },
"hits":
{ 
   "total":1,
   "max_score":1.0,
   "hits":
    [
       {
          "_index":"bla",
          "_type":"lov",
          "_id":"PWA8bmEBRDuys8JUCwg10w",
          "_score":1.0,
          "_source":
          {    
              "key1": "value1",    
              "key2": "value2"
          },
          {    
              "key1": "value1",    
              "key2": "value2"
          }
       } 
    ]
}}

I've found the solution. The mapping was fine, but the insertion was not. To insert multiple documents correctly, I had to use the Bulk API.

Once the mapping is done, I have to insert my data using the following command:

curl -s -XPOST 'serverAddress/_bulk' --data-binary @data.json; echo

data.json

{ "index" : { "_index" : "yourIndex", "_type" : "lov"}}
{ "key1": "value1", "key2": "value2"}
{ "index" : { "_index" : "yourIndex", "_type" : "lov"}}
{ "key1": "value1", "key2": "value2"}

In the same way that mget allows us to retrieve multiple documents at once, the bulk API allows us to make multiple create, index, update, or delete requests in a single step.

I need to insert my data, so I chose the index action. Each request needs an action.

Do not forget:

  1. Every line must end with a newline character (\n), including the last line. These are used as markers to allow for efficient line separation.
  2. The lines cannot contain unescaped newline characters, as they would interfere with parsing. This means that the JSON must not be pretty-printed.
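The two rules above can be sketched in Java as a small builder for the bulk body. This is only an illustration using the standard library: the class name BulkBody and its helper methods are made up for this example, it handles flat string-to-string documents only, and the index and type names simply mirror the data.json shown earlier.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of any ElasticSearch client library.
public class BulkBody {

    // Escapes characters that must not appear raw inside a JSON string,
    // so a value containing a line break cannot split a bulk line in two.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Serializes one flat string map as compact, single-line JSON.
    static String toJsonLine(Map<String, String> doc) {
        StringBuilder sb = new StringBuilder("{");
        String sep = "";
        for (Map.Entry<String, String> e : doc.entrySet()) {
            sb.append(sep).append(quote(e.getKey())).append(':').append(quote(e.getValue()));
            sep = ",";
        }
        return sb.append('}').toString();
    }

    // Writes an action line and a source line per document,
    // each one terminated by a newline, including the last.
    public static String build(String index, String type, List<Map<String, String>> docs) {
        StringBuilder sb = new StringBuilder();
        for (Map<String, String> doc : docs) {
            Map<String, String> meta = new LinkedHashMap<>();
            meta.put("_index", index);
            meta.put("_type", type);
            sb.append("{\"index\":").append(toJsonLine(meta)).append("}\n");
            sb.append(toJsonLine(doc)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("key1", "value1");
        doc.put("key2", "value2");
        // Prints four lines: action, source, action, source.
        System.out.print(build("yourIndex", "lov", Arrays.asList(doc, doc)));
    }
}
```

The output can be written to data.json and sent with the curl command above. In a real project a JSON library would normally do the serialization; the hand-written escaping here only keeps the sketch dependency-free.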

If your query returns multiple hits, the "_source" property will be present in each of the returned hits (see the documentation).

To parse the JSON with Jackson, just create a POJO that matches the JSON schema. In your case, this should be a class (Result.java) that contains both properties, key1 and key2. Then map the JSON string to your POJO class with the Jackson ObjectMapper:

ObjectMapper mapper = new ObjectMapper();
Result result = mapper.readValue("{\"key1\":\"value1\",..}", Result.class);

Normally there should be only one object behind the "_source" property, I suppose. Is the code you provided from your real use case, or just an example?
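Because every hit carries its own "_source", one way to collect them is to iterate over the hits.hits array and map each element's "_source" separately. Calling path("_source") directly on the array node yields a MissingNode, which is what the ClassCastException in the question reports. The following is a minimal sketch, assuming Jackson databind is on the classpath. The class name SourceParser is invented for this example, and the nested Lov class is a reduced stand-in for the one in the question.

```java
import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class, not taken from the question's code base.
public class SourceParser {

    // Reduced stand-in for the Lov class from the question.
    @JsonIgnoreProperties(ignoreUnknown = true)
    public static class Lov {
        public String key1;
        public String key2;
    }

    // Walks hits.hits and maps the _source object of every hit to a Lov.
    public static List<Lov> parse(String json) {
        ObjectMapper mapper = new ObjectMapper();
        List<Lov> result = new ArrayList<>();
        try {
            // path() never returns null; a missing field gives a MissingNode,
            // which iterates as an empty collection.
            JsonNode hits = mapper.readTree(json).path("hits").path("hits");
            for (JsonNode hit : hits) {
                JsonNode source = hit.path("_source");
                if (source.isObject()) {
                    result.add(mapper.treeToValue(source, Lov.class));
                }
            }
        } catch (IOException e) {
            throw new IllegalArgumentException("Could not parse search response", e);
        }
        return result;
    }

    public static void main(String[] args) {
        String json = "{\"hits\":{\"total\":2,\"hits\":["
                + "{\"_id\":\"1\",\"_source\":{\"key1\":\"a\",\"key2\":\"b\"}},"
                + "{\"_id\":\"2\",\"_source\":{\"key1\":\"c\",\"key2\":\"d\"}}]}}";
        for (Lov lov : parse(json)) {
            System.out.println(lov.key1 + " " + lov.key2);
        }
    }
}
```

With the two-hit sample in main, this prints "a b" and then "c d". A response without a hits array simply produces an empty list rather than a cast error.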
