I am reading 100k+ file paths from the index documents_qa using the scroll API. The actual files are available on my local d:\\ drive. Using each file path, I read the actual file, convert it to base64, and reindex the document with the base64 content of the file in another index, document_attachment_qa.
My current implementation reads the filePath, converts the file into base64, and indexes each document along with its fileContent one by one. This takes a long time: for example, indexing 4000 documents takes more than 6 hours, and the connection also terminates due to an IO exception.
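For reference, the read-and-encode step described above can be sketched with the JDK alone. The class and method names here (FileBase64Encoder, encodeBytes, encodeFile) are hypothetical, and the sketch assumes each file is small enough to be read into memory in one go:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

final class FileBase64Encoder {
    // Encodes raw bytes as a standard (non-URL-safe) Base64 string.
    static String encodeBytes(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }

    // Reads the whole file into memory and returns its Base64 representation.
    // Assumption: the file fits comfortably in memory.
    static String encodeFile(Path path) throws IOException {
        return encodeBytes(Files.readAllBytes(path));
    }

    public static void main(String[] args) throws IOException {
        // Demo with a temporary file instead of a real document path.
        Path tmp = Files.createTempFile("sample", ".txt");
        Files.write(tmp, "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(encodeFile(tmp)); // prints aGVsbG8=
        Files.delete(tmp);
    }
}
```

The returned string is what would go into the fileContent field of each document.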
So now I want to index the documents using the BulkRequest API, but I am using RestHighLevelClient and am not sure how to use BulkRequest along with RestHighLevelClient.
Please find my current implementation below, which indexes the documents one by one.
jsonMap = new HashMap<String, Object>();
jsonMap.put("id", doc.getId());
jsonMap.put("app_language", doc.getApp_language());
jsonMap.put("fileContent", result);
String id=Long.toString(doc.getId());
IndexRequest request = new IndexRequest(ATTACHMENT, "doc", id ) // ATTACHMENT is the index name
.source(jsonMap) // Its my single document.
.setPipeline(ATTACHMENT);
IndexResponse response = SearchEngineClient.getInstance3().index(request); // increased timeout
I found the following documentation for BulkRequest:
https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/java-docs-bulk.html
But I am not sure how to implement BulkRequestBuilder bulkRequest = client.prepareBulk(); that is, how to use the client.prepareBulk() method when using RestHighLevelClient.
UPDATE 1
I am trying to index all 100K documents in one shot, so I create one JSONArray and put all my JSONObjects into the array one by one. Finally, I try to build a BulkRequest, add all my documents (the JSONArray) as its source, and index them.
Here I am not sure how to convert my JSONArray to a List of String.
private final static String ATTACHMENT = "document_attachment_qa";
private final static String TYPE = "doc";
JSONArray reqJSONArray=new JSONArray();
while (searchHits != null && searchHits.length > 0) {
...
...
jsonMap = new HashMap<String, Object>();
jsonMap.put("id", doc.getId());
jsonMap.put("app_language", doc.getApp_language());
jsonMap.put("fileContent", result);
reqJSONArray.put(jsonMap);
}
String actionMetaData = String.format("{ \"index\" : { \"_index\" : \"%s\", \"_type\" : \"%s\" } }%n", ATTACHMENT, TYPE);
List<String> bulkData = // not sure how to convert a list of my documents in JSON strings
StringBuilder bulkRequestBody = new StringBuilder();
for (String bulkItem : bulkData) {
bulkRequestBody.append(actionMetaData);
bulkRequestBody.append(bulkItem);
bulkRequestBody.append("\n");
}
HttpEntity entity = new NStringEntity(bulkRequestBody.toString(), ContentType.APPLICATION_JSON);
try {
Response response = SearchEngineClient.getRestClientInstance().performRequest("POST", "/" + ATTACHMENT + "/" + TYPE + "/_bulk", Collections.emptyMap(), entity); // build the path from the constants, not the literal text "ATTACHMENT/TYPE"
return response.getStatusLine().getStatusCode() == HttpStatus.SC_OK;
} catch (Exception e) {
// do something
}
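As a sketch of the body-building step in the snippet above: once each document is available as a single-line JSON string (with a JSON library, calling toString() on each element of the array is the usual way to get one, though that part is left out here so the example needs only the JDK), the newline-delimited bulk body is one action line followed by one source line per document. BulkBodyBuilder and buildBody are hypothetical names used for illustration:

```java
import java.util.Arrays;
import java.util.List;

final class BulkBodyBuilder {
    // Builds a newline-delimited _bulk body: one action line, then one source line, per document.
    // Assumption: every entry in jsonDocs is valid JSON on a single line (no embedded newlines).
    static String buildBody(String index, String type, List<String> jsonDocs) {
        String action = String.format(
                "{ \"index\" : { \"_index\" : \"%s\", \"_type\" : \"%s\" } }\n", index, type);
        StringBuilder body = new StringBuilder();
        for (String doc : jsonDocs) {
            body.append(action).append(doc).append('\n'); // the body must end with a newline
        }
        return body.toString();
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("{\"id\":1}", "{\"id\":2}");
        System.out.print(buildBody("document_attachment_qa", "doc", docs));
    }
}
```

The resulting string is what would be wrapped in the HttpEntity for the low-level client.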
You can just create a new BulkRequest() and add the requests to it without using BulkRequestBuilder, like:
BulkRequest request = new BulkRequest();
request.add(new IndexRequest("foo", "bar", "1")
.source(XContentType.JSON,"field", "foobar"));
request.add(new IndexRequest("foo", "bar", "2")
.source(XContentType.JSON,"field", "foobar"));
...
BulkResponse bulkResponse = myHighLevelClient.bulk(request, RequestOptions.DEFAULT);
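Since the question mentions 100K documents carrying base64 file content, one bulk request holding everything could become very large, so it may be safer to send several smaller bulk requests. Below is a minimal sketch of the batching part only, using plain JDK collections. BatchSplitter, partition, and the batch size are hypothetical, and the comment marks where a BulkRequest would be built for each batch:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BatchSplitter {
    // Splits items into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> ids = Arrays.asList(1, 2, 3, 4, 5);
        for (List<Integer> batch : partition(ids, 2)) {
            // In real code: create one BulkRequest here, add an IndexRequest
            // per document in the batch, then call the client's bulk method.
            System.out.println("one bulk request for ids " + batch);
        }
    }
}
```

A suitable batch size depends on the document sizes, so it is worth experimenting with rather than treating any particular number as correct.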
In addition to @chengpohi's answer, I would like to add the points below:
A BulkRequest can be used to execute multiple index, update and/or delete operations using a single request.
It requires at least one operation to be added to the Bulk request:
BulkRequest request = new BulkRequest();
request.add(new IndexRequest("posts", "doc", "1")
.source(XContentType.JSON,"field", "foo"));
request.add(new IndexRequest("posts", "doc", "2")
.source(XContentType.JSON,"field", "bar"));
request.add(new IndexRequest("posts", "doc", "3")
.source(XContentType.JSON,"field", "baz"));
Note: The Bulk API supports only documents encoded in JSON or SMILE. Providing documents in any other format will result in an error.
Synchronous Operation:
BulkResponse bulkResponse = client.bulk(request, RequestOptions.DEFAULT);
Here client is the High-Level REST Client, and the execution is synchronous.
Asynchronous Operation (Recommended Approach):
client.bulkAsync(request, RequestOptions.DEFAULT, listener);
The asynchronous execution of a bulk request requires both the BulkRequest instance and an ActionListener instance to be passed to the asynchronous method.
Listener Example:
ActionListener<BulkResponse> listener = new ActionListener<BulkResponse>() {
@Override
public void onResponse(BulkResponse bulkResponse) {
}
@Override
public void onFailure(Exception e) {
}
};
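One practical point with the asynchronous variant: the call returns immediately, so a batch job usually has to wait for the listener to fire before it exits or sends the next batch. Below is a minimal sketch of that wait using a CountDownLatch. So that it compiles without the Elasticsearch dependency, it uses a hypothetical stand-in Listener interface with the same two callbacks as ActionListener, and a hypothetical fakeBulkAsync method in place of client.bulkAsync:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

final class AsyncWaitSketch {
    // Hypothetical stand-in with the same callback shape as ActionListener<T>.
    interface Listener<T> {
        void onResponse(T response);
        void onFailure(Exception e);
    }

    // Hypothetical stand-in for an async client call: completes the listener on another thread.
    static void fakeBulkAsync(String payload, Listener<String> listener) {
        new Thread(() -> listener.onResponse("indexed:" + payload)).start();
    }

    // Starts the async call and blocks until a callback fires or the timeout elapses.
    static String runAndWait(String payload, long timeoutSeconds) {
        CountDownLatch done = new CountDownLatch(1);
        AtomicReference<String> result = new AtomicReference<>();
        AtomicReference<Exception> error = new AtomicReference<>();
        fakeBulkAsync(payload, new Listener<String>() {
            @Override
            public void onResponse(String response) {
                result.set(response);
                done.countDown();
            }
            @Override
            public void onFailure(Exception e) {
                error.set(e);
                done.countDown();
            }
        });
        try {
            if (!done.await(timeoutSeconds, TimeUnit.SECONDS)) {
                throw new IllegalStateException("timed out waiting for the bulk response");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        if (error.get() != null) {
            throw new RuntimeException("bulk request failed", error.get());
        }
        return result.get();
    }

    public static void main(String[] args) {
        System.out.println(runAndWait("batch-1", 5)); // prints indexed:batch-1
    }
}
```

With the real client, the same latch pattern would go inside the onResponse and onFailure methods of the ActionListener<BulkResponse>.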
The returned BulkResponse contains information about the executed operations and lets you iterate over each result as follows:
for (BulkItemResponse bulkItemResponse : bulkResponse) {
DocWriteResponse itemResponse = bulkItemResponse.getResponse();
if (bulkItemResponse.getOpType() == DocWriteRequest.OpType.INDEX
|| bulkItemResponse.getOpType() == DocWriteRequest.OpType.CREATE) {
IndexResponse indexResponse = (IndexResponse) itemResponse;
} else if (bulkItemResponse.getOpType() == DocWriteRequest.OpType.UPDATE) {
UpdateResponse updateResponse = (UpdateResponse) itemResponse;
} else if (bulkItemResponse.getOpType() == DocWriteRequest.OpType.DELETE) {
DeleteResponse deleteResponse = (DeleteResponse) itemResponse;
}
}
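The following arguments can optionally be provided, for example a timeout to wait for the bulk request to be performed, given either as a TimeValue or as a String: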
request.timeout(TimeValue.timeValueMinutes(2));
request.timeout("2m");
I hope this helps.