MongoDB 4.4, Java driver 4.2.3 - InsertManyResult.getInsertedIds() not returning IDs for all inserted documents

I am trying to retrieve the _id values of inserted documents after a successful insertMany operation. To do this I am using InsertManyResult.getInsertedIds(). While this approach works most of the time, there are cases where not all _id values are retrieved.

I am not sure whether I am doing something wrong, but I would assume that InsertManyResult.getInsertedIds() returns the _id of every inserted document.

Problem details

I am inserting 1000 documents into MongoDB in two batches of 500 documents. Each document is approx. 1 MB in size.

After a batch is inserted using insertMany, I attempt to read the _id values via InsertManyResult.getInsertedIds() and save them to a collection for later use.

I would assume that after inserting 500 documents via insertMany, InsertManyResult.getInsertedIds() would return 500 _id values. It is, however, returning only 16 _id values out of 500.

When I check the Mongo collection directly via the Mongo shell, I see that all records were successfully inserted. There are 1000 documents in my test collection. I am just unable to get the _id of every inserted document via InsertManyResult.getInsertedIds(). I only get 32 _id values for 1000 documents inserted.

JSON structure

To replicate the issue I have exactly one JSON document, approx. 1 MB in size, which looks like this:

{
  "textVal" : "RmKHtEMMzJDXgEApmWeoZGRdZJZerIj1",
  "intVal" : 161390623,
  "longVal" : "98213019054010317",
  "timestampVal" : "2020-12-31 23:59:59.999",
  "numericVal" : -401277306,
  "largeArrayVal" : [ "MMzJDXg", "ApmWeoZGRdZJZerI", "1LhTxQ", "adprPSb1ZT", ..., "QNLkBZuXenmYE77"]
}

Note that the key largeArrayVal holds almost all of the data. I have omitted most of its values for readability.

Sample code

The code below parses the JSON shown above into a Document, which is then inserted into MongoDB via insertMany. After that, I try to get the inserted _id values using InsertManyResult.getInsertedIds().

private static final int MAX_DOCUMENTS = 1000;
private static final int BULK_SIZE = 500;

private static List<ObjectId> insertBatchReturnIds(List<Document> insertBatch)
{
  List<ObjectId> insertedIds = new ArrayList<ObjectId>();
  InsertManyResult insertManyResult;

  insertManyResult = mongoClient.getDatabase(MONGO_DATABASE).getCollection(MONGO_COLLECTION).insertMany(insertBatch);
  insertManyResult.getInsertedIds().forEach((k,v) -> insertedIds.add(v.asObjectId().getValue()));

  System.out.println("Batch inseted:");
  System.out.println(" - Was acknowladged: " + Boolean.toString(insertManyResult.wasAcknowledged()).toUpperCase());
  System.out.println(" - InsertManyResult.getInsertedIds().size(): " + insertManyResult.getInsertedIds().size());

  return insertedIds;
}

private static void insertDocuments()
{
  int documentsInserted = 0;
  List<Document> insertBatch = new ArrayList<Document>();
  List<ObjectId> insertedIds = new ArrayList<ObjectId>();
  final String largeJson = loadLargeJsonFromFile("d:\\test-sample.json");

  System.out.println("Starting INSERT test...");
  while (documentsInserted < MAX_DOCUMENTS)
  {
    insertBatch.add(Document.parse(largeJson));
    documentsInserted++;

    if (documentsInserted % BULK_SIZE == 0)
    {
      insertedIds.addAll(insertBatchReturnIds(insertBatch));
      insertBatch.clear();
    }
  }
  if (insertBatch.size() > 0)
    insertedIds.addAll(insertBatchReturnIds(insertBatch));
  System.out.println("INSERT test finished");

  System.out.println(String.format("Expected IDs retrieved: %d. Actual IDs retrieved: %d.", MAX_DOCUMENTS, insertedIds.size()));
  if (insertedIds.size() != MAX_DOCUMENTS)
    throw new IllegalStateException("Not all _ID were returned for each document in batch");
}

Sample output

Starting INSERT test...
Batch inseted:
 - Was acknowladged: TRUE
 - InsertManyResult.getInsertedIds().size(): 16
Batch inseted:
 - Was acknowladged: TRUE
 - InsertManyResult.getInsertedIds().size(): 16
INSERT test finished
Expected IDs retrieved: 1000. Actual IDs retrieved: 32.
Exception in thread "main" java.lang.IllegalStateException: Not all _ID were returned for each document in batch

My questions

  1. Is InsertManyResult.getInsertedIds() meant to return the _id of every inserted document?
  2. Is the way I am using InsertManyResult.getInsertedIds() correct?
  3. Could the size of the inserted JSON be a factor here?
  4. How should I use InsertManyResult to get the _id of each inserted document?

Note

I am aware that I can either read the _id after Document.parse, as it is the driver that generates it, or select the _id values after the documents have been inserted.
I would like to know how this can be achieved using InsertManyResult.getInsertedIds(), as it seems to be made for this purpose.

Your documents are about 1 MB each, so no more than 16 of them fit into a single command. The driver splits the full set of documents into batches automatically, but you appear to be getting the ids of only one batch at a time, so the problem is likely one of the following:

  • There is a driver issue where it doesn't merge the batch results together prior to returning the results to your application
  • The driver is giving you the results one batch at a time, hence you do get all of the ids but not in the segments you were expecting (in which case there is no bug but you do need to work with batches as they are provided by the driver)
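To make the first possibility concrete, here is a minimal Java sketch of what splitting and merging would look like. It does not use the MongoDB driver and is not the driver's actual implementation: the class BatchMergeSketch, the helpers split, merge and fakeInsert, and the string ids are all hypothetical stand-ins. The limit of 16 documents per command is the figure given above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in the MongoDB driver.
final class BatchMergeSketch {

    // Splits the documents into consecutive sub-batches of at most maxPerCommand items.
    static <T> List<List<T>> split(List<T> documents, int maxPerCommand) {
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < documents.size(); start += maxPerCommand) {
            int end = Math.min(start + maxPerCommand, documents.size());
            batches.add(new ArrayList<>(documents.subList(start, end)));
        }
        return batches;
    }

    // Stand-in for one insert command: returns fake ids keyed by the index within the sub-batch.
    static Map<Integer, String> fakeInsert(List<String> batch) {
        Map<Integer, String> ids = new LinkedHashMap<>();
        for (int i = 0; i < batch.size(); i++) {
            ids.put(i, "id-of-" + batch.get(i));
        }
        return ids;
    }

    // Merges per-batch results into one map keyed by the index in the original list.
    // Assumes every sub-batch reports an id for each of its documents.
    static Map<Integer, String> merge(List<Map<Integer, String>> perBatchIds) {
        Map<Integer, String> merged = new LinkedHashMap<>();
        int offset = 0;
        for (Map<Integer, String> batchIds : perBatchIds) {
            for (Map.Entry<Integer, String> entry : batchIds.entrySet()) {
                merged.put(offset + entry.getKey(), entry.getValue());
            }
            offset += batchIds.size();
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 500; i++) {
            docs.add("doc" + i);
        }

        List<Map<Integer, String>> perBatch = new ArrayList<>();
        for (List<String> batch : split(docs, 16)) {
            perBatch.add(fakeInsert(batch));
        }

        System.out.println("sub-batches: " + perBatch.size());                // prints 32
        System.out.println("ids in one sub-batch: " + perBatch.get(0).size()); // prints 16
        System.out.println("ids after merging: " + merge(perBatch).size());    // prints 500
    }
}
```

If only a single sub-batch's map reached the caller, a 500-document insert would report 16 ids, which matches the count in the question. Whether that is what actually happens inside the Java driver is not established by this sketch.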

The following test in Ruby works as expected, producing 100 ids:

c = Mongo::Client.new(['localhost:14920'])

docs = [{a: 'x'*1_000_000}]*100
res = c['foo'].insert_many(docs)

p res.inserted_ids.length
pp res.inserted_ids

This is a bug in the Java driver, and it's being tracked in https://jira.mongodb.org/browse/JAVA-4436 (reported on January 5, 2021).
