
Adding a New Element/Field with an Increment Integer as Value

After using mongoimport to import a CSV file into my database, I want to add a new field to each document. The value of this new field is the document's index number plus 2.

Dim documents = DB.GetCollection(Of BsonDocument)(collectionName).Find(filterSelectedDocuments).ToListAsync.Result

For Each doc In documents
    DB.GetCollection(Of BsonDocument)(collectionName).UpdateOneAsync(
        Builders(Of BsonDocument).Filter.Eq(Of ObjectId)("_id", doc.GetValue("_id").AsObjectId),
        Builders(Of BsonDocument).Update.Set(Of Integer)("increment.value", documents.IndexOf(doc) + 2)).Wait()
Next

If I have over a million documents to import, is there a better way to achieve this, for example using UpdateManyAsync?

Just as a side note: since you call Wait() and Result everywhere, the Async methods don't make an awful lot of sense. Also, your logic appears flawed, since there is no .Sort() anywhere, so you have no guarantee about the order of the returned documents. Is it intended that every document just gets a kind of random but unique, increasing number assigned?
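For illustration, here is a sketch of what properly awaited driver calls could look like. The enclosing Async method and its name are assumptions, not part of the original code:

```vb
' Hypothetical sketch: awaiting the driver's async calls instead of
' blocking on .Result/.Wait(). Names are assumed for illustration.
Public Async Function TagDocumentsAsync() As Task
    Dim collection = DB.GetCollection(Of BsonDocument)(collectionName)
    Dim documents = Await collection.Find(filterSelectedDocuments).ToListAsync()

    For Each doc In documents
        Await collection.UpdateOneAsync(
            Builders(Of BsonDocument).Filter.Eq(Of ObjectId)("_id", doc.GetValue("_id").AsObjectId),
            Builders(Of BsonDocument).Update.Set(Of Integer)("increment.value", documents.IndexOf(doc) + 2))
    Next
End Function
```

This keeps the calling thread free while each update round-trips to the server; the bulk-write approach below is still much faster, but if you do use the async API, await it rather than blocking.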

Anyway, to make this faster, you'd really want to patch your CSV file and write the increasing "increment.value" field straight into it before the import. That way the value lands in MongoDB directly, and you do not need to query and update the imported data again.
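As a sketch of that preprocessing step (the file names and the starting value of 2 are assumptions for illustration):

```vb
' Hypothetical sketch: append an "increment.value" column to the CSV
' before running mongoimport. File names are assumed.
Dim lines = System.IO.File.ReadAllLines("input.csv")
Dim output As New List(Of String)(lines.Length)

' extend the header row with the new column name
output.Add(lines(0) & ",increment.value")

' write the running counter (first data row gets 2) into every data row
For rowIndex As Integer = 1 To lines.Length - 1
    output.Add(lines(rowIndex) & "," & (rowIndex + 1).ToString())
Next

System.IO.File.WriteAllLines("output.csv", output)
```

mongoimport supports dot notation in field names, so the dotted header should produce a nested increment.value field; verify this against your mongoimport version before relying on it. Note this naive sketch assumes no CSV field contains embedded commas or newlines.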

If this is not an option, you could optimize your code like this:

  1. Only retrieve the _id of your documents - that's all you need, and it will majorly improve your .Find() performance since a lot less data needs to be transferred/deserialized from MongoDB.
  2. Iterate over the Enumerable of your result instead of using a fully populated list.
  3. Use bulk writes to avoid connecting to MongoDB again and again for every document and use a chunked flushing approach and flush every 1000 documents or so.
  4. Theoretically, you could go further using multithreading or yield semantics for nicer streaming. However, that's getting a little complicated and may not even be needed.

The following should get you going faster already:

' just some cached values
Dim filterDefinitionBuilder  = Builders(Of BsonDocument).Filter
Dim updateDefinitionBuilder  = Builders(Of BsonDocument).Update
Dim collection = DB.GetCollection(Of BsonDocument)(collectionName)

' load only _id field
Dim documentIds = collection.Find(filterSelectedDocuments).Project(Function(doc) doc.GetValue("_id").AsObjectId).ToEnumerable()

' bulk write buffer (pre-initialized to size 1000 to avoid memory traffic upon array expansion)
Dim updateModelsBuffer = new List(Of UpdateOneModel(Of BsonDocument))(1000)

' starting value for our update counter
Dim i As Long = 2 

For Each objectId In documentIds
    ' for every document we want one update command...
    ' ...that finds exactly one document identified by its _id field
    Dim filterDefinition  = filterDefinitionBuilder.Eq(Of ObjectId)("_id", objectId)
    ' ...and updates the "increment.value" with our running counter
    Dim updateDefinition  = updateDefinitionBuilder.Set(Of Integer)("increment.value", i)

    updateModelsBuffer.Add(New UpdateOneModel(Of BsonDocument)(filterDefinition, updateDefinition))

    ' every e.g. 1000 documents
    If updateModelsBuffer.Count = 1000 Then
        ' we flush the contents to the database
        collection.BulkWrite(updateModelsBuffer)
        ' and we empty our buffer list
        updateModelsBuffer.Clear()
    End If
    i = i + 1
Next

' flush leftover commands that have not been written yet, in case the document
' count is not a multiple of 1000 (BulkWrite throws on an empty request list)
If updateModelsBuffer.Count > 0 Then
    collection.BulkWrite(updateModelsBuffer)
End If
