Bulk update MongoDB Collection from a CSV file

I have a mongoose schema defined as

const mongoose = require("mongoose");

const masterSchema = new mongoose.Schema({
  chapter: { type: Number, required: true },
  line: { type: Number, required: true },
  translations: [
    {
      translation: { type: String, required: true },
    },
  ],
});

I am trying to update the collection from a CSV file. The collection has more than 5K documents.

Sample data

[
  {
    chapter: 1,
    line: 1,
    translations: [
      {
        translation: "xyz",
      },
    ],
  },
  {
    chapter: 1,
    line: 2,
    translations: [
      {
        translation: "abc",
      },
    ],
  },
];

The CSV file has the following format:

chapter,line,translation
1,1,example1
1,2,example2
....
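Before touching the database, each CSV row has to become a typed (chapter, line, translation) record. The sketch below uses Python's standard `csv.DictReader`, assuming the file follows ordinary CSV quoting rules (a translation containing a comma is wrapped in double quotes). The names `TranslationRow` and `parse_rows` are hypothetical, chosen for illustration.

```python
import csv
import io
from typing import Iterable, Iterator, NamedTuple


class TranslationRow(NamedTuple):
    chapter: int
    line: int
    translation: str


def parse_rows(lines: Iterable[str]) -> Iterator[TranslationRow]:
    """Yield typed rows from CSV text whose header is chapter,line,translation."""
    # skipinitialspace=True ignores spaces right after each comma, so a quoted
    # field written as `1, 2, "a, b"` is still recognized as quoted.
    reader = csv.DictReader(lines, skipinitialspace=True)
    for rec in reader:  # DictReader already skips completely blank lines
        yield TranslationRow(
            chapter=int(rec["chapter"]),  # strings -> int to match the schema
            line=int(rec["line"]),
            translation=rec["translation"].strip(),  # drop stray whitespace
        )


if __name__ == "__main__":
    sample = 'chapter,line,translation\n1,1,example1\n1, 2, "hello, world "\n\n'
    for row in parse_rows(io.StringIO(sample)):
        print(row)
```

For a real file, open it with `open(path, newline="")` (as the `csv` module documentation recommends) and pass the file object to `parse_rows`.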

The output should be

[
  {
    chapter: 1,
    line: 1,
    translations: [
      {
        translation: "xyz",
      },
      {
        translation: "example1",
      },
    ],
  },
  {
    chapter: 1,
    line: 2,
    translations: [
      {
        translation: "abc",
      },
      {
        translation: "example2",
      },
    ],
  },
]

I am confused about how updateMany() should be used to insert each row into the correct document (if it is even the right way to solve the problem).
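The way `updateMany(filter, update)` finds "the correct document" is through its filter: it applies the update to every document that matches. A filter on `{chapter, line}` combined with a `$push` on `translations` therefore appends one entry to each matching document's array. The snippet below is only a toy in-memory model of that behaviour on plain Python dicts (it is not the MongoDB driver, and `matches` / `update_many_push` are hypothetical names); it reproduces the expected output above from the sample data.

```python
from typing import Any

Doc = dict[str, Any]


def matches(doc: Doc, flt: Doc) -> bool:
    """Equality-only filter, e.g. {'chapter': 1, 'line': 2}."""
    return all(doc.get(key) == value for key, value in flt.items())


def update_many_push(docs: list[Doc], flt: Doc, field: str, item: Any) -> int:
    """Toy model of update_many(flt, {'$push': {field: item}}) on a list of dicts.

    Appends `item` to the array `field` of every matching document (creating
    the array if it is missing) and returns how many documents were changed.
    """
    modified = 0
    for doc in docs:
        if matches(doc, flt):
            doc.setdefault(field, []).append(item)
            modified += 1
    return modified


docs = [
    {"chapter": 1, "line": 1, "translations": [{"translation": "xyz"}]},
    {"chapter": 1, "line": 2, "translations": [{"translation": "abc"}]},
]
for chapter, line, text in [(1, 1, "example1"), (1, 2, "example2")]:
    update_many_push(docs, {"chapter": chapter, "line": line},
                     "translations", {"translation": text})
print(docs)
```

If chapter + line is unique, each call touches exactly one document; if it is not, every duplicate receives the new translation.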

Assuming that chapter + line is not necessarily unique (update_many pushes the translation into every document that matches both values), the hardest part of this exercise is really parsing the CSV in the first place. CSV tends to throw surprises at you, like quoted fields and unintended leading and trailing whitespace in parsed columns, so it is good to use a proper CSV parser rather than splitting lines by hand, e.g. in Python with pymongo:

import csv
from pymongo import MongoClient

client = MongoClient('mongodb://localhost:27017')
db = client['testX']
coll = db['foo']

# newline='' is what the csv module docs recommend for files given to csv.reader
with open('myFile', 'r', newline='') as csvfile:
    reader = csv.reader(csvfile)
    header = next(reader, None)  # capture and skip header line
    for row in reader:
        if not row:  # skip blank lines instead of failing on row[0]
            continue
        print(row)  # for fun
        # Careful: must turn parsed strings into int to match database type.
        # Also, strip whitespace from the translation column (index 2) for safety:
        coll.update_many({'chapter': int(row[0]), 'line': int(row[1])},
                         {'$push': {'translations': {'translation': row[2].strip()}}})
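The loop above issues one `update_many` call, and therefore one round trip to the server, per CSV row. A possible alternative, which is not part of the original answer: group the rows by (chapter, line) first and build one update per key, using MongoDB's `$each` modifier so a single `$push` can append several translations at once. The operations can then be sent in batches. The helper name `build_push_updates` is hypothetical, and the rows are assumed to be already parsed into `(int, int, str)` tuples.

```python
from collections import defaultdict
from typing import Any, Iterable

Filter = dict[str, Any]
Update = dict[str, Any]


def build_push_updates(
    rows: Iterable[tuple[int, int, str]],
) -> list[tuple[Filter, Update]]:
    """Group (chapter, line, translation) rows into one $push/$each update per key.

    CSV order is preserved within each group, and groups appear in the order
    their (chapter, line) key first occurs in the input.
    """
    grouped: defaultdict[tuple[int, int], list[dict[str, str]]] = defaultdict(list)
    for chapter, line, translation in rows:
        grouped[(chapter, line)].append({"translation": translation})
    return [
        ({"chapter": chapter, "line": line},
         {"$push": {"translations": {"$each": items}}})
        for (chapter, line), items in grouped.items()
    ]


if __name__ == "__main__":
    rows = [(1, 1, "example1"), (1, 2, "example2"), (1, 1, "example3")]
    for flt, update in build_push_updates(rows):
        print(flt, update)
```

With pymongo, each `(flt, update)` pair can be wrapped as `UpdateMany(flt, update)` (imported from `pymongo`) and the list passed to `coll.bulk_write(ops, ordered=False)`; an unordered bulk write keeps processing the remaining operations if one fails. That step needs a live server, so it is left out of the sketch.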
