Bulk update MongoDB Collection from a CSV file

I have a Mongoose schema defined as:

const masterSchema = new mongoose.Schema({
  chapter: { type: Number, required: true },
  line: { type: Number, required: true },
  translations: [
    {
      translation: { type: String, required: true },
    },
  ],
});

I am trying to update the collection from a CSV file. The collection has more than 5K documents.

Sample data:

[
  {
    chapter: 1,
    line: 1,
    translations: [
      {
        translation: "xyz",
      },
    ],
  },
  {
    chapter: 1,
    line: 2,
    translations: [
      {
        translation: "abc",
      },
    ],
  },
];

The CSV file has the following format:

chapter,line,translation
1,1,example1
1,2,example2
....

The output should be:

[
  {
    chapter: 1,
    line: 1,
    translations: [
      {
        translation: "xyz",
      },
      {
        translation : "example1"
      }
    ],
  },
  {
    chapter: 1,
    line: 2,
    translations: [
      {
        translation: "abc",
      },
      {
        translation : "example2"
      }
    ],
  },
]

I am confused about how updateMany() would be used to insert the data into the correct document (if it is the correct way to solve the problem).

Assuming that chapter + line is not unique, the hardest part of this exercise is really parsing the CSV in the first place. CSV tends to throw surprises at you, such as quoted material and unintended leading or trailing whitespace in parsed columns, so it is good to wrap some programming around it to keep it under control, e.g.:

import csv
from pymongo import MongoClient

client = MongoClient('mongodb://localhost:27017')
db = client['testX']
coll = db['foo']

with open('myFile', 'r', newline='') as csvfile:  # newline='' is what the csv module docs recommend
    reader = csv.reader(csvfile)
    header = next(reader, None)  # capture and skip header line
    for row in reader:
        print(row)  # for fun
        # Careful: must turn parsed strings into int to match database type.
        # Also, strip whitespace from col 2 for safety:
        coll.update_many({'chapter':int(row[0]),'line':int(row[1])},
                         {'$push': {'translations':{'translation':row[2].strip()}}})
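
With more than 5K rows, one round trip per row can get slow. A possible variation, sketched here under assumptions rather than taken from the original answer, is to build all the update specs first and then send them in batches with pymongo's bulk_write. The names build_update_specs, apply_updates and batch_size are hypothetical, made up for this illustration; UpdateMany and bulk_write are existing pymongo APIs. The parsing step uses csv.DictReader with skipinitialspace=True so that quoted fields and stray spaces after commas are handled.

```python
import csv
import io


def build_update_specs(csv_text):
    """Turn CSV text into a list of (filter, update) pairs, one per data row."""
    # skipinitialspace=True drops whitespace that directly follows a comma,
    # which also lets a quoted field start after a space.
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    specs = []
    for row in reader:
        # Cast to int so the filter matches the Number fields in the schema.
        query = {'chapter': int(row['chapter']), 'line': int(row['line'])}
        # Strip remaining whitespace from the translation for safety.
        update = {'$push': {'translations': {'translation': row['translation'].strip()}}}
        specs.append((query, update))
    return specs


def apply_updates(coll, specs, batch_size=1000):
    """Send the updates to MongoDB in batches; returns the total modified count."""
    # Imported here (an illustrative choice) so the parsing step above
    # can be exercised without pymongo or a running server.
    from pymongo import UpdateMany

    modified = 0
    for start in range(0, len(specs), batch_size):
        batch = [UpdateMany(q, u) for q, u in specs[start:start + batch_size]]
        # ordered=False lets the server continue past an individual failure.
        result = coll.bulk_write(batch, ordered=False)
        modified += result.modified_count
    return modified
```

Assuming coll is the collection from the snippet above, usage could look like apply_updates(coll, build_update_specs(open('myFile', newline='').read())). Whether batching is worth it depends on the data size; for a few thousand rows the simple loop may be good enough.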
