Buffering data from a stream in Node.js for performing bulk inserts
How can I efficiently buffer events coming from a stream in Node.js so that I can perform bulk inserts, instead of one insert per record received from the stream? Here is the pseudo-code I have in mind:
// Open MongoDB connection
mystream.on('data', (record) => {
// bufferize data into an array
// if the buffer is full (1000 records)
// bulk insert into MongoDB and empty buffer
})
mystream.on('end', () => {
// close connection
})
Does this look realistic? Are there any possible optimizations? Are there existing libraries that would make this easier?
Using Node.js's stream library, this can be implemented concisely and efficiently as:
const stream = require('stream');
const util = require('util');
const mongo = require('mongodb');

let streamSource; // A stream of objects from somewhere

// Establish DB connection
const client = new mongo.MongoClient("uri");
await client.connect();

// The specific collection to store our documents in
const collection = client.db("my_db").collection("my_collection");

await util.promisify(stream.pipeline)(
  streamSource,
  new stream.Writable({
    objectMode: true,
    highWaterMark: 1000,
    writev: async (chunks, next) => {
      try {
        // Each chunk entry is { chunk, encoding }; collect the objects and insert them as one batch
        const documents = chunks.map(({ chunk }) => chunk);
        await collection.insertMany(documents, { ordered: false });
        next();
      }
      catch (error) {
        next(error);
      }
    }
  })
);
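The reason this batches the inserts: while writev is waiting for one insertMany to finish, incoming objects accumulate in the Writable's internal buffer (bounded by highWaterMark), and the next writev call receives everything queued since the previous call as a single array. Below is a minimal, Mongo-free sketch of that behaviour; the object count, batch size and the setTimeout standing in for a slow bulk insert are just illustrative assumptions, and it relies on stream.Readable.from being available (Node 12+):

const stream = require('stream');
const util = require('util');

async function demo() {
  // A fast in-memory source of 5000 objects
  const source = stream.Readable.from(
    Array.from({ length: 5000 }, (_, i) => ({ index: i }))
  );

  await util.promisify(stream.pipeline)(
    source,
    new stream.Writable({
      objectMode: true,
      highWaterMark: 1000,
      writev: (chunks, next) => {
        // Each call receives all objects buffered since the previous call
        console.log(`writev received a batch of ${chunks.length} objects`);
        setTimeout(next, 50); // simulate a slow bulk insert
      }
    })
  );
}

demo().catch(console.error);

Running this prints batch sizes that quickly grow toward the highWaterMark, which is the same effect the writev-based Mongo writer above relies on.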
I ended up with a dependency-free solution (only the MongoDB driver itself):
const { MongoClient } = require("mongodb");

const url = process.env.MONGO_URI || "mongodb://localhost:27019";
const connection = MongoClient.connect(url, { useNewUrlParser: true, useUnifiedTopology: true });

Promise.resolve(connection)
  .then((db) => {
    const dbName = "databaseName";
    const collection = "collection";
    const dbo = db.db(dbName);

    let buffer = [];

    stream.on("data", (row) => {
      buffer.push(row);
      // flush the buffer as one bulk insert once it holds 10000 records
      if (buffer.length > 10000) {
        dbo.collection(collection).insertMany(buffer, { ordered: false });
        buffer = [];
      }
    });

    stream.on("end", () => {
      // insert the last, partially filled chunk
      dbo.collection(collection).insertMany(buffer, { ordered: false })
        .then(() => {
          console.log("Done!");
          db.close();
        });
    });

    stream.on("error", (err) => console.log(err));
  })
  .catch((err) => {
    console.log(err);
  });