

Possible data inconsistencies when writing to a file in a loop in Node.js

I have an array with, say, 100,000 objects. I use the map function, and on each iteration I build a string and write the content to a CSV like so:

  entriesArray.map((entry) => {
    let str = entry.id + ',' + entry.fname + ',' + entry.lname + ',' +
    entry.address + ',' + entry.age + ',' + entry.sex + '\n'
    writeToFile(str);
  });

The writeToFile function:

const writeToFile = (str) => {
  fs.appendFile(outputFileName + '.csv', str, (err) => {
    if (err) throw err;
  });
};

This works as expected, but I'm concerned that having so many asynchronous write operations could lead to data inconsistencies. So my question is: is this safe? Or is there a better way to do it?

By the way, the same code on macOS threw the error Error: ENFILE: file table overflow, open 'output.csv'. On a bit of research, I learned that this is due to macOS having a very low open-file limit. More details on this can be found here.
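For reference, you can see the per-process open-file limit that triggers ENFILE like this (raising it only masks the problem; the real fix is opening fewer files at once):

```shell
# Print the current per-process open-file descriptor limit.
# On macOS this defaults to a low value (often 256).
ulimit -n
```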

Again, I'm hoping an improvement to my file-write mechanism could sort this issue out as well.

You are correct to realize that this is not a good way to code, as there are no guarantees of order with asynchronous writing (particularly if the writes are large and may take more than one actual write operation to disk). And remember that fs.appendFile() actually consists of three asynchronous operations: fs.open(), fs.write(), and fs.close(). As you have seen, this opens a lot of file handles all at once as it tries to do every single write in parallel. None of that is necessary.

I'd suggest you build the text you want to write as one string and do a single write at the end, as there appears to be no reason to write each entry separately. This will also be a lot more efficient:

writeToFile(entriesArray.map((entry) => {
    return entry.id + ',' + entry.fname + ',' + entry.lname + ',' +
        entry.address + ',' + entry.age + ',' + entry.sex + '\n';
}).join(""));

Let's say you had 1000 items in your entriesArray. Your scheme was doing 3000 disk operations: open, write, and close for every single entry. My suggested code does 3 disk operations. This should be significantly faster and have a guaranteed write order.


Also, you really need to think about proper error handling. Using something like:

if (err) throw err;

inside an async callback is NOT proper error handling. That throws into an async event which you have no ability to ever handle. Here's one scheme:

const writeToFile = (str, fn) => {
  fs.appendFile(outputFileName + '.csv', str, (err) => {
    fn(err);
  });
};

writeToFile(entriesArray.map((entry) => {
    return entry.id + ',' + entry.fname + ',' + entry.lname + ',' +
        entry.address + ',' + entry.age + ',' + entry.sex + '\n';
}).join(""), function(err) {
    if (err) {
       // error here
    } else {
       // success here
    }
});

