
Read and modify data sequentially in node.js

I'm trying to read and modify some CSV files. For one CSV file, the code I wrote is the following:

const csv = require('csv-parser');
const fs = require('fs');
const results = [];
const resultsFiltered = [];

fs.createReadStream('csvTest/2005.csv')
    .pipe(csv())
    .on('data', (data) => results.push(data))
    .on('end', () => {
        // Filter results
        for (i=0; i<results.length; i++) {
            if (results[i]['Points:2'] == 0) {
                resultsFiltered.push([results[i]['Points:0'], results[i]['Points:1'], results[i]['displacement:2']]); 
            }  
        } 
        console.log('results filtered: ', resultsFiltered);    
    });

This works fine, but when I try to loop over several files, I get strange results. Here is the code:

const csv = require('csv-parser');
const fs = require('fs');

const dataFolder = 'csvTest/';

let results, resultsFiltered, stringified;

function filterData(_file) {
    results = [];
    resultsFiltered = [];
    console.log('filtering');

    fs.createReadStream(dataFolder + _file)
    .pipe(csv())
    .on('data', (data) => results.push(data))
    .on('end', () => {
        // Filter results
        for (i=0; i<results.length; i++) {
            if (results[i]['Points:2'] == 0) {
                resultsFiltered.push([results[i]['Points:0'], results[i]['Points:1'], results[i]['displacement:2']]); 
            }  
        } 
        console.log('done');
        return resultsFiltered;
    });
}

const filesList = fs.readdirSync(dataFolder);

function main() {
    for (i=0; i<filesList.length; i++) {
        console.log(filterData(filesList[i]));
    }
}

main();

I understand that it could be solved with async/await, but every way I tried to use it was unsuccessful. I always get the same console output as with the basic code above:

filtering
undefined
filtering
undefined
filtering
undefined
done
done
done

instead of the desired

filtering
done
resultsFiltered
filtering
done
resultsFiltered
filtering
done
resultsFiltered

How should async/await be used in this case?

You get undefined because your function has no return statement. If you take a closer look, the return statement you wrote is actually inside a callback function. What you can do is promisify your function and use async/await.

const csv = require("csv-parser");
const fs = require("fs");

const dataFolder = "csvTest/";

let results, resultsFiltered, stringified;

function filterData(_file) {
  return new Promise(res => {
    results = [];
    resultsFiltered = [];
    console.log("filtering");

    fs.createReadStream(dataFolder + _file)
      .pipe(csv())
      .on("data", data => results.push(data))
      .on("end", () => {
        // Filter results
        for (let i = 0; i < results.length; i++) {
          if (results[i]["Points:2"] == 0) {
            resultsFiltered.push([
              results[i]["Points:0"],
              results[i]["Points:1"],
              results[i]["displacement:2"]
            ]);
          }
        }
        console.log("done");
        res(resultsFiltered);
      });
  });
}

const filesList = fs.readdirSync(dataFolder);

async function main() {
  for (const file of filesList) {
    let result = await filterData(file);
    console.log(result);
  }
}

main();

Well, this should work.

There are multiple issues in this code that can cause problems.

The main logic issue is that these streams are all asynchronous and your code is not keeping track of when any of them are done reading. Your return resultsFiltered; is inside the .on('end', ...) callback, so you are not returning anything from your filterData() function (which is why you get undefined). You are only returning from that callback, and that return value goes nowhere.

Thus, when you loop over these streams, you have lots of them going at once, with no idea when they are all done and no way to get all the data out of them.
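To make the point about the callback's return value concrete, here is a minimal sketch that needs no CSV files. `readLater` is a hypothetical stand-in for a stream that delivers its data later, and `broken` and `fixed` are illustrative names, not part of the original code.

```javascript
// Hypothetical stand-in for a stream: it calls the callback later,
// much like the 'end' event fires after filterData() has already returned.
function readLater(callback) {
  setTimeout(() => callback(['row1', 'row2']), 10);
}

// Mirrors the shape of the original filterData(): the return is inside the callback.
function broken() {
  readLater((rows) => {
    return rows; // goes back to whoever invoked the callback, which ignores it
  });
  // no return statement at this level, so broken() evaluates to undefined
}

// Promise version: the callback hands the value to resolve() instead.
function fixed() {
  return new Promise((resolve) => {
    readLater((rows) => resolve(rows));
  });
}

const brokenResult = broken();
console.log(brokenResult); // undefined

const fixedPromise = fixed();
fixedPromise.then((rows) => console.log(rows)); // [ 'row1', 'row2' ]
```

The only difference between the two functions is where the value goes: a return inside the callback is discarded, while resolve() passes it to whoever awaits the promise.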

You also have undeclared local variables, which makes them accidental globals that can conflict with each other.
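As a rough illustration of how shared variables collide when several asynchronous operations overlap, consider the sketch below. It declares `results` explicitly at module level (so it also runs in strict mode), but an accidental global behaves the same way. `load` and its timer delays are hypothetical stand-ins for reading a file.

```javascript
let results; // one variable shared by every call

// Hypothetical stand-in for reading a file: pushes rows after a delay.
function load(name, rowCount, delayMs) {
  results = []; // each call replaces the array the earlier calls still rely on
  return new Promise((resolve) => {
    setTimeout(() => {
      for (let i = 0; i < rowCount; i++) {
        results.push(name + ':' + i);
      }
      resolve(results);
    }, delayMs);
  });
}

// Both operations are started before either one finishes.
const sharedDemo = Promise.all([load('a', 2, 20), load('b', 1, 10)]);

sharedDemo.then(([a, b]) => {
  console.log(a === b); // true: both calls ended up with the same array
  console.log(a);       // [ 'b:0', 'a:0', 'a:1' ]
});
```

Moving the declaration inside the function (const results = [];) gives every call its own array, which is why local declarations matter here.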

To solve the main logic issue, this is a useful spot to introduce a promise that keeps track of when each stream is done and lets you get the data out as the resolved value of the promise, or get an error out as the reject reason:

function filterData(_file) {
    const results = [];
    const resultsFiltered = [];

    return new Promise((resolve, reject) => {
        fs.createReadStream(dataFolder + _file)
            .pipe(csv())
            .on('data', (data) => results.push(data))
            .on('end', () => {
                // Filter results
                for (let i = 0; i < results.length; i++) {
                    if (results[i]['Points:2'] == 0) {
                        resultsFiltered.push([results[i]['Points:0'], results[i]['Points:1'], results[i]['displacement:2']]);
                    }
                }
                resolve(resultsFiltered);
            }).on('error', reject);
    });
}

Then, you can use that promise to keep track of things in your loop:

const filesList = fs.readdirSync(dataFolder);

async function main() {
    for (let i = 0; i < filesList.length; i++) {
        const result = await filterData(filesList[i]);
        console.log(result);
    }
}

main().then(() => {
    console.log("done");
}).catch(err => {
    console.log(err);
});

Note that this includes several other fixes to your code:

  1. It adds an error event handler on your stream, which rejects the promise and thus gives you an opportunity to track and handle errors. Your original code was ignoring errors.

  2. It adds local declarations with const or let for all your variables, which is absolutely required. Your code was not declaring either i variable used in your loops, which makes them implicit globals that can conflict with one another.

  3. It sequences the streams so they aren't all in progress at once, which also keeps your results in order. The code could also be written to run in parallel (if enough resources are available to do that) by collecting all the promises and using Promise.all() to track them.
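The parallel variant mentioned in the last point could look roughly like the sketch below. To keep it runnable without CSV files, `fakeFilterData` is a hypothetical stand-in for filterData() that resolves after a delay; the file names other than 2005.csv and the delays are made up for illustration.

```javascript
// Hypothetical stand-in for filterData(): resolves after a per-file delay.
function fakeFilterData(file, delayMs) {
  return new Promise((resolve) => {
    setTimeout(() => resolve('filtered ' + file), delayMs);
  });
}

const files = ['2005.csv', '2006.csv', '2007.csv']; // assumed file names
const delays = [30, 10, 20]; // the first file finishes last

async function runInParallel() {
  // Start every operation first, then wait for all of them together.
  const promises = files.map((file, index) => fakeFilterData(file, delays[index]));
  return Promise.all(promises);
}

const parallelResults = runInParallel();
parallelResults.then((all) => console.log(all));
// [ 'filtered 2005.csv', 'filtered 2006.csv', 'filtered 2007.csv' ]
```

Promise.all() resolves with the values in the same order as the input promises, regardless of which one finishes first, so the results still line up with the file list. It rejects as soon as any one of the promises rejects.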

You will need to convert your operations to return a promise to be able to await the results.

PS: don't forget to handle errors by listening for the error event and rejecting the promise.

function filterData(_file) {
  // Local arrays, so overlapping calls cannot overwrite each other's data
  const results = [];
  const resultsFiltered = [];
  console.log('filtering');

  return new Promise((resolve, reject) => {
    fs.createReadStream(dataFolder + _file)
      .pipe(csv())
      .on('data', (data) => results.push(data))
      .on('end', () => {
        // Filter results
        for (let i = 0; i < results.length; i++) {
          if (results[i]['Points:2'] == 0) {
            resultsFiltered.push([
              results[i]['Points:0'],
              results[i]['Points:1'],
              results[i]['displacement:2'],
            ]);
          }
        }
        console.log('done');
        resolve(resultsFiltered);
      })
      .on('error', reject); // reject on parser errors, as the PS above suggests
  });
}

const filesList = fs.readdirSync(dataFolder);

async function main() {
  for (let i = 0; i < filesList.length; i++) {
    console.log(await filterData(filesList[i]));
  }
}
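For the error-handling advice, a minimal sketch using only Node's built-in fs module is shown below. `readAll` and the missing file name are hypothetical, chosen so the example runs without csv-parser or any test data. One detail worth knowing: .pipe() does not forward errors from the source stream to the destination, so in a piped chain it may be worth listening for error on the read stream as well as on the parser.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Resolves with the file's text, or rejects if the read stream emits 'error'.
function readAll(filePath) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    fs.createReadStream(filePath)
      .on('error', reject) // e.g. ENOENT when the file does not exist
      .on('data', (chunk) => chunks.push(chunk))
      .on('end', () => resolve(Buffer.concat(chunks).toString('utf8')));
  });
}

// Assumed not to exist, so the promise should reject.
const missingFile = path.join(os.tmpdir(), 'no-such-file-for-this-sketch.csv');

const readOutcome = readAll(missingFile).then(
  () => 'resolved',
  (err) => 'rejected: ' + err.code
);

readOutcome.then((outcome) => console.log(outcome)); // rejected: ENOENT
```

With the rejection in place, an await inside a try/catch block (or a .catch() on main()) receives the error instead of the promise staying pending forever.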

