Save csv-parse output to a variable

I'm new to csv-parse, and this example from the project's GitHub does what I need with one exception: instead of outputting via console.log, I want to store the data in a variable. I've tried assigning the fs line to a variable and then returning data rather than logging it, but that just returned a whole bunch of stuff I didn't understand. The end goal is to import a CSV file into SQLite.

var fs = require('fs');
var parse = require('..');

var parser = parse({delimiter: ';'}, function(err, data){
  console.log(data);
});

fs.createReadStream(__dirname+'/fs_read.csv').pipe(parser);

Here is what I have tried:

const fs = require("fs");
const parse = require("./node_modules/csv-parse");

const sqlite3 = require("sqlite3");
// const db = new sqlite3.Database("testing.sqlite");

let parser = parse({delimiter: ","}, (err, data) => {
    // console.log(data);
    return data;
});

const output = fs.createReadStream(__dirname + "/users.csv").pipe(parser);
console.log(output);

I was also struggling to figure out how to get the data from csv-parse back to the top level that invokes parsing. Specifically, I was trying to read parser.info at the end of processing to see whether it succeeded, but the same solution works for getting the row data as well, if you need it.

The key was to wrap all the stream event listeners in a Promise and to resolve that Promise from the parser's end handler.

const fs = require('fs');
// Older csv-parse releases export the function directly;
// v5 and later use: const { parse } = require('csv-parse');
const parse = require('csv-parse');

function startFileImport(myFile) {

  // THIS IS THE WRAPPER YOU NEED
  return new Promise((resolve, reject) => {

    const readStream = fs.createReadStream(myFile);

    const fileRows = [];
    const parser = parse({
      delimiter: ','
    });

    // Use the readable stream api
    parser.on('readable', function () {
      let record;
      while ((record = parser.read()) !== null) {
        fileRows.push(record);
      }
    });

    // Catch any parse error and reject, so the Promise always settles
    parser.on('error', function (err) {
      console.error(err.message);
      reject(err);
    });

    parser.on('end', function () {
      const { lines } = parser.info;
      // RESOLVE OUTPUT THAT YOU WANT AT PARENT-LEVEL
      // (add fileRows to this object if the caller needs the row data)
      resolve({ status: 'Successfully processed lines: ', lines });
    });

    // Wait until we know the readable stream is actually valid before piping
    readStream.on('open', function () {
      // Pipe the file contents into the parser
      readStream.pipe(parser);
    });

    // This catches any errors that happen while creating the readable stream (usually invalid names)
    readStream.on('error', function (err) {
      resolve({ status: null, error: 'readStream error: ' + err });
    });

  });
}

This question suggests some confusion about an asynchronous streaming API, and it seems to ask at least three things.

  1. How do I get output to contain an array-of-arrays representing the parsed CSV data?

That output will never exist at the top level, as you (and many other programmers) hope it would, because of how asynchronous APIs operate. All the data assembled neatly in one place can only exist inside a callback function. The next best thing syntactically is const output = await somePromiseOfOutput(), but that can only occur in an async function and only if we switch from streams to promises. That's all possible, and I mention it so you can check it out later on your own. I'll assume you want to stick with streams.
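For reference, the await form mentioned above might look roughly like the following. This is only a sketch: collectRows and main are hypothetical names, and Readable.from(...) stands in for fs.createReadStream(...).pipe(parser) so the snippet runs without csv-parse or a CSV file. It relies on Node.js readable streams being async iterable.

```javascript
const { Readable } = require('stream');

// Hypothetical helper: gathers every record from an object-mode stream.
async function collectRows(recordStream) {
  const rows = [];
  for await (const record of recordStream) {
    rows.push(record);
  }
  return rows;
}

async function main() {
  // Assumption: stand-in for fs.createReadStream(...).pipe(parser)
  const parsed = Readable.from([['id', 'name'], ['1', 'alice']]);
  const output = await collectRows(parsed); // all rows, but only inside an async function
  console.log(output);
  return output;
}

const done = main();
```

Note that output still lives inside main; code outside an async function has to go through the returned Promise.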

An array consisting of all the rows can only exist after the entire stream has been read. That's why, in the author's "Stream API" example, all the rows are available only in the .on('end', ...) callback. If you want to do anything with all the rows present at the same time, you'll need to do it in the end callback.

From https://csv.js.org/parse/api/ note that the author:

  1. uses the on readable callback to push single records into a previously empty array, defined externally and named output.
  2. uses the on error callback to report errors
  3. uses the on end callback to compare all the accumulated records in output to the expected result

...
const output = []
...
parser.on('readable', function(){
  let record
  while (record = parser.read()) {
    output.push(record)
  }
})
// Catch any error
parser.on('error', function(err){
  console.error(err.message)
})
// When we are done, test that the parsed output matched what expected
parser.on('end', function(){
  assert.deepEqual(
    output,
    [
      [ 'root','x','0','0','root','/root','/bin/bash' ],
      [ 'someone','x','1022','1022','','/home/someone','/bin/bash' ]
    ]
  )
})

  2. As to the goal of interfacing with SQLite, this is essentially building a customized streaming endpoint.

In this use case, implement a customized writable stream that accepts the output of parser and sends rows to the database.

Then you simply chain pipe calls as

fs.createReadStream(__dirname+'/fs_read.csv')
  .pipe(parser)
  .pipe(your_writable_stream)

Beware: This code returns immediately. It does not wait for the operations to finish. It interacts with an event loop that is internal to Node.js. The event loop often confuses new developers who arrive from another language, are used to a more imperative style, and skipped this part of their Node.js training.
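To see the "returns immediately" behaviour for yourself, here is a small sketch using only Node's built-in stream module. The order array, the sink, and the Readable.from(...) source are all stand-ins I made up for illustration; no csv-parse or SQLite is involved.

```javascript
const { Readable, Writable } = require('stream');

const order = []; // records what happened, in sequence

// Stand-in sink; a real one would write rows to the database.
const sink = new Writable({
  objectMode: true,
  write(_row, _encoding, callback) {
    callback();
  },
});

sink.on('finish', () => {
  order.push('finish');
  console.log(order);
});

// Assumption: stand-in for fs.createReadStream(...).pipe(parser)
Readable.from([['a', 'b'], ['c', 'd']]).pipe(sink);

// This line runs before any row has been written to the sink.
order.push('pipe() returned');
```

When the finish handler eventually logs the array, 'pipe() returned' comes first, which is why anything that depends on the finished import has to live in (or be triggered from) a callback such as finish.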

Implementing such a customized writable stream can get complicated and is left as an exercise for the reader. It will be easiest if the parser emits one row at a time, so that the writer only has to handle single rows. Make sure you can notice errors somehow and throw appropriate exceptions, or you'll be cursed with incomplete results and no warning or reason why.
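As a starting point for that exercise, a row-at-a-time writer could be sketched like this with Node's built-in Writable class. createRowWriter and insertRow are hypothetical names, and the array below is a stand-in "database" so the snippet runs without SQLite; with the sqlite3 package, insertRow might wrap something like db.run with an INSERT statement, but that part is an assumption and not shown.

```javascript
const { Writable } = require('stream');

// Hypothetical helper: builds an object-mode Writable that hands each
// parsed row to insertRow(row, done).
function createRowWriter(insertRow) {
  return new Writable({
    objectMode: true, // rows arrive as arrays, not Buffers
    write(row, _encoding, callback) {
      // Passing callback through lets insertRow report failures:
      // callback(err) makes the stream emit 'error'.
      insertRow(row, callback);
    },
  });
}

// Stand-in "database" so the sketch runs without SQLite.
const stored = [];
const writer = createRowWriter((row, done) => {
  stored.push(row);
  done(); // call done(err) instead to signal a failed insert
});

writer.on('finish', () => console.log('rows stored:', stored.length));
writer.on('error', (err) => console.error('insert failed:', err.message));

// In real use the rows would come from .pipe(parser).pipe(writer).
writer.write(['1', 'alice']);
writer.write(['2', 'bob']);
writer.end();
```

Listening for error on the writer (and on the parser and the file stream as well) is what keeps a failed insert from passing silently.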

A hackish way to do it would be to replace console.log(data) in let parser = ... with a customized function writeRowToSqlite(data), which you'll have to write anyway to implement a custom stream. Because of how the asynchronous API works, using return data there does not do anything useful. It certainly, as you saw, fails to put the data into the output variable.


  3. As to why output in your modified posting does not contain the data...

Unfortunately, as you discovered, this is usually wrong-headed:

const output = fs.createReadStream(__dirname + "/users.csv").pipe(parser);
console.log(output);

Here, the variable output will be a stream object (pipe() returns its destination, which here is the parser), and that is not the same as the data flowing through the stream. Put simply, it's like having a file in your filesystem: you can obtain all kinds of system information about the file, but the content of the file is accessed through a different call.
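A minimal way to check this, assuming two built-in PassThrough streams as stand-ins for the file stream and the parser (parserStandIn is a hypothetical name):

```javascript
const { PassThrough } = require('stream');

const source = new PassThrough();
const parserStandIn = new PassThrough(); // hypothetical stand-in for the csv-parse parser

const output = source.pipe(parserStandIn);
console.log(output === parserStandIn); // true: pipe() hands back its destination

// The content only shows up through the stream's events.
const chunks = [];
output.on('data', (chunk) => chunks.push(chunk.toString()));
output.on('end', () => console.log(chunks.join('')));

source.end('id,name\n1,alice\n');
```

So logging output prints the stream's internal state, which is the "whole bunch of stuff" from the question; the rows themselves arrive later through data or readable events.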
