Save csv-parse output to a variable

I'm new to using csv-parse, and this example from the project's GitHub does what I need, with one exception: instead of outputting via console.log, I want to store the data in a variable. I've tried assigning the fs line to a variable and then returning data rather than logging it, but that just returned a whole bunch of stuff I didn't understand. The end goal is to import a CSV file into SQLite.

var fs = require('fs');
var parse = require('..'); // in the project's examples this resolves to the csv-parse module

var parser = parse({delimiter: ';'}, function(err, data){
  console.log(data);
});

fs.createReadStream(__dirname+'/fs_read.csv').pipe(parser);

Here is what I have tried:

const fs = require("fs");
const parse = require("./node_modules/csv-parse");

const sqlite3 = require("sqlite3");
// const db = new sqlite3.Database("testing.sqlite");

let parser = parse({delimiter: ","}, (err, data) => {
    // console.log(data);
    return data;
});

const output = fs.createReadStream(__dirname + "/users.csv").pipe(parser);
console.log(output);

I was also struggling to figure out how to get the data from csv-parse back to the top-level code that invokes parsing. Specifically, I was trying to get the parser.info data at the end of processing to see if parsing was successful, but the solution for that works to get the row data as well, if you need it.

The key was to wrap all the stream event listeners into a Promise, and within the parser's callback resolve the Promise.

const fs = require('fs');
const parse = require('csv-parse'); // in csv-parse v5+, use: const { parse } = require('csv-parse')

function startFileImport(myFile) {

  // THIS IS THE WRAPPER YOU NEED
  return new Promise((resolve, reject) => {

    let readStream = fs.createReadStream(myFile);

    let fileRows = [];
    const parser = parse({
      delimiter: ','
    });

    // Use the readable stream api to accumulate records as they become available
    parser.on('readable', function () {
      let record;
      while ((record = parser.read()) !== null) {
        fileRows.push(record);
      }
    });

    // Catch any parser error and reject, so the promise cannot hang forever
    parser.on('error', function (err) {
      console.error(err.message);
      reject(err);
    });

    parser.on('end', function () {
      const { lines } = parser.info;
      // RESOLVE OUTPUT THAT YOU WANT AT PARENT-LEVEL
      // fileRows now holds every parsed record; include it here if you need the rows themselves
      resolve({ status: 'Successfully processed lines: ', lines });
    });

    // This will wait until we know the readable stream is actually valid before piping                
    readStream.on('open', function () {
      // Pipe the read stream into the parser
      readStream.pipe(parser);
    });

    // This catches any errors that happen while creating the readable stream (usually invalid names)
    readStream.on('error', function (err) {
      resolve({ status: null, error: 'readStream error: ' + err });
    });

  });
}
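
For completeness, this is roughly how the wrapper would be called from an async function (a usage sketch; the file name and the example line count are placeholders):

async function run() {
  const result = await startFileImport(__dirname + '/users.csv');
  console.log(result); // e.g. { status: 'Successfully processed lines: ', lines: 42 }
}

run().catch(console.error);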

This is a question that suggests confusion about an asynchronous streaming API, and it seems to ask at least three things.

  1. How do I get output to contain an array-of-arrays representing the parsed CSV data?

That output will never exist at the top-level, like you (and many other programmers) hope it would, because of how asynchronous APIs operate. All the data assembled neatly in one place can only exist in a callback function. The next best thing syntactically is const output = await somePromiseOfOutput(), but that can only occur inside an async function, and only if we switch from streams to promises. That's all possible; a sketch follows so you can explore it on your own, but I'll otherwise assume you want to stick with streams.
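
Here is a brief sketch of that promise-based alternative, assuming a recent csv-parse (v5+), whose parser streams are async-iterable; the readCsv name and file path are just illustrations:

const fs = require('fs');
const { parse } = require('csv-parse');

async function readCsv(path) {
  const output = [];
  const parser = fs.createReadStream(path).pipe(parse({ delimiter: ',' }));
  // Node readable streams are async-iterable, so each parsed record can be awaited in turn
  for await (const record of parser) {
    output.push(record);
  }
  return output;
}

// Inside an async function:
// const output = await readCsv(__dirname + '/users.csv');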

An array consisting of all the rows can only exist after reading the entire stream. That's why, in the author's "Stream API" example, all the rows become available only in the .on('end', ...) callback. If you want to do anything with all the rows present at the same time, you'll need to do it in the end callback.

From https://csv.js.org/parse/api/, note that the author:

  1. uses the on('readable') callback to push single records into a previously empty array named output, defined externally.
  2. uses the on('error') callback to report errors.
  3. uses the on('end') callback to compare all the accumulated records in output to the expected result.

...
const output = []
...
parser.on('readable', function(){
  let record
  while (record = parser.read()) {
    output.push(record)
  }
})

// Catch any error
parser.on('error', function(err){
  console.error(err.message)
})

// When we are done, test that the parsed output matched what expected
parser.on('end', function(){
  assert.deepEqual(
    output,
    [
      [ 'root','x','0','0','root','/root','/bin/bash' ],
      [ 'someone','x','1022','1022','','/home/someone','/bin/bash' ]
    ]
  )
})

  2. As to the goal of interfacing with SQLite: this is essentially building a customized streaming endpoint.

In this use case, implement a customized writable stream that accepts the output of parser and sends rows to the database.

Then you simply chain pipe calls as:

fs.createReadStream(__dirname+'/fs_read.csv')
  .pipe(parser)
  .pipe(your_writable_stream)

Beware: this code returns immediately. It does not wait for the operations to finish, because it interacts with a hidden event loop internal to node.js. The event loop often confuses new developers arriving from another language who are used to a more imperative style and skipped this part of their node.js training.

Implementing such a customized writable stream can get complicated and is left as an exercise for the reader, though a minimal sketch follows. It will be easiest if the parser emits one row at a time, so the writer only has to handle single rows. Make sure you notice errors somehow and throw appropriate exceptions, or you'll be cursed with incomplete results and no warning or reason why.
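
To make that exercise concrete, here is a minimal sketch of such a writable stream, assuming a hypothetical users table with two columns; the table name, column list, and file names are placeholders you would adapt to your schema:

const fs = require('fs');
const { Writable } = require('stream');
const sqlite3 = require('sqlite3');
const parse = require('csv-parse'); // or require('csv-parse').parse in v5+

const db = new sqlite3.Database('testing.sqlite');
const parser = parse({ delimiter: ',' });

const sqliteWriter = new Writable({
  objectMode: true, // csv-parse emits one record (an array of fields) per chunk
  write(row, _encoding, callback) {
    // Insert one row; passing an error to callback makes the stream emit 'error'
    db.run('INSERT INTO users (name, email) VALUES (?, ?)', row, callback);
  }
});

fs.createReadStream(__dirname + '/users.csv')
  .pipe(parser)
  .pipe(sqliteWriter)
  .on('finish', () => console.log('all rows written'))
  // note: errors do not propagate across pipe(), so the parser
  // and the read stream each need their own 'error' handlers
  .on('error', (err) => console.error(err));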

A hackish way to do it would have been to replace console.log(data) in let parser = ... with a customized function writeRowToSqlite(data) that you'll have to write anyway to implement a custom stream (see the snippet below). Because of asynchronous API issues, using return data there does not do anything useful. It certainly, as you saw, fails to put the data into the output variable.
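
In code, that hack would look something like this; writeRowToSqlite is the hypothetical helper you would still have to write, and note that the callback API hands it every parsed record at once rather than one row at a time:

let parser = parse({ delimiter: ',' }, (err, data) => {
  if (err) return console.error(err);
  writeRowToSqlite(data); // hypothetical helper; data holds all parsed records
});

fs.createReadStream(__dirname + '/users.csv').pipe(parser);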


  3. As to why output in your modified posting does not contain the data...

Unfortunately, as you discovered, this is usually wrong-headed:

const output = fs.createReadStream(__dirname + "/users.csv").pipe(parser);
console.log(output);

Here, the variable output will be a ReadableStream, which is not the same as the data contained in that stream. Put simply, it's like a file in your filesystem: you can obtain all kinds of system information about the file, but the content inside it is accessed through a different call. The snippet below illustrates the difference.
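
A quick demonstration of that distinction, reusing fs and parser from your snippet (an illustration, not a fix; the file name is a placeholder):

const stream = require('stream');

const output = fs.createReadStream(__dirname + '/users.csv').pipe(parser);
console.log(output instanceof stream.Readable); // true: output is the stream itself, not the rows

// The rows only become available later, via the stream's events:
output.on('readable', () => {
  let record;
  while ((record = output.read()) !== null) {
    console.log(record); // one parsed row at a time
  }
});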
