
NodeJS: Detect last new line byte from Readable Stream when reading by Range to parse large CSV file

Description

I have a very large CSV file (around 1 GB) that I want to process in byte chunks of roughly 10 MB each. For this purpose, I am creating a Readable Stream with the byte-range option: fs.createReadStream(sampleCSVfile, { start: 0, end: 10000000 })

Problem

Using the above approach, the stream read from the CSV file ends with a last line that is incomplete. I want a way to identify the byte index at which the last line break occurred and start my next Readable Stream from that byte index.

Example CSV: (ignore header row)

John,New York,52
Stacy,Chicago,19
Lisa,Indianapolis,40

Sample Operation:

fs.createReadStream(sampleCSVfile, { start: 0, end: 99 })

Data Returned: (trimmed to above-specified byte-range)

John,New York,52
Stacy,Chicago,19
Lisa,I

Required or Expected:

John,New York,52
Stacy,Chicago,19

So, suppose the last new line in the fetched stream ended at byte index 78; my next recursive operation would then be: fs.createReadStream(sampleCSVfile, { start: 79, end: 178 })

Below is some basic code:

const fs = require('fs');

let stream = fs.createReadStream('test.csv', { start: 0, end: 40 });

stream.on('data', (data) => {
  console.log(data.length);              // number of bytes in this chunk
  let a = data.toString();
  console.log(a);
  let i = a.lastIndexOf('\n');           // index of the last line break
  console.log(i);
  let substr = a.substring(0, i);        // drop the trailing partial line
  console.log(substr);
  let byteLength = Buffer.byteLength(substr); // byte offset to resume from
  console.log(byteLength);
});

DEMO : https://repl.it/@sandeepp2016/SpiritedRowdyObject

But there are already CSV parsers like fast-csv, or you can use the readline module, which will let you read a stream of data line by line more efficiently.
