Description
I have a very large CSV file (around 1 GB) that I want to process in byte chunks of roughly 10 MB each. For this purpose, I am creating a readable stream with the byte-range option: fs.createReadStream(sampleCSVfile, { start: 0, end: 10000000 })
Problem
Using the above approach, the chunk read from the CSV file ends with a last line that is not complete. I want a way to identify the byte index of the last line break in the chunk, so I can start my next readable stream at the byte right after it.
Example CSV: (ignore header row)
John,New York,52
Stacy,Chicago,19
Lisa,Indianapolis,40
Sample Operation:
fs.createReadStream(sampleCSVfile, { start: 0, end: 99 })
Data Returned: (trimmed to above-specified byte-range)
John,New York,52
Stacy,Chicago,19
Lisa,I
Required or Expected:
John,New York,52
Stacy,Chicago,19
So, suppose the last newline in the fetched chunk ended at byte index 78; my next recursive operation would then be: fs.createReadStream(sampleCSVfile, { start: 79, end: 178 })
Below is some basic code:
const fs = require('fs');

let stream = fs.createReadStream('test.csv', { start: 0, end: 40 });
stream.on('data', (data) => {
  console.log(data.length);            // number of bytes in this chunk
  let a = data.toString();
  console.log(a);
  let i = a.lastIndexOf('\n');         // character index of the last newline
  console.log(i);
  let substr = a.substring(0, i);      // drop the trailing partial line
  console.log(substr);
  let byteLength = Buffer.byteLength(substr); // byte length, not character count
  console.log(byteLength);             // next chunk starts at byteLength + 1
});
DEMO : https://repl.it/@sandeepp2016/SpiritedRowdyObject
But there are already CSV parsers like fast-csv, or you can use the readline module, which lets you read a stream of data line by line more efficiently.