Not able to get the right Sheets from a huge xlsx file — using the NodeJS XLSX library

I'm trying to get the data from a huge xlsx file (800k rows) and put it into a database via an AWS Lambda function. To do that, I'm fetching the file from S3 as a buffer and reading it:

const XLSX = require('xlsx'); // SheetJS

module.exports.getSalesData = new Promise((resolve, reject) => {
  getFileFromS3(filename)
    .then(function (workbook) {
      console.log(workbook.SheetNames[1]); // 'sales'
      console.log(workbook.SheetNames); // [ 'main', 'sales', 'Sheet1' ]
      console.log(Array.isArray(workbook.SheetNames)); // true
      console.log(typeof workbook.SheetNames); // 'object'
      console.log(Object.keys(workbook.Sheets)); // [ 'main', 'Sheet1' ] == why is 'sales' not here?

      var sheet_name = workbook.SheetNames[1]; // the 'sales' tab
      var json_sheet = XLSX.utils.sheet_to_json(workbook.Sheets[sheet_name], { raw: true });
      resolve(json_sheet);
    })
    .catch(err => {
      console.log('File: ' + filename + ' doesn\'t exist on S3 or you\'re not connected to the internet.');
      reject(err); // propagate the failure instead of leaving the promise pending
    });
});
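
For reference, getFileFromS3 isn't shown above; a minimal sketch of what it presumably does (assuming the AWS SDK v2 and SheetJS's XLSX.read; the bucket name is hypothetical) looks like this:

const AWS = require('aws-sdk');
const XLSX = require('xlsx');

const s3 = new AWS.S3();

// Downloads the whole xlsx file from S3 into memory and parses the
// buffer with SheetJS. The bucket name below is an assumption.
function getFileFromS3(filename) {
  return s3
    .getObject({ Bucket: 'my-sales-bucket', Key: filename })
    .promise()
    .then(data => XLSX.read(data.Body, { type: 'buffer' }));
}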

The issue is that the keys of workbook.Sheets should be [ 'main', 'sales', 'Sheet1' ], right?

Then I try to get the number of rows (already converted to JSON) like this:

getSalesData.then(function (data) {
    console.log(data.length + ' rows');
    console.log(data[0]);
  }).catch(err => console.error(err));

Here the parameter data is the json_sheet resolved in the function above. For data.length (the number of rows) I get 0 instead of 800k+, and of course data[0] is undefined.

P.S.: the file is 57.3 MB; not sure if that's the cause.

Thanks in advance for help.

So basically what was happening is that NodeJS wasn't able to read the full file, because parsing it blows past the memory limit the NodeJS VM puts on strings; the workbook comes back only partially parsed, which is why the 'sales' sheet is missing from workbook.Sheets.
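
A cheap sanity check (my sketch, not part of the original answer) is to compare SheetNames against the sheets that were actually parsed before converting; note that SheetJS's sheet_to_json quietly returns [] for a missing sheet, which is why the question saw 0 rows instead of an error:

const missing = workbook.SheetNames.filter(name => !workbook.Sheets[name]);
if (missing.length > 0) {
  // The workbook was only partially parsed (e.g. truncated while
  // reading); fail loudly instead of resolving 0 rows later.
  throw new Error('Sheets not parsed: ' + missing.join(', '));
}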

So what I had to do was increase the memory limit, like this:

node --max-old-space-size=2048 services/process/process-sales.js

This raises the memory available to NodeJS from 512 MB (the default in my setup) to 2048 MB / 2 GB.
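
To confirm the flag took effect, you can print V8's configured heap limit from inside the script (a quick check using Node's built-in v8 module; not part of the original answer):

const v8 = require('v8');

// heap_size_limit is the maximum heap V8 will allow, in bytes; with
// --max-old-space-size=2048 it should report roughly 2 GB.
console.log(v8.getHeapStatistics().heap_size_limit / 1024 / 1024 + ' MB');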

But this is just a workaround to read a large number of values.

I don't recommend using NodeJS to process large amounts of data like this. Instead, go with Python and a library like Pandas, which is great for this kind of work.

P.S.: just my opinion and experience from dealing with data in NodeJS; I don't think NodeJS was made for this.
