
Node: trying to stream an Excel file and pass the buffer to 'xlsx' library

I'm using the Node.js package 'xlsx' to read an Excel file. Reading a file from the file system synchronously works fine, but streaming is a bit tricky. (The file is served remotely and I can only receive it as a stream.) Locally streamed files produce the same problem, so it's easy to reproduce.

I've followed the examples at https://www.npmjs.com/package/xlsx but get "End of data reached" or "Unsupported file" depending on the approach.

const XLSX = require('xlsx');
const fs = require('fs');

const stream = fs.createReadStream('sample.xlsx');


// This function returns Error: Unsupported file 48

documentedExample = function(){
  var arr = new Array();

  stream.on('data', function( arraybuffer ){
    var data = new Uint8Array(arraybuffer);
    for(var i = 0; i != data.length; ++i) arr[i] = String.fromCharCode(data[i]);
  });
  stream.on('end', function(){
    var bstr = arr.join("");
    var workbook = XLSX.read(bstr, {type:"binary"});

  });
}


// This function returns Error: End of data reached (data length = 75589, asked index = 77632). Corrupted zip ?

alternateExample = function(){
  var bufferArray = [];
  stream.on('data', function( thisChunk ){
    bufferArray.push( thisChunk );
  });
  stream.on('end', function(){
    var excelDataBuffer = bufferArray.join("");
    excelDataBuffer = excelDataBuffer.toString();
    var workbook = XLSX.read(excelDataBuffer, {type:"binary"});
  });
}

What would be the correct way to reassemble an xlsx binary for local use?

The library currently does not support streams for reading.

From their docs:

Note on Streaming Read

The most common and interesting formats (XLS, XLSX/M, XLSB, ODS) are ultimately ZIP or CFB containers of files. Neither format puts the directory structure at the beginning of the file: ZIP files place the Central Directory records at the end of the logical file, while CFB files can place the FAT structure anywhere in the file! As a result, to properly handle these formats, a streaming function would have to buffer the entire file before commencing. That belies the expectations of streaming, so we do not provide any streaming read API. If you really want to stream, there are node modules like concat-stream that will do the buffering for you.
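The buffering that the docs describe can also be done by hand. The sketch below is one possible way to do it, not the library's own method: it collects the chunks as Buffers, joins them with Buffer.concat, and passes the result to XLSX.read with type "buffer". The helper names streamToBuffer and readWorkbookFromStream are hypothetical, and the xlsx parameter exists only so the function can be tried with a stand-in module. It also suggests why the two attempts in the question fail: in the first, arr[i] restarts at index 0 on every chunk, so earlier chunks are overwritten, and in the second, bufferArray.join("") decodes each Buffer as UTF-8 text, which mangles binary data.

```javascript
// Collect every chunk of a readable stream into a single Buffer.
// Assumes no encoding was set on the stream, so each chunk is a Buffer.
function streamToBuffer(stream) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    stream.on('data', (chunk) => chunks.push(chunk));
    stream.on('error', reject);
    // Buffer.concat joins the raw bytes without any string conversion.
    stream.on('end', () => resolve(Buffer.concat(chunks)));
  });
}

// Hypothetical helper: buffer the whole stream, then parse it in one call.
// `xlsx` defaults to the installed package and is only loaded when needed.
async function readWorkbookFromStream(stream, xlsx = require('xlsx')) {
  const buffer = await streamToBuffer(stream);
  return xlsx.read(buffer, { type: 'buffer' });
}
```

With the question's setup this would be called as readWorkbookFromStream(fs.createReadStream('sample.xlsx')).then(workbook => console.log(workbook.SheetNames)). The whole file still ends up in memory, which matches the docs' point that a true streaming read is not possible for these formats.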

The xlsx package does offer streaming for writing:

The streaming write functions are available in the XLSX.stream object. They take the same arguments as the normal write functions but return a readable stream. They are only exposed in node.

XLSX.stream.to_csv is the streaming version of XLSX.utils.sheet_to_csv.

XLSX.stream.to_html is the streaming version of XLSX.utils.sheet_to_html.
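As a rough illustration of the write side, the sketch below pipes the readable stream returned by XLSX.stream.to_csv into a file. The helper name writeSheetAsCsv and the output path are hypothetical, and the xlsx parameter is there only so the function can be exercised with a stand-in module instead of the real package.

```javascript
const fs = require('fs');

// Hypothetical helper: stream one worksheet to a CSV file on disk.
// `xlsx` defaults to the installed package and is only loaded when needed.
function writeSheetAsCsv(worksheet, outPath, xlsx = require('xlsx')) {
  return new Promise((resolve, reject) => {
    const source = xlsx.stream.to_csv(worksheet); // readable stream of CSV text
    const sink = fs.createWriteStream(outPath);
    source.on('error', reject);
    sink.on('error', reject);
    sink.on('finish', resolve); // all data has been flushed to the file
    source.pipe(sink);
  });
}
```

Given a workbook that has already been read, a call might look like writeSheetAsCsv(workbook.Sheets[workbook.SheetNames[0]], 'sample.csv').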

I wrote to support asking whether the Pro version has it and received this answer:

The zip format, used in XLSX, prevents true stream reading (you have to read to the end of the file in order to find out where all of the subfiles are). I wouldn't use the extremist language of https://github.com/thejoshwolfe/yauzl#no-streaming-unzip-api but I agree with the core message:

Any library that offers a streaming unzip API [is] either dishonest or nonconformant (usually the latter).

We offer event-based reading, which skips building a full workbook object. You receive row objects and that minimizes memory pressure.

On the write side, we offer stream-based XLSX writing as well as SpreadsheetML and other formats.

In my case I had to rewrite everything =(
