Attempting to download a .xlsx file and upload to Azure Blob Storage using Playwright with Javascript produces malformed .xlsx file with a bad header

I am trying to download an Excel file and then upload it to Azure Blob Storage for use in Azure Data Factory. I have a Playwright script in JavaScript that worked when the file was a .csv, but when I run the same code against an Excel file, the uploaded file will not open in Excel. It says, "We found a problem with some content in 'order_copy.xlsx'. Do you want us to try to recover as much as we can?" After clicking Yes, it says, "Excel cannot open the file 'order_copy.xlsx' because the file format or file extension is not valid. Verify that the file has not been corrupted and that the file extension matches the format of the file."

Any ideas on how to use createReadStream more effectively to do this and preserve the .xlsx format?

I don't think the saveAs method will work, since this code runs in an Azure Function with no access to a known local path.

My first thought was that the content type was wrong, so I set it explicitly, but that did not help. I also tried a UTF-8 encoder, which did not work either.

//const data = await streamToText(download_csv.createReadStream())             
const download_reader = await download_csv.createReadStream();
let data = '';
for await (const chunk of download_reader) {
    data += chunk; //---I suspect I need to do something different here
}
// const data_utf8 = utf8.encode(data) //unescape( encodeURIComponent(data) );

const AZURE_STORAGE_CONNECTION_STRING = "..." //---Removed string here
// Create the BlobServiceClient object which will be used to create a container client
const blob_service_client = BlobServiceClient.fromConnectionString(AZURE_STORAGE_CONNECTION_STRING); 
// Get a reference to a container
const container_client = blob_service_client.getContainerClient('order'); 
const blob_name = 'order_copy.xlsx';
// Get a block blob client
const block_blob_client = container_client.getBlockBlobClient(blob_name);        
const contentType = 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'
const blobOptions = { blobHTTPHeaders: { blobContentType: contentType } };        
//const uploadBlobResponse = await block_blob_client.upload(data_utf8, data_utf8.length, blobOptions);
const uploadBlobResponse = await block_blob_client.upload(data, data.length, blobOptions);
console.log("Blob was uploaded successfully. requestId: ", uploadBlobResponse.requestId);

Any guidance would be appreciated. Thank you in advance for your help!

-Chad

Thanks @Gaurav for the suggestion not to accumulate the data in a string. The following code worked once I collected the chunks in an array and concatenated them with Buffer.concat, similar to your suggested code.

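// Collect the raw Buffer chunks instead of appending them to a string
const chunks = [];
for await (const chunk of download_reader) {
    chunks.push(chunk);
}
// Join the chunks into a single Buffer without any text decoding
const fileBuffer = Buffer.concat(chunks);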
...
const uploadBlobResponse = await block_blob_client.upload(fileBuffer, fileBuffer.length, blobOptions);
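
To illustrate why the string version breaks .xlsx files while .csv files survive, here is a minimal sketch that needs only Node.js. The byte values are a made-up stand-in for binary file content (an .xlsx file is a ZIP archive, so it starts with the "PK" signature and contains many bytes that are not valid UTF-8), and the two-chunk split is an assumption about how a stream might deliver the data.

```javascript
// Hypothetical stand-in for binary file content: the ZIP signature "PK\x03\x04"
// followed by a few bytes that are not valid UTF-8.
const original = Buffer.from([0x50, 0x4b, 0x03, 0x04, 0xff, 0xfe, 0x80, 0x9c]);

// Split into two chunks, roughly as a read stream might deliver them.
const streamChunks = [original.subarray(0, 5), original.subarray(5)];

// String accumulation: `+=` implicitly calls chunk.toString(), which decodes
// the bytes as UTF-8 and substitutes U+FFFD for every invalid sequence.
let asString = '';
for (const chunk of streamChunks) {
    asString += chunk;
}
const fromString = Buffer.from(asString, 'utf8');

// Buffer accumulation: the bytes are copied unchanged.
const fromBuffers = Buffer.concat(streamChunks);

console.log(original.equals(fromString));  // false: the bytes were altered
console.log(original.equals(fromBuffers)); // true: the bytes are intact
```

Once invalid bytes have been replaced during decoding, the original values are gone, which is why re-encoding the string as UTF-8 afterwards cannot repair the file. Plain-text .csv content consists of valid UTF-8, so it passes through the string round trip unharmed.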

Thanks everyone!
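
As a possible alternative for large files, the stream can be handed to the blob client directly so the whole file never sits in memory. This is only a sketch: the helper name uploadDownloadStream is hypothetical, the buffer size and concurrency values are assumptions, and it relies on the uploadStream method that BlockBlobClient from @azure/storage-blob provides in Node.js.

```javascript
// Hypothetical helper: upload a readable stream to a block blob without
// first collecting it into one Buffer.
// `blockBlobClient` is expected to be a BlockBlobClient from @azure/storage-blob.
async function uploadDownloadStream(readable, blockBlobClient, contentType) {
    const bufferSize = 4 * 1024 * 1024; // bytes per staged block (assumed value)
    const maxConcurrency = 5;           // parallel block uploads (assumed value)
    return blockBlobClient.uploadStream(readable, bufferSize, maxConcurrency, {
        blobHTTPHeaders: { blobContentType: contentType },
    });
}
```

With the names from the question, the call would look like `await uploadDownloadStream(await download_csv.createReadStream(), block_blob_client, contentType)`. Because the chunks are never converted to a string, the binary content should stay intact in the same way as with Buffer.concat.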
