Base64 Decode embedded PDF in TypeScript
Within an XML file we have a base64-encoded string representing a PDF file that contains some table representations, i.e. similar to this example. When decoding the base64 string of that PDF document (such as this), we end up with a PDF document of 66 kB in size, which can be opened correctly in any PDF viewer.
On trying to decode that same base64-encoded string with Buffer in TypeScript (within a VSCode extension), i.e. with the functions below:
import * as fs from "fs";

function decodeBase64(base64String: string): string {
  const buf: Buffer = Buffer.from(base64String, "base64");
  return buf.toString();
}
// the base64 encoded string is usually extracted from an XML file directly
// for testing purposes we load that base64 encoded string from a local file
const base64Enc: string = fs.readFileSync(".../base64Enc.txt", "ascii");
const base64Decoded: string = decodeBase64(base64Enc);
fs.writeFileSync(".../table.pdf", base64Decoded);
we end up with a PDF of 109 kB in size that can't be opened in PDF viewers.
For a simple PDF, such as this one, with a base64-encoded string representation like this, the code above works and the PDF can be read in any PDF viewer.
I've also tried to read the locally stored base64-encoded representation of the PDF file directly, using
const buffer: string | Buffer = fs.readFileSync(".../base64Enc.txt", "base64");
though that isn't producing anything useful either.
Even with a slight adaptation of this suggestion, needed because atob(...) is not present (other answers suggest replacing atob with Buffer), which ended up in code like this:
const buffer: string = fs.readFileSync(".../base64Enc.txt", "ascii");
// atob(...) is not present, other answers suggest to use Buffer for conversion
const binary: string = Buffer.from(buffer, 'base64').toString();
const arrayBuffer: ArrayBuffer = new ArrayBuffer(binary.length);
const uintArray: Uint8Array = new Uint8Array(arrayBuffer);
for (let i: number = 0; i < binary.length; i++) {
  uintArray[i] = binary.charCodeAt(i);
}
const decoded: string = Buffer.from(uintArray.buffer).toString();
fs.writeFileSync(".../table.pdf", decoded);
I'm not ending up with a readable PDF. The "decoded" table.pdf sample ends up 109 kB in size.
What am I doing wrong here? How can I decode a PDF such as the table.pdf sample to obtain a readable PDF document, similar to the functionality provided by Notepad++?
Borrowing heavily from answers to "How to get an array from ArrayBuffer?", you can get a Uint8Array right from the Buffer using the Uint8Array constructor:
const buffer: string = fs.readFileSync(".../base64Enc.txt", "ascii");
const uintArray: Uint8Array = new Uint8Array(Buffer.from(buffer, 'base64'));
fs.writeFileSync(".../table.pdf", uintArray);
Writing the Uint8Array directly to the file guarantees there's no corruption due to encoding changes from moving to and from strings.
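To see why the string round trip in the question damages the file, here is a minimal sketch. It assumes the default UTF-8 encoding used by buf.toString() and by fs.writeFileSync when given a string, and it uses a made-up eight-byte sample (the bytes after %PDF are illustrative, not taken from the actual table.pdf). Bytes that are not valid UTF-8 are replaced by U+FFFD when decoded into a string, and that character occupies three bytes when encoded again, which would be consistent with the file growing from 66 kB to 109 kB. A PDF that contains only ASCII bytes would survive the round trip, which may be why the simple sample works.

```typescript
import { Buffer } from "node:buffer";

// Illustrative sample: "%PDF" followed by four bytes that are not valid UTF-8.
const original: Buffer = Buffer.from([0x25, 0x50, 0x44, 0x46, 0xe2, 0xe3, 0xcf, 0xd3]);

// Round trip through a string, as decodeBase64 in the question does.
const asText: string = original.toString(); // defaults to "utf8"
const viaString: Buffer = Buffer.from(asText, "utf8");

// Keep the data as bytes, as in the answer above.
const asBytes: Uint8Array = new Uint8Array(original);

console.log("original length:  ", original.length);
console.log("via string length:", viaString.length);
console.log("string round trip intact:", original.equals(viaString));
console.log("byte copy intact:        ", original.equals(asBytes));
```

Since fs.writeFileSync also accepts a Buffer, passing the result of Buffer.from(buffer, 'base64') to it without any conversion should work equally well.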
Just a note: when the Uint8Array constructor is given a Buffer (which is itself a Uint8Array), it copies the bytes rather than pointing to the same internal array of bytes as the Buffer. Not that it matters in this case, since this code doesn't reference the Buffer outside of the constructor, but it is worth knowing in case someone decides to create a new variable for the output of Buffer.from(buffer, 'base64'). A Uint8Array that does share memory with a Buffer named buf would be created with new Uint8Array(buf.buffer, buf.byteOffset, buf.length).
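Whether the Uint8Array and the Buffer share memory depends on which constructor form is used. A minimal sketch to check this, assuming a variable named buf is kept for the decoded Buffer (the names buf, copy and view are only for illustration):

```typescript
import { Buffer } from "node:buffer";

// "JVBERg==" is the base64 encoding of the four ASCII bytes "%PDF".
const buf: Buffer = Buffer.from("JVBERg==", "base64");

// Passing a typed array (a Buffer is one) to the constructor copies its contents.
const copy: Uint8Array = new Uint8Array(buf);

// Passing the underlying ArrayBuffer plus offset and length creates a view on the same memory.
const view: Uint8Array = new Uint8Array(buf.buffer, buf.byteOffset, buf.length);

// Modify the Buffer afterwards and compare.
buf[0] = 0x00;

console.log("copy[0]:", copy[0]); // unaffected by the change to buf
console.log("view[0]:", view[0]); // reflects the change to buf
```

For writing the file once, either form gives the same bytes on disk; the difference only matters if the Buffer is modified or reused afterwards.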