How do I convert a PDF from a base64 string to a file?

Question

I have a PDF as a base64 string and I need to write it to a file using Python. I tried this:

import base64

base64String = "data:application/pdf;base64,JVBERi0xLjQKJeHp69MKMSAwIG9iago8PC9Qcm9kdWNlciAoU2tpYS9..."

with open('temp.pdf', 'wb') as theFile:
  theFile.write(base64.b64decode(base64String))

But it didn't create a valid PDF file. What am I missing?

Answer 1

From my understanding, base64.b64decode only takes a base64 string, and it looks like your string has a header on it that is not base64-encoded.

I would remove "data:application/pdf;base64,".

Check out the docs here: https://docs.python.org/2/library/base64.html

When I've used it in the past, I have only passed in the encoded string.
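As a minimal sketch of that suggestion: the helper name strip_data_uri_prefix and the short sample payload are illustrative assumptions, not part of the question. The sample is only the base64 for the bytes %PDF-1.4 plus a newline, so it is not a complete PDF.

```python
import base64

PREFIX = "data:application/pdf;base64,"

def strip_data_uri_prefix(text, prefix=PREFIX):
    """Return text without the leading data-URI header, if it is present."""
    return text[len(prefix):] if text.startswith(prefix) else text

# Stand-in payload: base64 for b"%PDF-1.4\n", not a complete PDF.
sample = "data:application/pdf;base64,JVBERi0xLjQK"
pdf_bytes = base64.b64decode(strip_data_uri_prefix(sample))
print(pdf_bytes[:5])  # b'%PDF-'
```

A string that does not start with the prefix is passed through unchanged, so the same call works for bare base64 as well.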

Answer 2

Does writing it with the codecs.decode function work? Also, as Mark stated, you can try removing the data:application/pdf;base64, portion of the string, since that part is not base64 and should not be decoded:

import codecs
base64String = "JVBERi0xLjQKJeHp69MKMSAwIG9iago8PC9Qcm9kdWNlciAoU2tpYS9..."

# On Python 3 the "base64" codec needs bytes, so encode the str first.
with open("test.pdf", "wb") as f:
    f.write(codecs.decode(base64String.encode("ascii"), "base64"))

Answer 3

This is not just base64-encoded data, but a data URI:

https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/Data_URIs

There is another post on Stack Overflow asking how to parse such strings in Python:

How to parse data-uri in python?

The gist of it is to remove the header (everything up to and including the first comma):

theFile.write(base64.b64decode("".join(base64String.split(",")[1:2])))

NOTE: I slice with [1:2] instead of indexing with [1] because the slice won't raise an exception if the list has only one element (that is, the string contains no comma); it just yields empty data. The slice returns a list, so it is joined back into a string before being decoded.
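The header-stripping idea above can be sketched a little more defensively. This is only an illustration: decode_data_uri, the temporary output path, and the short sample payload are assumptions made for the example, and treating a comma-free string as bare base64 is a design choice of this sketch rather than something the linked post prescribes.

```python
import base64
import os
import tempfile

def decode_data_uri(data_uri):
    """Split a data URI at the first comma and base64-decode the payload."""
    header, sep, payload = data_uri.partition(",")
    if not sep:
        # No comma at all: assume the whole string is bare base64.
        payload = data_uri
    elif not header.endswith(";base64"):
        # Data URIs without ";base64" carry percent-encoded text instead.
        raise ValueError("payload is not marked as base64: %r" % header)
    return base64.b64decode(payload)

# Stand-in payload: base64 for b"%PDF-1.4\n", not a complete PDF.
sample = "data:application/pdf;base64,JVBERi0xLjQK"
out_path = os.path.join(tempfile.mkdtemp(), "temp.pdf")
with open(out_path, "wb") as the_file:
    the_file.write(decode_data_uri(sample))
```

Using partition rather than split means only the first comma is treated as the separator, and the header check catches data URIs whose payload is not base64 at all.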

Answer 4

Extending @Jebby's answer to use base64 instead (I had the same issue as @SmartManoj):

import base64
base64String = "JVBERi0xLjQKJeHp69MKMSAwIG9iago8PC9Qcm9kdWNlciAoU2tpYS9..."


with open("test.pdf", "wb") as f:
    f.write(base64.b64decode(base64String))
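If the output still isn't a valid PDF, a quick sanity check before writing can help narrow things down. The sketch below is an assumption-laden illustration, not part of any answer: decode_pdf_strict is a hypothetical helper, and it relies on the common convention that a PDF file begins with the bytes %PDF- (the JVBERi0 at the start of the question's payload decodes to exactly that).

```python
import base64
import binascii

def decode_pdf_strict(b64_text):
    """Decode base64 strictly and check for the PDF magic bytes."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        # instead of silently discarding them.
        data = base64.b64decode(b64_text, validate=True)
    except binascii.Error as exc:
        raise ValueError("not clean base64: %s" % exc)
    if not data.startswith(b"%PDF-"):
        raise ValueError("decoded bytes do not look like a PDF")
    return data

# Stand-in payload: base64 for b"%PDF-1.4\n", not a complete PDF.
print(decode_pdf_strict("JVBERi0xLjQK"))  # b'%PDF-1.4\n'
```

Passing the original string with its data:application/pdf;base64, header still attached raises ValueError here, because the colon, semicolon, and comma are not base64 characters.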

4 answers

Answer 1: score 9, accepted, 2018-01-04 22:00:45
Answer 2: score 7, 2018-01-04 22:01:04
Answer 3: score 2, 2018-01-04 22:16:11
Answer 4: score 1, 2021-05-17 01:03:26
