
Rupee symbol (₹) in PDF is missing after upload to S3

I have a PDF that contains the rupee symbol (₹). I am using aws-sdk with Node.js to upload the PDF to S3. The rupee symbol is missing after the upload.

When I upload locally, it works fine. On EKS, however, the rupee symbol is missing from the PDF. The same thing happens when I upload a file to S3 through API Gateway.

Thank you

    const fs = require("fs");
    const AWS = require("aws-sdk");

    // Read the PDF as a raw Buffer (no encoding), so the bytes are uploaded unchanged.
    const content = fs.readFileSync(filePath);

    const uploadToS3UsingSdk = async (bucket, key, content) => {
      return new Promise((resolve, reject) => {
        const awsConfig = {
          accessKeyId: process.env.accessKeyId,
          secretAccessKey: process.env.secretAccessKey,
          region: process.env.region,
          apiVersion: "2006-03-01",
        };
        const s3 = new AWS.S3(awsConfig);
        const uploadParams = {
          Bucket: bucket,
          Key: key,
          Body: content,
          ContentType: "application/pdf;charset=utf-8",
        };
        s3.upload(uploadParams, function (err, data) {
          if (err) {
            console.log("Error", err);
            return reject({
              isSuccess: false,
              // AWS SDK errors expose `message`, not `errorMessage`.
              errorMessage: err.message,
              status: 500,
            });
          }
          if (data) {
            console.log("Upload Success", data.Location);
            return resolve({
              isSuccess: true,
              errorMessage: null,
            });
          }
        });
      });
    };

PDF is not a WYSIWYG (what you see is what you get) format. Internally, it contains rendering instructions that tell a viewer (such as Adobe Reader) how to build the page.

Your document might contain something like:

  1. Go to 80, 700
  2. Set the active font to F1, font size 12
  3. Set the drawing color to 0,0,0 in RGB mode
  4. Render the glyph at index 251 of the active font
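
In PDF's own syntax, the four steps above correspond roughly to the operators `Td`, `Tf`, `rg` and `Tj` inside a `BT`/`ET` text object. The sketch below assembles such a content stream as a string. The helper name `buildTextContentStream` and its parameters are made up for illustration, and treating "index 251" as a one-byte character code in a simple font is an assumption.

```javascript
// Hypothetical helper: turns the four steps into PDF content-stream operators.
function buildTextContentStream({ x, y, fontName, fontSize, rgb, charCode }) {
  // Assumption: a one-byte character code, written as a hex string (251 -> <FB>).
  const hexCode = charCode.toString(16).toUpperCase().padStart(2, "0");
  return [
    "BT",                          // begin a text object
    `${x} ${y} Td`,                // 1. move to (x, y)
    `/${fontName} ${fontSize} Tf`, // 2. select the font resource and size
    `${rgb.join(" ")} rg`,         // 3. set the fill colour in RGB
    `<${hexCode}> Tj`,             // 4. show the glyph for this code
    "ET",                          // end the text object
  ].join("\n");
}

console.log(
  buildTextContentStream({
    x: 80,
    y: 700,
    fontName: "F1",
    fontSize: 12,
    rgb: [0, 0, 0],
    charCode: 251,
  })
);
```

Note that the stream never names the character it draws. The viewer needs the font behind `F1` to turn the code into a shape.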

A PDF will also contain a so-called resource dictionary, which specifies which font F1 actually is.

This is where it might go wrong.

Standard 14 Fonts

The PDF specification (ISO 32000) defines a handful of fonts as special (the standard Type 1 fonts). These fonts should always be available in the reader.

They include:

  • Helvetica
  • Helvetica Bold
  • Helvetica Bold Oblique (italic)
  • Helvetica Oblique (italic)
  • ZapfDingbats (symbols)
  • etc.

When a piece of software builds a PDF, it has two options:

  1. Use one of the standard fonts
  2. Embed the font in the PDF

If option 1 is selected, you are limited to the characters that are defined in the standard fonts. Not every font contains every character (for instance, none of the standard 14 contains Chinese characters).
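
To get a feel for which characters are at risk, the sketch below uses Latin-1 (code points up to U+00FF) as a rough stand-in for the one-byte encodings commonly paired with the standard fonts. That stand-in is an assumption (the real encodings differ in detail), and `charsOutsideLatin1` is a made-up helper name.

```javascript
// Hypothetical helper: lists the distinct characters whose code point is
// above U+00FF. Assumption: Latin-1 approximates what a standard font covers.
function charsOutsideLatin1(text) {
  return [...new Set(text)].filter((ch) => ch.codePointAt(0) > 0xff);
}

console.log(charsOutsideLatin1("Total: ₹ 500"));
console.log("₹".codePointAt(0).toString(16)); // 20b9
```

The rupee sign sits at U+20B9, far outside that range, so a text that uses it is a candidate for an embedded font rather than a standard one.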

If option 2 is selected, the font file is embedded in the PDF, either in its entirety or partially.

Subsetting

Partially embedded fonts are called subset fonts. This feature is typically used when the font is large (contains a lot of characters) but the PDF doesn't use all of those characters.

To put it simply, if the PDF only contains the text "Hello World", then there is no point in adding information on how to render the character "A".
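
The idea can be sketched in a few lines. Real subsetting works on glyphs inside the font file rather than on characters, so this is only a simplified model, and the helper names `glyphSubsetFor` and `missingFromSubset` are made up for illustration.

```javascript
// Simplified model: a "subset" is just the set of distinct characters
// that appear in the text the PDF was built from.
function glyphSubsetFor(text) {
  return new Set(text);
}

// Lists the distinct characters of `text` that the subset cannot render.
function missingFromSubset(subset, text) {
  return [...new Set(text)].filter((ch) => !subset.has(ch));
}

const subset = glyphSubsetFor("Hello World");
console.log([...subset].join(""));                // Helo Wrd
console.log(subset.has("A"));                     // false
console.log(missingFromSubset(subset, "₹ 100"));  // [ '₹', '1', '0' ]
```

If text is added or changed after the subset was built, any new character, such as "₹", has no glyph to fall back on.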

Conclusion

These are the things that might be wrong with your PDF:

  1. You are using a standard font, which does not support the rupee symbol.
  2. You are using a custom font that is not embedded. The reader will substitute this missing font (and the substitute is typically one of the standard 14 fonts).
  3. You are using a custom font that is broken (not all PDF libraries do a good job of adhering to the standard). When a PDF is broken, a reader might attempt to fix it. Fixes might include font substitutions.
  4. You are using a custom font that does not have the rupee symbol.
  5. You are using a custom (subset) font whose subset does not include the rupee symbol.

There is an online tool for validating PDF documents. It's called veraPDF.

