
Rupee symbol (₹) in PDF is missing after upload to S3

I have a PDF that contains the rupee symbol (₹). I am using aws-sdk with Node.js to upload the PDF to S3. The rupee symbol is missing after the upload to S3.

When I upload from my local machine, it works fine. When I upload from EKS, the rupee symbol is missing in the PDF. The same behaviour occurs when I upload a file to S3 through API Gateway.

Thank you

    const fs = require("fs");
    const AWS = require("aws-sdk");

    // Read the PDF as a raw Buffer; no text encoding is applied here.
    const content = fs.readFileSync(filePath);

    const uploadToS3UsingSdk = async (bucket, key, content) => {
      return new Promise((resolve, reject) => {
        const awsConfig = {
          accessKeyId: process.env.accessKeyId,
          secretAccessKey: process.env.secretAccessKey,
          region: process.env.region,
          apiVersion: "2006-03-01",
        };
        const s3 = new AWS.S3(awsConfig);
        const uploadParams = {
          Bucket: bucket,
          Key: key,
          Body: content,
          ContentType: "application/pdf;charset=utf-8",
        };
        s3.upload(uploadParams, function (err, data) {
          if (err) {
            console.log("Error", err);
            return reject({
              isSuccess: false,
              // SDK errors carry their text in `message`, not `errorMessage`.
              errorMessage: err.message,
              status: 500,
            });
          }
          if (data) {
            console.log("Upload Success", data.Location);
            return resolve({
              isSuccess: true,
              errorMessage: null,
            });
          }
        });
      });
    };

PDF is not a WYSIWYG (what you see is what you get) format. Internally, it contains rendering instructions that tell a viewer (such as Adobe Reader) how to build the page.

Your document might contain something like:

  1. Go to 80, 700
  2. Set the active font to F1, font size 12
  3. Set the drawing color to 0,0,0 in RGB mode
  4. Render the glyph at index 251 of the active font
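
As a rough illustration, the four steps above could look like this as PDF content-stream operators. This is only a sketch: `buildContentStream` is a hypothetical helper, and treating the glyph index as a one-byte character code is a simplifying assumption (in a real PDF the font's encoding decides how codes map to glyphs).

```javascript
// Hypothetical helper: turns the four steps into PDF content-stream operators.
function buildContentStream({ x, y, fontName, fontSize, rgb, glyphIndex }) {
  // Assumption: a one-byte code, written as a two-digit hex string (251 -> <FB>).
  const hexCode = glyphIndex.toString(16).toUpperCase().padStart(2, "0");
  return [
    "BT",                          // begin a text object
    `${x} ${y} Td`,                // 1. move to (x, y)
    `/${fontName} ${fontSize} Tf`, // 2. select the font resource and size
    `${rgb.join(" ")} rg`,         // 3. set the fill colour in RGB
    `<${hexCode}> Tj`,             // 4. show the glyph for that code
    "ET",                          // end the text object
  ].join("\n");
}

const stream = buildContentStream({
  x: 80,
  y: 700,
  fontName: "F1",
  fontSize: 12,
  rgb: [0, 0, 0],
  glyphIndex: 251,
});
console.log(stream);
```

Note that nothing in these instructions says "rupee symbol". The viewer only sees "code 251 in font F1", so what ends up on screen depends entirely on which font F1 turns out to be.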

A PDF will also contain a so-called resource dictionary, which specifies which font F1 actually is.

This is where it might go wrong.

Standard 14 Fonts

The PDF specification (ISO 32000) defines a handful of fonts as special (the standard Type 1 fonts). These fonts should always be available in the reader.

They include:

  • Helvetica
  • Helvetica-Bold
  • Helvetica-BoldOblique
  • Helvetica-Oblique
  • ZapfDingbats (symbols)
  • etc.
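
For reference, here is a small sketch that checks a font name against the full set of 14, using the names as they appear in a PDF's `/BaseFont` entries. `STANDARD_14` and `isStandard14` are names made up for this example.

```javascript
// The standard 14 fonts, spelled the way they appear as /BaseFont values.
const STANDARD_14 = new Set([
  "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
  "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
  "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
  "Symbol", "ZapfDingbats",
]);

// Hypothetical helper: accepts the name with or without the leading slash.
function isStandard14(baseFont) {
  return STANDARD_14.has(baseFont.replace(/^\//, ""));
}

console.log(isStandard14("/Helvetica-Bold"));
console.log(isStandard14("NotoSans-Regular"));
```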

When a piece of software builds a PDF it has 2 options:

  1. Use one of the standard fonts
  2. Insert the font in the PDF

If option 1 is selected, you are limited to the characters that are defined in the standard fonts. Not every font contains every character (for instance, none of the standard 14 contains Chinese characters).

If option 2 is selected, the font-file is embedded either in its entirety or partially in the PDF.
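
To get a first impression of which option a given PDF uses, one could scan its raw bytes for font names and embedded font programs. The sketch below is a heuristic only: `summarizeFonts` is a hypothetical helper, it does not tell you which font the embedded program belongs to, and it will miss dictionaries stored inside compressed object streams, so a proper PDF parser is needed for a reliable answer.

```javascript
// Hypothetical heuristic: looks for /BaseFont names and /FontFile* keys in raw PDF bytes.
function summarizeFonts(pdfBytes) {
  // latin1 maps every byte to exactly one character, so binary data is not mangled.
  const text = pdfBytes.toString("latin1");
  const baseFonts = new Set();
  for (const match of text.matchAll(/\/BaseFont\s*\/([^\s\/<>\[\]()]+)/g)) {
    baseFonts.add(match[1]);
  }
  // Embedded font programs are referenced via /FontFile, /FontFile2 or /FontFile3.
  const hasEmbeddedFontProgram = /\/FontFile[23]?\b/.test(text);
  return { baseFonts: [...baseFonts], hasEmbeddedFontProgram };
}

// Synthetic fragment for illustration; a real file would come from fs.readFileSync(path).
const sample = Buffer.from(
  "<< /Type /Font /Subtype /TrueType /BaseFont /ABCDEF+NotoSans >>\n" +
    "<< /Type /FontDescriptor /FontName /ABCDEF+NotoSans /FontFile2 12 0 R >>\n" +
    "<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
  "latin1"
);
console.log(summarizeFonts(sample));
```

Running this on the PDF produced locally and on the one produced in EKS, and comparing the two results, may show whether the two environments end up with different fonts.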

Subsetting

Partially embedded fonts are called subset fonts. This is a feature typically used when the font is large (contains a lot of characters) but the PDF doesn't use all those characters.

To put it simply, if the PDF only contains the text "Hello World", then there is no point in adding information on how to render the character "A".
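
A minimal sketch of that idea: collect the distinct characters a text actually uses, which is all a subset font needs to cover. Subset fonts can also usually be recognised by name, because the PDF specification prefixes them with a tag of six uppercase letters and a plus sign (for example `ABCDEF+NotoSans`). Both function names are made up for this example.

```javascript
// Hypothetical helper: the distinct characters (by code point) a text uses.
function usedCharacters(text) {
  return new Set(Array.from(text));
}

// Hypothetical helper: subset fonts carry a six-uppercase-letter tag followed by "+".
function hasSubsetTag(baseFont) {
  return /^[A-Z]{6}\+/.test(baseFont);
}

const needed = usedCharacters("Hello World");
console.log(needed.has("A")); // "A" is not used, so a subset can leave it out
console.log(needed.has("₹")); // the same goes for the rupee symbol
console.log(hasSubsetTag("ABCDEF+NotoSans"));
```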

Conclusion

These are possible things that might be wrong with your PDF:

  1. You are using a standard font, which does not support the rupee symbol.
  2. You are using a custom font that is not embedded. The reader will substitute the missing font (and the substitute is typically a standard 14 font).
  3. You are using a custom font that is broken (not all PDF libraries do a good job of adhering to the standard). When a PDF is broken, a reader might attempt to fix it. Fixes might include font substitutions.
  4. You are using a custom font that does not have the rupee symbol.
  5. You are using a custom (subset) font whose subset does not include the rupee symbol.

There is an online tool to validate PDF documents. It's called veraPDF.
