Rupee symbol (₹) in PDF is missing after upload to S3

Question

I have a PDF that contains the rupee symbol (₹). I am using aws-sdk with Node.js to upload the PDF to S3. The rupee symbol is missing after the upload.

When I upload from my local machine, it works fine, whereas on EKS the rupee symbol is missing from the PDF. The same behaviour occurs when I upload a file to S3 through API Gateway.

Thank you

    const fs = require("fs");
    const AWS = require("aws-sdk");

    // Read the file as a Buffer (no encoding argument), so the bytes are passed through unchanged.
    const content = fs.readFileSync(filePath);

    const uploadToS3UsingSdk = async (bucket, key, content) => {
      return new Promise((resolve, reject) => {
        const awsConfig = {
          accessKeyId: process.env.accessKeyId,
          secretAccessKey: process.env.secretAccessKey,
          region: process.env.region,
          apiVersion: "2006-03-01",
        };
        const s3 = new AWS.S3(awsConfig);
        const uploadParams = {
          Bucket: bucket,
          Key: key,
          Body: content,
          ContentType: "application/pdf;charset=utf-8",
        };
        s3.upload(uploadParams, function (err, data) {
          if (err) {
            console.log("Error", err);
            return reject({
              isSuccess: false,
              errorMessage: err.message, // SDK errors expose `message`, not `errorMessage`
              status: 500,
            });
          }
          if (data) {
            console.log("Upload Success", data.Location);
            return resolve({
              isSuccess: true,
              errorMessage: null,
            });
          }
        });
      });
    };

Answer 1

PDF is not a WYSIWYG (what you see is what you get) format. Internally, it contains rendering instructions that tell a viewer (such as Adobe Reader) how to build the page.

Your document might contain something like:

Go to 80, 700
Set the active font to F1, font size 12
Set the drawing color to 0,0,0 in RGB mode
Render the glyph at index 251 of the active font

A PDF also contains a so-called resource dictionary, which specifies which font F1 refers to.

This is where it might go wrong.

Standard 14 Fonts

The PDF specification (ISO 32000) defines a handful of fonts as special: the standard Type 1 fonts, also known as the standard 14 fonts. These fonts should always be available in the reader.

They include:

Helvetica
Helvetica Bold
Helvetica Bold Italic
Helvetica Italic
ZapfDingbats (symbols)
etc.

When a piece of software builds a PDF, it has two options:

1. Use one of the standard fonts
2. Embed the font in the PDF

If option 1 is selected, you are limited to the characters that are defined in the standard fonts. Not every font contains every character (for instance, none of the standard 14 contains Chinese characters).

If option 2 is selected, the font file is embedded in the PDF, either in its entirety or partially.
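As a rough way to see which of the two options a given PDF uses, the raw bytes can be searched for font-related keys. The sketch below is a heuristic, not a PDF parser: it assumes the font objects are stored uncompressed (newer PDFs may keep them inside compressed object streams, where this scan finds nothing). The function name `scanFontKeys` and the sample bytes are made up for illustration; `/BaseFont` and `/FontFile`, `/FontFile2`, `/FontFile3` are the dictionary keys PDF uses for the font name and for an embedded font program.

```javascript
// Heuristic sketch: search the raw bytes of a PDF for font-related keys.
// Assumption: the font dictionaries are not inside compressed object streams.
function scanFontKeys(pdfBytes) {
  // latin1 maps every byte to exactly one character, so binary data is not mangled.
  const text = pdfBytes.toString("latin1");

  // Collect the names that follow /BaseFont, e.g. "/BaseFont /Helvetica".
  const names = [...text.matchAll(/\/BaseFont\s*\/([\w+\-.,#]+)/g)].map((m) => m[1]);

  // An embedded font program is referenced through /FontFile, /FontFile2 or /FontFile3.
  const hasEmbeddedFontProgram = /\/FontFile[23]?\b/.test(text);

  return { baseFonts: [...new Set(names)], hasEmbeddedFontProgram };
}

// In-memory sample standing in for a real file (hypothetical content).
const sampleBytes = Buffer.from("/Type /Font /BaseFont /Helvetica /Subtype /Type1", "latin1");
console.log(scanFontKeys(sampleBytes));
```

For a real file, pass the result of `fs.readFileSync(path)` instead of the sample buffer. A standard font name with no embedded font program would be consistent with option 1.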

Subsetting

Partially embedded fonts are called subset fonts. This is a feature typically used when the font is large (contains a lot of characters) but the PDF doesn't use all those characters.

To put it simply, if the PDF only contains the text "Hello World", then there is no point in adding information on how to render the character "A".
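The idea can be sketched in a few lines. This is only a conceptual illustration, not how any particular PDF library implements subsetting: `charactersUsed` and `isSubsetFontName` are hypothetical helper names. The second helper relies on the PDF convention that the name of a subset font starts with a tag of six uppercase letters followed by a plus sign (for example `ABCDEF+Arial`, a made-up name).

```javascript
// Conceptual sketch: a subset only needs glyphs for the characters that are actually used.
function charactersUsed(text) {
  // Iterating a string yields whole code points, so "₹" counts as one character.
  return new Set(text);
}

// Subset font names carry a six-uppercase-letter tag followed by "+".
function isSubsetFontName(baseFontName) {
  return /^[A-Z]{6}\+/.test(baseFontName);
}

const used = charactersUsed("Hello World");
console.log([...used].join(""));   // the distinct characters, in first-seen order
console.log(used.has("A"));        // false: no glyph for "A" is needed
console.log(used.has("₹"));        // false: a subset built for this text has no rupee glyph
console.log(isSubsetFontName("ABCDEF+Arial"));
```

If text containing "₹" is later drawn with a font that was subset for "Hello World" only, the glyph is simply not there.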

Conclusion

These are possible things that might be wrong with your PDF:

You are using a standard font that does not support the rupee symbol.
You are using a custom font that is not embedded. The reader will substitute the missing font (and the substitute is typically a standard 14 font).
You are using a custom font that is broken (not all PDF libraries do a good job of adhering to the standard). When a PDF is broken, a reader might attempt to fix it. Fixes might include font substitutions.
You are using a custom font that does not have the rupee symbol.
You are using a custom (subset) font whose subset does not include the rupee symbol.

There is an online tool for validating PDF documents. It's called veraPDF.
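Before digging into fonts, it may help to check whether the upload changes the file at all. A minimal sketch, assuming you have the original PDF and a copy downloaded back from S3 (how you obtain the two buffers is up to you, and the helper names are hypothetical): compare SHA-256 digests of both. If they match, the bytes in S3 are identical to the original, and the missing symbol more likely comes from how the PDF was generated or is rendered, as described above. If they differ, the file was altered somewhere on the way.

```javascript
const { createHash } = require("crypto");

// Hex-encoded SHA-256 digest of a Buffer.
function sha256Hex(bytes) {
  return createHash("sha256").update(bytes).digest("hex");
}

// True when both buffers contain the same bytes (compared by digest).
function sameBytes(original, downloaded) {
  return sha256Hex(original) === sha256Hex(downloaded);
}

// In-memory stand-ins for the two files (hypothetical content).
const original = Buffer.from("%PDF-1.7 sample");
const downloaded = Buffer.from("%PDF-1.7 sample");
console.log(sameBytes(original, downloaded));
```

With real files, replace the two sample buffers with `fs.readFileSync(...)` of the local PDF and of the copy fetched from the bucket.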
