
Check if base64 string contains a valid PDF - and nothing else

In my web application, users may only upload images and PDFs. These are sent as base64 strings from the frontend to the backend. There, on a Node.js 8.9 server, I want to do some sanity checking, i.e., verify that the base64 strings I receive really are images or PDFs and nothing else.
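Part of that sanity check belongs to the base64 layer itself. Node's Buffer.from(str, 'base64') does not validate its input: it never throws for malformed base64, and characters outside the base64 alphabet are silently skipped. A minimal sketch of a strict check, assuming the client sends plain standard base64 (no data URL prefix, no line breaks); isStrictBase64 is a hypothetical helper name, not part of any library:

```javascript
// Hypothetical helper: strict check that a string is canonical standard
// base64 (RFC 4648 alphabet, correct padding). Buffer.from(s, 'base64')
// does not reject bad input, so this has to be checked separately.
function isStrictBase64(str) {
  if (typeof str !== 'string' || str.length === 0 || str.length % 4 !== 0) {
    return false;
  }
  // Only A-Z, a-z, 0-9, '+', '/', followed by at most two '=' padding chars.
  if (!/^[A-Za-z0-9+\/]*={0,2}$/.test(str)) {
    return false;
  }
  // Round-trip: re-encoding the decoded bytes must reproduce the input.
  // This also rejects non-canonical encodings with stray padding bits.
  return Buffer.from(str, 'base64').toString('base64') === str;
}

console.log(isStrictBase64('JVBERi0xLjQK')); // true  ("%PDF-1.4\n")
console.log(isStrictBase64('JVBERi0xLjQ$')); // false (invalid character)
```

The round-trip comparison is what catches inputs that pass the regex but would not be produced by a conforming encoder; it costs one extra encode, which is cheap compared with parsing the file.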

For images, that was easy: using the sharp npm module with failOnError set to true gave me exactly what I wanted. A single wrong character in the base64 string causes a failure, and the input is rejected.

For PDFs, however, I cannot find a similar solution. I tried pdf2json (which seemed overpowered for my requirements anyway), but could not get it to accept the base64 string after converting it to a Buffer.
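One possible reason a base64-to-Buffer conversion goes wrong (an assumption; the question does not say how the frontend encodes files) is that the browser's FileReader.readAsDataURL() produces a data URL such as data:application/pdf;base64,JVBER... rather than bare base64. Decoding that whole string as base64 yields garbage bytes. A minimal sketch that strips an optional data URL prefix before decoding; decodeBase64Payload is a hypothetical helper name:

```javascript
// Matches a data URL header such as "data:application/pdf;base64," and
// captures the MIME type. Extra parameters (";name=x.pdf") are tolerated.
const DATA_URL_PREFIX = /^data:([^;,]*)[^,]*;base64,/;

// Hypothetical helper: accept either bare base64 or a base64 data URL and
// return the declared MIME type (null if none) plus the decoded bytes.
function decodeBase64Payload(input) {
  const match = DATA_URL_PREFIX.exec(input);
  const mimeType = match ? match[1] : null;
  const payload = match ? input.slice(match[0].length) : input;
  return { mimeType, buffer: Buffer.from(payload, 'base64') };
}

const result = decodeBase64Payload('data:application/pdf;base64,JVBERi0xLjQK');
console.log(result.mimeType);                         // application/pdf
console.log(result.buffer.toString('latin1', 0, 8));  // %PDF-1.4
```

The MIME type in the prefix is supplied by the client, so it is only a hint; the decoded bytes still have to be checked by a real parser.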

I finally found an npm module that does exactly what I need: HummusJS. As far as my tests go, the code below works: valid PDFs are accepted and invalid strings are rejected. I haven't noticed any performance impact so far.

const hummus = require('hummus');

const pdfBase64String = '<<base64 string here>>';

try {
  // Buffer.from() does not throw for malformed base64, so decoding alone
  // proves nothing; the actual validation is the parse below, which
  // throws if the bytes are not a readable PDF.
  const bufferPdf = Buffer.from(pdfBase64String, 'base64');
  const pdfReader = hummus.createReader(new hummus.PDFRStreamForBuffer(bufferPdf));
  const pages = pdfReader.getPagesCount();
  if (pages > 0) {
    console.log('Parsable with Hummus and more than 0 pages. Seems to be a valid PDF!');
  } else {
    // Parsed without error, but a PDF with no pages is still suspicious.
    console.log("Unexpected number of pages: '" + pages + "'");
  }
} catch (err) {
  console.log('ERROR while decoding pdfBase64 and/or parsing the PDF: ' + err);
}
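Before calling into a native parser, a cheap byte-level pre-check can reject obviously wrong uploads early. The PDF specification requires a file to begin with a header starting with "%PDF-" and to end with an "%%EOF" marker. A minimal sketch of such a pre-filter; looksLikePdf is a hypothetical name, not part of HummusJS, and the 1024-byte trailer window is an assumption chosen to tolerate trailing whitespace:

```javascript
// Hypothetical pre-filter: structural checks on decoded bytes before
// handing them to a full parser such as HummusJS.
function looksLikePdf(buffer) {
  if (!Buffer.isBuffer(buffer)) {
    return false;
  }
  // Header: strict check at offset 0 (some readers tolerate leading junk,
  // so this may reject files that viewers would still open).
  if (buffer.toString('latin1', 0, 5) !== '%PDF-') {
    return false;
  }
  // Trailer: look for the end-of-file marker in the last 1024 bytes only.
  const tailStart = Math.max(0, buffer.length - 1024);
  return buffer.toString('latin1', tailStart).indexOf('%%EOF') !== -1;
}

console.log(looksLikePdf(Buffer.from('%PDF-1.4\n%%EOF\n', 'latin1'))); // true
console.log(looksLikePdf(Buffer.from('GIF89a', 'latin1')));            // false
```

This is only a pre-filter: a buffer that passes can still be corrupt, so the Hummus parse above remains the real validation.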
