Mozilla PDF - how to view PDFs from url in react app?

I have followed a quick tutorial on how to implement Mozilla's PDF viewer with React. I have made a CodeSandbox here. I would like to know whether this can be implemented by importing the pdfjs node module, instead of downloading the package into the public folder and using it like this:

export default class PDFJs {
  init = (source, element) => {
    const iframe = document.createElement("iframe");

    iframe.src = `/pdfjs-2.5.207-dist/web/viewer.html?file=${source}`;
    iframe.width = "100%";
    iframe.height = "100%";

    element.appendChild(iframe);
  };
}
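One detail in the iframe code above: `source` is inserted into the `file` query parameter unencoded, so a PDF URL that carries its own query string (`?`, `&`, `#`) would be cut short. A minimal sketch of encoding it first; `buildViewerSrc` is a hypothetical helper name, and this only changes how the URL is passed, not the origin error described next.

```javascript
// Build the iframe src for the pre-built PDF.js viewer.
// Hypothetical helper; viewerPath is wherever the viewer.html copy lives.
function buildViewerSrc(viewerPath, source) {
  // Percent-encode the PDF location so "?", "&" or "#" inside it are not
  // parsed as part of the viewer's own query string.
  return `${viewerPath}?file=${encodeURIComponent(source)}`;
}
```

In `init`, this would be used as `iframe.src = buildViewerSrc("/pdfjs-2.5.207-dist/web/viewer.html", source);`.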

Also, this setup doesn't work when the PDF's source is a URL. If I try that, I get this error:

PDF.js v2.5.207 (build: 0974d6052) Message: file origin does not match viewer's

I have commented out the part of the code that checks the file's origin in pdfjs-2.5.207-dist/web/viewer.js:

  //if (origin !== viewerOrigin && protocol !== "blob:") {
  //  throw new Error("file origin does not match viewer's");
  //} 

But then I got another error:

PDF.js v2.5.207 (build: 0974d6052) Message: Failed to fetch

How can I fix this? Is it possible to import this package as a module into a React component, and how can I use it for PDFs from external sources (by URL)?

Referrer Policy: strict-origin-when-cross-origin / Usage with external sources

The pdf should be served from the same origin as the viewer (same protocol, host and port). Hosting the pdf on the same origin as your app/website should solve this problem.
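To make the same-origin rule concrete, here is a simplified sketch of the test performed by the viewer.js lines the question commented out: the file URL is resolved against the viewer's URL, and their origins (scheme, host, port) are compared, with `blob:` URLs exempt. `viewerWouldAcceptFile` is a hypothetical name, and other special cases the real viewer handles are ignored.

```javascript
// Simplified sketch of the viewer's origin check (not the viewer's own code).
function viewerWouldAcceptFile(fileUrl, viewerUrl) {
  const viewerOrigin = new URL(viewerUrl).origin;
  // Relative file URLs are resolved against the viewer's own URL.
  const { origin, protocol } = new URL(fileUrl, viewerUrl);
  // Same scheme + host + port passes; blob: URLs are always allowed.
  return origin === viewerOrigin || protocol === "blob:";
}
```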

Allowing a pdf to be loaded in other pages can lead to various security risks.

If you want to show an up-to-date version of an external pdf on your own homepage, there are basically two options.

Hosting PDF on your server

Run a server script (e.g. a cron job) that periodically downloads the pdf and hosts it on your own server.
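A possible shape for such a script, assuming Node.js 18+ (for the global `fetch`); the URL, the destination path and the `mirrorPdf` name are placeholders. It writes to a temporary file and then renames it, so the viewer never picks up a half-downloaded PDF.

```javascript
const fs = require("fs/promises");
const path = require("path");

// Download a remote PDF into your own public folder so the viewer can load it
// from the same origin. Assumes Node.js 18+ (global fetch); fetchImpl is
// injectable only for testing.
async function mirrorPdf(url, destPath, fetchImpl = globalThis.fetch) {
  const res = await fetchImpl(url);
  if (!res.ok) {
    throw new Error(`Download failed: HTTP ${res.status}`);
  }
  const bytes = Buffer.from(await res.arrayBuffer());
  // PDF files normally begin with the signature "%PDF-"; this rejects,
  // for example, an HTML error page served with status 200.
  if (bytes.subarray(0, 5).toString("latin1") !== "%PDF-") {
    throw new Error("Response is not a PDF");
  }
  await fs.mkdir(path.dirname(destPath), { recursive: true });
  // Write to a temporary file first, then rename it into place.
  const tmpPath = `${destPath}.tmp`;
  await fs.writeFile(tmpPath, bytes);
  await fs.rename(tmpPath, destPath);
  return destPath;
}

// Usage (hypothetical URL and path):
// mirrorPdf("https://example.com/report.pdf", "public/pdfs/report.pdf");
```

A cron entry would then just run this script with `node` at whatever interval keeps the copy fresh enough.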

Allow cross-origin

If you have access to the server hosting the pdf, you can send a header that allows cross-origin requests:

Access-Control-Allow-Origin: *
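If that server happens to be a plain Node.js process, a minimal sketch could look like the following. `createPdfServer` and `setPdfCorsHeaders` are hypothetical names, the server returns the same PDF for every path, and allowing the `Range` header in preflight responses is a precaution for pdf.js range requests rather than something the answer requires.

```javascript
const http = require("http");
const fs = require("fs");

// Set the CORS headers on a response. "*" allows any origin; a specific
// origin such as "https://my-app.example.com" is stricter.
function setPdfCorsHeaders(res, allowedOrigin) {
  res.setHeader("Access-Control-Allow-Origin", allowedOrigin);
  // Precaution: permit the Range header if the browser sends a preflight.
  res.setHeader("Access-Control-Allow-Headers", "Range");
}

// Serve one PDF file (for every path) with CORS headers. Not started here.
function createPdfServer(pdfPath, allowedOrigin = "*") {
  return http.createServer((req, res) => {
    setPdfCorsHeaders(res, allowedOrigin);
    if (req.method === "OPTIONS") {
      // Answer CORS preflight requests without a body.
      res.writeHead(204);
      res.end();
      return;
    }
    res.writeHead(200, { "Content-Type": "application/pdf" });
    fs.createReadStream(pdfPath)
      .on("error", () => res.destroy()) // e.g. file missing
      .pipe(res);
  });
}

// Usage: createPdfServer("./sample.pdf").listen(8080);
```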

How to use pdfjs with yarn/npm

Documentation on this is really bad, but they have a repository pdfjs-dist and some related docs.

Installation

npm install pdfjs-dist

Usage (from the docs)

import * as pdfjsLib from 'pdfjs-dist';
var url = 'https://raw.githubusercontent.com/mozilla/pdf.js/ba2edeae/examples/learning/helloworld.pdf';

// The workerSrc property shall be specified.
pdfjsLib.GlobalWorkerOptions.workerSrc = '//mozilla.github.io/pdf.js/build/pdf.worker.js';

// Asynchronous download of PDF
var loadingTask = pdfjsLib.getDocument(url);
loadingTask.promise.then(function(pdf) {
  console.log('PDF loaded');
  
  // Fetch the first page
  var pageNumber = 1;
  pdf.getPage(pageNumber).then(function(page) {
    console.log('Page loaded');
    
    var scale = 1.5;
    var viewport = page.getViewport({scale: scale});

    // Prepare canvas using PDF page dimensions
    var canvas = document.getElementById('the-canvas');
    var context = canvas.getContext('2d');
    canvas.height = viewport.height;
    canvas.width = viewport.width;

    // Render PDF page into canvas context
    var renderContext = {
      canvasContext: context,
      viewport: viewport
    };
    var renderTask = page.render(renderContext);
    renderTask.promise.then(function () {
      console.log('Page rendered');
    });
  });
}, function (reason) {
  // PDF loading error
  console.error(reason);
});

Worker

You do need the worker (pdf.worker.js runs as a Web Worker, not a Service Worker): pdfjs does not work without it, so neither does react-pdf.

If you use CRA (Create React App) and do not want to use a CDN, you can perform the following steps:

1) Copy the worker to the public folder

mkdir -p public/scripts
cp ./node_modules/pdfjs-dist/build/pdf.worker.js public/scripts

2) Point pdf.js at the worker

pdfjsLib.GlobalWorkerOptions.workerSrc = `${process.env.PUBLIC_URL}/scripts/pdf.worker.js`

Here is a working CodeSandbox with Mozilla's viewer and your pdf.

Things to note:

  1. Your pdf must be served over HTTPS (because the app itself is served over HTTPS), otherwise you get this error:

Mixed Content: The page at 'https://codesandbox.io/' was loaded over HTTPS, but requested an insecure resource 'http://www.africau.edu/images/default/sample.pdf'. This request has been blocked; the content must be served over HTTPS.

  2. The server hosting the pdf must either be on the same origin as your app or allow your app's origin via the Access-Control-Allow-Origin header, otherwise you get this error:

Access to fetch at 'https://www.adobe.com/support/products/enterprise/knowledgecenter/media/c4611_sample_explain.pdf' from origin 'https://lchyv.csb.app' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource. If an opaque response serves your needs, set the request's mode to 'no-cors' to fetch the resource with CORS disabled.

  3. For demo purposes, I used https://cors-anywhere.herokuapp.com/<URL_TO_PDF>, which sets Access-Control-Allow-Origin: * for you, but it should not be used in production!

So in conclusion, your pdf didn't load because of the browser's restrictions. Importing pdfjs directly in your app, and building a viewer from scratch (which is a lot of work), won't solve those problems.

I made changes to your example so that it accepts a URL.

My code below:

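import pdfjsWorker from "pdfjs-dist/build/pdf.worker.entry";
// Dynamic import: pdfjsLib is a promise that resolves to the pdf.js module
const pdfjsLib = import("pdfjs-dist/build/pdf");

export default class PDFJs {
  init = (source, element) => {
    pdfjsLib
      .then((pdfjs) => {
        // Worker entry point bundled from pdfjs-dist (2.x)
        pdfjs.GlobalWorkerOptions.workerSrc = pdfjsWorker;
        // source may be a URL; it must be same-origin or served with CORS headers
        const loadingTask = pdfjs.getDocument(source);
        return loadingTask.promise;
      })
      .then((pdf) => pdf.getPage(1))
      .then((page) => {
        const scale = 1.5;
        const viewport = page.getViewport({ scale: scale });
        // Create a canvas sized to the page and attach it to the container
        const canvas = document.createElement("canvas");
        const context = canvas.getContext("2d");
        canvas.height = viewport.height;
        canvas.width = viewport.width;
        element.appendChild(canvas);
        const renderContext = {
          canvasContext: context,
          viewport: viewport
        };
        return page.render(renderContext).promise;
      })
      // Report loading/rendering errors (e.g. CORS failures) instead of swallowing them
      .catch((error) => console.error(error));
  };
}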
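The code above renders only the first page. A sketch of rendering all pages in order, using the same pdf.js calls (`getPage`, `getViewport`, `render`) plus `pdf.numPages`; `renderAllPages` is a hypothetical helper, and its `createCanvas` parameter exists only so it can run outside a browser.

```javascript
// Render every page of a loaded pdf.js document into its own canvas.
// `pdf` is the value resolved by getDocument(...).promise.
async function renderAllPages(
  pdf,
  element,
  scale = 1.5,
  createCanvas = () => document.createElement("canvas")
) {
  for (let pageNumber = 1; pageNumber <= pdf.numPages; pageNumber++) {
    const page = await pdf.getPage(pageNumber);
    const viewport = page.getViewport({ scale });
    const canvas = createCanvas();
    canvas.width = viewport.width;
    canvas.height = viewport.height;
    element.appendChild(canvas);
    // Wait for each page so pages are drawn one after another, in order.
    await page.render({ canvasContext: canvas.getContext("2d"), viewport }).promise;
  }
}
```

In the answer's `init`, calling `renderAllPages(pdf, element)` once the loading task resolves would take the place of the single `pdf.getPage(1)` call.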

You can see the result here

Note: As others have already said, with React alone (or any client-side library) it is not possible to fetch an external resource (a PDF in your case) without solving the CORS issue. You will need some kind of server-side component to resolve it (unless you own or have access to the server hosting the external resource).


Looking at the sandbox code you have provided, it seems you are already using Node.js, but the approach applies to any server-side stack.

Basically, you ask your server to fetch the file for you and then return the file as the response payload, e.g. a Node server that listens for requests on /fetchPdf and returns the file itself as the response:

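const express = require('express');
const fs = require('fs');
const path = require('path');
const crypto = require('crypto');
const http = require('http');
const https = require('https');

const app = express();
app.use(express.json()); // needed so that req.body.url is populated

// Forward rejected promises from async route handlers to Express's error handler
const asyncMiddleware = (fn) => (req, res, next) => Promise.resolve(fn(req, res, next)).catch(next);

app.post('/fetchPdf', asyncMiddleware(async (req, res, next) => {
  const pdfPath = await downloadFile(req.body.url);
  if (pdfPath) {
    res.type('application/pdf');
    res.sendFile(pdfPath);
    // Delete the temporary copy once the response has been sent
    res.on('finish', function () {
      try {
        fs.unlinkSync(pdfPath);
      } catch (e) {
        console.error(e);
        console.log(`Unable to delete file ${pdfPath}`);
      }
    });
  } else res.status(404).send('Not found');
}));

function downloadFile(url) {
  return new Promise((resolve) => {
    const absoluteFilePath = path.join(__dirname, `public/${crypto.randomBytes(20).toString('hex')}.pdf`);
    const file = fs.createWriteStream(absoluteFilePath);
    console.log(`Requested url ${url}`);
    // http.get rejects https: URLs, so pick the module matching the protocol
    const client = url.startsWith('https:') ? https : http;
    client.get(url, function (downloadResponse) {
      if (downloadResponse.statusCode !== 200) {
        // Do not save error pages as PDFs
        downloadResponse.resume();
        file.close(() => fs.unlink(absoluteFilePath, () => resolve(null)));
        return;
      }
      downloadResponse.pipe(file).on('finish', () => resolve(absoluteFilePath));
    }).on('error', function (err) {
      // fs.unlink requires a callback; resolve either way
      fs.unlink(absoluteFilePath, () => resolve(null));
    });
  });
}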
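On the client side, the bytes returned by that route can be passed to pdf.js through the `data` parameter of `getDocument`, so the browser only ever talks to your own origin. A minimal sketch, assuming the route reads a JSON body `{ url }` (which needs a JSON body parser such as `express.json()` on the server) and that the app and the Node server share an origin; `loadPdfViaServer` is a hypothetical name.

```javascript
// Ask the /fetchPdf route for the PDF bytes and hand them to pdf.js.
// pdfjsLib is the imported pdfjs-dist module; fetchImpl is injectable for testing.
async function loadPdfViaServer(pdfjsLib, pdfUrl, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl("/fetchPdf", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ url: pdfUrl }),
  });
  if (!response.ok) {
    throw new Error(`Server could not fetch the PDF: HTTP ${response.status}`);
  }
  const data = new Uint8Array(await response.arrayBuffer());
  // getDocument accepts raw bytes via its `data` parameter.
  return pdfjsLib.getDocument({ data }).promise;
}
```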

Note: For educational & learning purposes, this will work, but there are various security issues with deploying your code to production this way.

First, your server will fetch any URL it is given, so anyone can use it to send requests to any site on the Internet.
Second, without some kind of authentication, your site will become a hotspot for anyone wishing to download an external resource blocked by CORS (similar to https://cors-anywhere.herokuapp.com).
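One common mitigation for the first issue is to restrict which hosts the route may fetch from. A sketch under the assumption that you know the PDF sources in advance; the host names and `isAllowedPdfUrl` are hypothetical, and it does not replace authentication for the second issue.

```javascript
// Hypothetical allowlist of hosts the /fetchPdf route may download from.
const ALLOWED_PDF_HOSTS = new Set(["www.example.com", "docs.example.org"]);

function isAllowedPdfUrl(rawUrl, allowedHosts = ALLOWED_PDF_HOSTS) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // not an absolute URL
  }
  // Reject other schemes (file:, ftp:, data:, ...) and hosts not on the list.
  const schemeOk = url.protocol === "https:" || url.protocol === "http:";
  return schemeOk && allowedHosts.has(url.hostname);
}
```

In the route, this check would run before `downloadFile`, responding with e.g. `res.status(400)` when it returns false.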


As for your second question: yes, it is possible to use the pdfjs library with React and npm.
You can refer to yurydelendik's repo, taken from the official Mozilla pdf.js repository.
I have also created a fork of it here, demonstrating the server-side solution above.

The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 