
Read a PDF file and get its dimensions to validate its size PHP

I am adding a new feature to an existing web application that will validate an uploaded PDF file's page size to ensure it is not smaller than A4. The web application is built using PHP/Laravel.

I have considered two approaches to solving this:

  1. Use Ghostscript via PHP exec to read the uploaded file and get its dimensions - I cannot get this approach working yet
  2. Use a PHP PDF library (such as FPDI/FPDF) to read the uploaded file and get its dimensions - I have something working (I think!)

As for Ghostscript, I found this answer here on SO suggesting to use an additional script called pdf_info.ps (I did download this first as the comments suggested). However, I couldn't get it to work correctly. I tried running the following commands before adding it to any PHP script:

λ .\gswin64c -dNODISPLAY -q -sFile=c:\test.pdf [-dDumpMediaSizes=false] [-dDumpFontsNeeded=false] [-dDumpXML] [-dDumpFontsUsed [-dShowEmbeddedFonts] ] ..\toolbin\pdf_info.ps
Error: /undefinedfilename in ([-dDumpMediaSizes=false])
Operand stack:

Execution stack:
   %interp_exit   .runexec2   --nostringval--   --nostringval--   --nostringval--   2   %stopped_push   --nostringval--   --nostringval--   --nostringval--   false   1   %stopped_push
Dictionary stack:
   --dict:1196/1684(ro)(G)--   --dict:0/20(G)--   --dict:78/200(L)--
Current allocation mode is local
Last OS error: No such file or directory
GPL Ghostscript 9.19: Unrecoverable error, exit code 1

I receive variations of the same error, "Error: /undefinedfilename in ([-dDumpMediaSizes=false])", when I try different approaches such as adding full file paths. I am on Windows, so I have tried full file paths like "C:/Program Files/gs/gs9.19/toolbin/pdf_info.ps" and get the same error.

With FPDF/FPDI, I set up a small project using Composer and pulled in this package: https://github.com/Setasign/FPDI-FPDF . I am currently using the following code to read an existing file:

<?php
use setasign\Fpdi;

// setup the autoload function
require_once('vendor/autoload.php');

// initiate FPDI
$pdf = new Fpdi\Fpdi();

// add a page
$pdf->AddPage();

// set the source file
$pdf->setSourceFile("test.pdf");

// import page 1
$tplId = $pdf->importPage(1);

// use the imported page and place it at point 10,10 with a width of 100 mm
$pdf->useTemplate($tplId, 10, 10, 100);

// output page dimensions
echo $pdf->GetPageWidth(); 
echo '<br>';
echo $pdf->GetPageHeight();

and I get the following output

210.00155555556

297.00008333333

So I want to ask the following questions:

Ghostscript approach questions

  1. How can I get it working?
  2. Is this approach going to have considerable performance gains compared to using FPDF/FPDI?

FPDF/FPDI approach questions

  1. Regarding the code, is this the correct way to read an existing file and check its dimensions, or am I essentially adding it to an A4 sized page with the useTemplate() method?
  2. What unit are the values I am echoing out in (I think it's pt), and could I use these values (i.e. 210, 297) to validate that a page is A4?
  3. Are there any other considerations I should keep in mind with this approach? For example, files may be a few points or pixels off A4.

I'd welcome suggestions for any alternative approaches.

Any help is much appreciated, thank you!

The size of an imported page is returned by, for example, the getTemplateSize() method of FPDI:

$pdf = new FPDI('P','mm'); // change the second parameter to change the units
$pdf->setSourceFile('test.pdf');
$pageId = $pdf->importPage(1);
$size = $pdf->getTemplateSize($pageId);

$size will be an array with following keys: width, height, 0 (=width), 1 (=height) and orientation (L or P).
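
As a minimal sketch of the validation step, the array described above could be compared against A4 with a small tolerance. The helper name isAtLeastA4() and the 1 mm default tolerance are assumptions made for illustration, not part of FPDI, and the check assumes the sizes are in millimetres:

```php
<?php
// A4 in millimetres (ISO 216).
const A4_SHORT_MM = 210.0;
const A4_LONG_MM = 297.0;

/**
 * Hypothetical helper: true if the page is at least A4, in either orientation.
 * $size needs the keys 'width' and 'height', both in millimetres.
 * $toleranceMm absorbs rounding, e.g. 595.28 pt converting to 210.0016 mm.
 */
function isAtLeastA4(array $size, float $toleranceMm = 1.0): bool
{
    // Compare short side to short side so landscape pages are accepted too.
    $short = min($size['width'], $size['height']);
    $long = max($size['width'], $size['height']);

    return $short >= A4_SHORT_MM - $toleranceMm
        && $long >= A4_LONG_MM - $toleranceMm;
}
```

With FPDI this might be called as isAtLeastA4($pdf->getTemplateSize($pageId)), provided the FPDI instance was created with 'mm' as its unit. Note that this only checks the page that was imported; a multi-page file would need the same check for each page.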

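The [ and ] characters in the documentation indicate that the enclosed switches are optional; they are not meant to be typed literally, which is why Ghostscript tries to open "[-dDumpMediaSizes=false]" as a file. If you want to use those switches, pass them like this: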

gswin64c -dNODISPLAY -q -sFile=c:\test.pdf -dDumpMediaSizes=false -dDumpFontsNeeded=false -dDumpXML -dDumpFontsUsed -dShowEmbeddedFonts ..\toolbin\pdf_info.ps

The units for PDF files are in points, 1/72 inch. Files need not be A4 at all. You should also look at the CropBox and potentially ArtBox and BleedBox as well as the MediaBox values.
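
To relate points to the values in the question: FPDF works in millimetres by default, and an A4 page stored as 595.28 x 841.89 pt converts to exactly the 210.00155... and 297.00008... figures shown there. The conversion is sketched below; the function names are made up for illustration:

```php
<?php
// 1 pt = 1/72 inch and 1 inch = 25.4 mm.
function pointsToMm(float $points): float
{
    return $points * 25.4 / 72.0;
}

function mmToPoints(float $mm): float
{
    return $mm * 72.0 / 25.4;
}

// A4 (210 x 297 mm) expressed in points, rounded to two decimals.
printf("%.2f x %.2f pt\n", mmToPoints(210.0), mmToPoints(297.0));
// prints "595.28 x 841.89 pt"
```

Because the point values are usually rounded to two decimals, a converted size will rarely be exactly 210 x 297 mm, so any comparison should allow a small tolerance.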

Note that in this case (I think) the output will go to stdout, you may want to redirect it to a file.
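
From PHP, that output can be captured with exec() instead of being redirected to a file. The following is only a sketch: buildPdfInfoCommand() and runPdfInfo() are hypothetical helpers, the paths passed in are placeholders, and shell quoting behaves differently on Windows and Unix, so it should be tested on the target server. The output format of pdf_info.ps is not parsed here.

```php
<?php
/**
 * Hypothetical helper: builds the Ghostscript command line for pdf_info.ps,
 * using only the switches from the command shown above.
 */
function buildPdfInfoCommand(string $gsBinary, string $scriptPath, string $pdfPath): string
{
    $args = [
        '-dNODISPLAY',
        '-q',
        '-sFile=' . $pdfPath,
        $scriptPath,
    ];

    // Escape every argument so spaces in paths do not split the command.
    return escapeshellarg($gsBinary) . ' ' . implode(' ', array_map('escapeshellarg', $args));
}

/** Runs the command; returns the output lines, or null on a non-zero exit code. */
function runPdfInfo(string $command): ?array
{
    $lines = [];
    $exitCode = 0;
    // 2>&1 folds error messages into the captured output.
    exec($command . ' 2>&1', $lines, $exitCode);

    return $exitCode === 0 ? $lines : null;
}
```

A call might look like runPdfInfo(buildPdfInfoCommand('gswin64c', '..\toolbin\pdf_info.ps', 'c:\test.pdf')), with the returned lines then searched for the page sizes.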
