
PHP: Get height and width from PDF file properties

I have a PDF file. I would like to get its height and width in mm.

So I run exec(pdfinfo... ); and get this result:

Creator: Adobe InDesign CS5 (7.0.3)
Producer: Acrobat Distiller 9.4.2 (Macintosh)
CreationDate: Mon Jan 30 15:48:43 2012
ModDate: Fri Feb 10 10:35:05 2012
Tagged: no
Pages: 34
Encrypted: no
Page size: 552.744 x 708.643 pts
File size: 80724791 bytes
Optimized: yes
PDF version: 1.3

I have a script which extracts this info:

<?php
$output = shell_exec("pdfinfo ".$pdflivrelink);
$data = explode("\n", $output); //puts it into an array
for($c=0; $c < count($data); $c++) {
    if(stristr($data[$c],"Pages") == true) {
        $pagesnumber = trim(substr($data[$c],6));
    }
    if(stristr($data[$c],"Page size") == true) {
        $pagesize_H = height_pdf(trim(substr($data[$c],9)));
        $pagesize_L = width_pdf(trim(substr($data[$c],9)));
    }
}
function height_pdf($size){
    $hauteur = round(substr($size,7,7)/2.83);
    return $hauteur;
}
function width_pdf($size){
    $largeur = round(substr($size,17,7)/2.83);
    return $largeur;
} ?>

This works as long as both numbers have three digits, a dot, and three digits (552.744 x 708.643), because the script reads them at fixed character offsets. But, I don't know why, some PDF files have this info:

Creator: pdftk 1.41 - www.pdftk.com
Producer: iText 2.1.5 (by lowagie.com)
CreationDate: Mon Feb 27 13:18:23 2012
ModDate: Mon Feb 27 16:26:12 2012
Tagged: no
Pages: 36
Encrypted: no
Page size: 425.2 x 538.582 pts
File size: 5097597 bytes
Optimized: yes
PDF version: 1.6

Here the size is 425.2 x 538.582, so the numbers have different lengths and my script doesn't work!

Can you help me? Thanks a lot!


I tested this:

    $output = shell_exec("pdfinfo ".$pdflivrelink);
    $data = explode("\n", $output); //puts it into an array
    for($c=0; $c < count($data); $c++) {
        if(stristr($data[$c],"Pages") == true) {
            $pagesnumber = trim(substr($data[$c],6));
        }
        if(stristr($data[$c],"Page size") == true) {
            echo $data[$c];
            preg_match('/Page size: ([0-9]*\.?[0-9]?) x ([0-9]*\.?[0-9]?)/', $data[$c], $matchess);
            $width = round($matchess[1]/2.83);
            $height = round($matchess[2]/2.83);
        }
    }
    echo "width = $width<br>height = $height";

The result:

Page size: 425.2 x 538.582 ptswidth = 0 height = 0

A little regex will get you the correct results.

<?php
$str = 'Creator: pdftk 1.41 - www.pdftk.com Producer: iText 2.1.5 (by lowagie.com) CreationDate: Mon Feb 27 13:18:23 2012 ModDate: Mon Feb 27 16:26:12 2012 Tagged: no Pages: 36 Encrypted: no Page size: 425.2 x 538.582 pts File size: 5097597 bytes Optimized: yes PDF version: 1.6';

// Each number is one or more digits, optionally followed by a dot and more digits,
// so both "425.2" and "538.582" are captured in full.
preg_match('/Page size: ([0-9]+(?:\.[0-9]+)?) x ([0-9]+(?:\.[0-9]+)?)/', $str, $matches);
$width = round($matches[1]/2.83);
$height = round($matches[2]/2.83);

echo "width = $width<br>height = $height";
?>

Update (more details were requested): a complete working example is below. I've updated the regex to match the real output from pdfinfo, which pads the space after each label.

<?php

// escapeshellarg() keeps spaces or shell characters in the path from breaking the command
$output = shell_exec("pdfinfo " . escapeshellarg($pdflivrelink));

// find page count
preg_match('/Pages:\s+([0-9]+)/', $output, $pagecountmatches);
$pagecount = $pagecountmatches[1];

// find page size (in points); \s+ allows for the padding after the label
preg_match('/Page size:\s+([0-9.]+) x ([0-9.]+)/', $output, $pagesizematches);
$width = round($pagesizematches[1]/2.83);
$height = round($pagesizematches[2]/2.83);

echo "pagecount = $pagecount <br>width = $width<br>height = $height";

?>
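
The divisor 2.83 used above is an approximation: a PDF point is 1/72 inch and an inch is 25.4 mm, so mm = pts × 25.4 / 72. A minimal sketch of the same parsing as a pure function, which can be checked without running pdfinfo. The name parse_pdfinfo_size is hypothetical, and the pattern assumes the "Page size: W x H pts" line format shown in the question:

```php
<?php
// Hypothetical helper: extracts the page size from pdfinfo's text output
// and converts it from points to millimetres (1 pt = 25.4 / 72 mm).
function parse_pdfinfo_size(string $output): ?array
{
    // \s+ tolerates the column padding pdfinfo puts after the label.
    $pattern = '/Page size:\s+([0-9]+(?:\.[0-9]+)?)\s+x\s+([0-9]+(?:\.[0-9]+)?)\s+pts/';
    if (!preg_match($pattern, $output, $m)) {
        return null; // no "Page size" line found
    }
    $mmPerPoint = 25.4 / 72;
    return [
        'width'  => round((float) $m[1] * $mmPerPoint),
        'height' => round((float) $m[2] * $mmPerPoint),
    ];
}

$sample = "Pages:          36\nPage size:      425.2 x 538.582 pts\n";
$size = parse_pdfinfo_size($sample);
echo "width = {$size['width']} mm, height = {$size['height']} mm\n";
```

Returning null when the line is missing lets the caller tell "no size found" apart from a real size, instead of silently getting 0.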

Why not use plain PHP to get the pdf dimensions?

<?php
function get_pdf_dimensions($path, $box = "MediaBox") {
    // $box can be set to BleedBox, CropBox or MediaBox.
    // A box is four numbers [llx lly urx ury] in points; integers are allowed.
    $num = '(-?[0-9]*\.?[0-9]+)';
    $pattern = '/' . preg_quote($box, '/') . '\s*\[\s*'
        . implode('\s+', array_fill(0, 4, $num)) . '\s*\]/';

    $stream = new SplFileObject($path);

    $result = false;

    while (!$stream->eof()) {
        if (preg_match($pattern, $stream->fgets(), $matches)) {
            // width and height are the differences between opposite corners
            $result = array(
                "width"  => $matches[3] - $matches[1],
                "height" => $matches[4] - $matches[2],
            );
            break;
        }
    }

    $stream = null;

    return $result;
}

var_dump(get_pdf_dimensions("file.pdf"));

Do it with preg_match():

// Debugging:
$output = shell_exec("pdfinfo ".$pdflivrelink);
var_dump($output);

// Dimension (\s+ allows for the padding pdfinfo puts after the label):
preg_match('~Page size:\s+([0-9.]+) x ([0-9.]+) pts~', $output, $matches);
var_dump($matches);


// No of pages:
preg_match('~Pages:\s+([0-9]+)~', $output, $matches);
var_dump($matches);
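
When debugging, it can help to split the whole pdfinfo output into fields instead of matching one line at a time. A rough sketch, assuming each line has the form "Key: value" as in the output shown in the question; parse_pdfinfo_fields is a hypothetical name:

```php
<?php
// Hypothetical helper: turns pdfinfo's "Key:   value" lines into an array.
function parse_pdfinfo_fields(string $output): array
{
    $fields = [];
    foreach (preg_split('/\R/', $output) as $line) {
        // Split on the first colon only, since values such as dates contain colons.
        $parts = explode(':', $line, 2);
        if (count($parts) === 2) {
            $fields[trim($parts[0])] = trim($parts[1]);
        }
    }
    return $fields;
}

$sample = "CreationDate:   Mon Feb 27 13:18:23 2012\n"
        . "Pages:          36\n"
        . "Page size:      425.2 x 538.582 pts\n";
$fields = parse_pdfinfo_fields($sample);
var_dump($fields['Pages'], $fields['Page size']);
```

The values stay as strings, so the page size still needs to be split into its two numbers before converting to mm.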

Using FPDI, via its getTemplateSize method:

const INCHESTOMM = 25.4;

public static function getPDFdimensions($strFilename): array
{
    $pdf1 = new FPDI('P', 'in');
    $pdf1->setSourceFile($strFilename);
    $tplIdx1 = $pdf1->importPage(1);
    $size = $pdf1->getTemplateSize($tplIdx1);
    $w = $size["width"];
    $h = $size["height"];
    return [round($w * self::INCHESTOMM), round($h * self::INCHESTOMM)];
}

The Imagick library can be used to get the dimensions of the file:

 $image = new Imagick($file);
 $geo=$image->getImageGeometry();
 $width=$geo['width'];
 $height=$geo['height'];

If the Imagick library is not installed, Ubuntu users can use the following commands to install it, check that it is loaded, and restart Apache:

 sudo apt-get install php-imagick
 php -m | grep imagick
 sudo service apache2 restart

Since you know the format of the size string, you can also do it like below. (This function returns width and height in an array.)

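function size_pdf($size){
    // $size is expected to look like "425.2 x 538.582 pts" (width x height)
    $result = array();
    $tmp = explode('x', $size);
    // (float) reads the leading number and ignores a trailing " pts"
    $result['width'] = round((float) trim($tmp[0])/2.83);
    $result['height'] = round((float) trim($tmp[1])/2.83);

    return $result;
}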
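
An alternative sketch of the same idea uses sscanf(), which reads the two numbers directly and ignores the trailing unit. It assumes the size string looks like "425.2 x 538.582 pts" (width first, as pdfinfo prints it); the name size_pdf_mm is hypothetical:

```php
<?php
// Hypothetical variant: parse "W x H pts" with sscanf() and convert to mm.
function size_pdf_mm(string $size): ?array
{
    // With variables passed, sscanf() returns how many values it assigned.
    if (sscanf(trim($size), '%f x %f', $w, $h) !== 2) {
        return null; // string was not in the expected format
    }
    return [
        'width'  => round($w * 25.4 / 72),  // 1 pt = 25.4 / 72 mm
        'height' => round($h * 25.4 / 72),
    ];
}

print_r(size_pdf_mm('425.2 x 538.582 pts'));
```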
