
How to determine natural page size of a PDF using PDF.js

I am using pdf.js in a discovery setting to determine the height and width, in pixels, of a number of PDF documents.

In the following code snippet, I am pulling a buffer of an 8.5 x 11 inch Word document printed to PDF. The values I get back are the expected size divided by 4.16666....

I found that if I pass a scale of 4.166666666666667, I get very close to the actual size of the document, usually within a few millionths of a pixel.


const fs = require('fs')
// PDFJSLib is assumed to be the pdf.js library loaded elsewhere, e.g. require('pdfjs-dist')

function process(images) {
    // All images in the array have the same path
    const pdfdoc = images[0].ImageFilePath

    fs.readFile(pdfdoc, (err, imageBuffer) => {
        // If we failed to read the PDF, mark each page for manual review
        // and stop before handing an undefined buffer to pdf.js.
        if (err) {
            console.error(err)
            images.forEach(img => {
                postMessage({height: -1, width: -1, ImageFilePath: img.ImageFilePath, DocId: img.DocId, PageId: img.PageId})
            })
            return
        }

        const u = PDFJSLib.getDocument(imageBuffer)
        images.forEach(img => {
            u.promise.then(pdf => {
                pdf.getPage(img.PageNumber).then(page => {
                    // Viewport at scale 1 (older pdf.js signature; newer releases take { scale: 1 })
                    const viewport = page.getViewport(1)
                    console.log(viewport.width)
                    console.log(viewport.height)
                })
            })
        })
    })
}

The output I am expecting is the natural width and height, logged to the console. I need to understand what scale I should be passing in, and what factors determine that scale value. Can I safely pass in 4.166666666666667 and know I'm getting the natural height and width of the page each time?

Other questions I've found relating to this usually have to do with passing the PDF to a viewer, which I am not doing. Again, my goal is simply to discover the natural height and width of a given PDF page.

Thanks!

On further review of this issue, I determined that the page sizes returned at scale 1 assume 72 DPI (PDF's default unit is 1/72 inch), while my expected numbers were at 300 DPI. That is where the 4.1666... factor comes from: 300 / 72. I can divide the values (612, 792) by 72 and then multiply them by 300 to get my expected numbers: 2550 and 3300.

let dimensions = data.getViewport(1).viewBox.map(n => n / 72 * 300)
 //[ 0, 0, 2550, 3300 ]
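
A slightly more general version of that conversion, as a minimal sketch. It assumes the page box is given as [x1, y1, x2, y2] in units of 1/72 inch, like the viewBox above, and it subtracts the origin so a box that does not start at (0, 0) still yields the right size. The helper name pageSizeInPixels and its dpi and rotation parameters are made up for illustration and are not part of pdf.js.

```javascript
// Hypothetical helper, not part of pdf.js.
const POINTS_PER_INCH = 72

// viewBox: [x1, y1, x2, y2] in PDF units (1/72 inch)
// dpi: target resolution (assumed default of 300, matching the numbers above)
// rotation: page rotation in degrees (assumed to be a multiple of 90)
function pageSizeInPixels(viewBox, dpi = 300, rotation = 0) {
    const [x1, y1, x2, y2] = viewBox
    // Divide by 72 before multiplying by the DPI, as in the snippet above.
    const width = Math.abs(x2 - x1) / POINTS_PER_INCH * dpi
    const height = Math.abs(y2 - y1) / POINTS_PER_INCH * dpi
    // A page rotated by 90 or 270 degrees is shown with width and height swapped.
    return rotation % 180 !== 0
        ? { width: height, height: width }
        : { width, height }
}

console.log(pageSizeInPixels([0, 0, 612, 792]))
// { width: 2550, height: 3300 }
```

The result may need rounding if whole pixels are required, since page boxes are not always exact multiples of a point.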

