Using R, how can someone count the number of pages in a PDF file?
I have about a hundred long PDF files in a directory and would like to know whether R can count how many pages are in each file. My operating system is Windows 8.
Here is a link to a 10-page PDF file, in case it helps you test your solution: MWE pdf file
It appears to be possible to count PDF pages with Python (python solution), but I don't know that language. Other solutions have been discussed on SO using, e.g., ImageMagick and C#.
I'm working on a Windows 7 machine, but my experience with Windows 8 makes me think it should work just as well for you.
I wasn't able to compile the Rpoppler package, and as hrbrmstr points out, it's probably not worth fighting. If you have 7-Zip, you can extract the poppler tools for Windows. I've extracted them to the location C:\\poppler. Once there, I can do the following:
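file_name <- "C:/[file_path]/whitepaper-pdfprimer.pdf"

pdf_pages <- function(file_name){
  require(magrittr)
  # Run pdfinfo and capture its output, one line per element
  pages <- system2("C:/poppler/bin/pdfinfo.exe",
                   args = shQuote(file_name),  # quote so paths with spaces survive
                   stdout = TRUE)
  # Keep the "Pages:" line, drop the label, convert the rest to a number
  pages[grepl("^Pages:", pages)] %>%
    gsub("Pages:", "", .) %>%
    as.numeric()
}

pdf_pages(file_name)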
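In case it helps to see the parsing step on its own, here is a minimal sketch that works on captured text rather than calling pdfinfo. The sample lines are made up to resemble the "Label: value" shape pdfinfo prints, and parse_pages is a hypothetical helper name, not part of any package.

```r
# Illustrative lines in the shape pdfinfo prints; the values are made up.
sample_output <- c(
  "Title:          Yellow Pages: a history",
  "Producer:       Some PDF Library",
  "Pages:          10",
  "Page size:      612 x 792 pts (letter)"
)

# Hypothetical helper: pull the page count out of pdfinfo-style lines.
parse_pages <- function(info_lines) {
  # Anchor at the start of the line so a title that happens to
  # contain "Pages:" is not picked up as well
  hit <- grep("^Pages:", info_lines, value = TRUE)
  if (length(hit) == 0) return(NA_real_)  # no page count reported
  as.numeric(sub("^Pages:\\s*", "", hit[1]))
}

parse_pages(sample_output)
```

Returning NA when no "Pages:" line is found keeps the result length at one, which matters if the function is later used with vapply.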
And if you have a vector of file names you want to pass:
vapply(file_names, pdf_pages, numeric(1))
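Since the question is about a directory of files, building that vector might look something like the sketch below. count_pages_in_dir and its counter argument are hypothetical names I'm introducing here; counter stands for any function that takes one path and returns one number, such as pdf_pages.

```r
# Hypothetical wrapper: list the PDFs in a directory and count pages in each.
# `counter` is any function mapping a single file path to a single number.
count_pages_in_dir <- function(dir, counter) {
  files <- list.files(dir,
                      pattern = "\\.pdf$",   # only names ending in .pdf
                      full.names = TRUE,     # keep the directory in the path
                      ignore.case = TRUE)    # also match .PDF
  pages <- vapply(files, counter, numeric(1))
  data.frame(file = basename(files),
             pages = unname(pages),
             stringsAsFactors = FALSE)
}
```

With the function from this answer, the call would be along the lines of count_pages_in_dir("path/to/pdfs", pdf_pages), where the directory is a placeholder for your own.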
Credit to @hrbrmstr for pointing out the poppler tools (I'd never heard of them until today).
On R version 3.3.2, pdftools works:
library(pdftools)
pdfInfo <- pdf_info("<path to PDF file>")  # replace the placeholder with a real path
pdfInfo$pages
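With a hundred files, one damaged or password-protected PDF could stop the whole run. A possible way around that, as a sketch only: wrap the call in tryCatch and return NA for files that fail. safe_pages and info_fun are hypothetical names; the default assumes pdftools is installed, and info_fun is only there so a different reader can be swapped in.

```r
# Hypothetical helper: page count for one file, or NA if it cannot be read.
# The default info_fun assumes the pdftools package is installed.
safe_pages <- function(path, info_fun = pdftools::pdf_info) {
  tryCatch(
    as.numeric(info_fun(path)$pages),
    error = function(e) NA_real_  # unreadable file: record NA and carry on
  )
}
```

This can be passed to vapply the same way as before, e.g. vapply(file_names, safe_pages, numeric(1)), and any NA in the result points at a file worth checking by hand.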