How to get total number of pages of pdf files using magick::image_read_pdf?
Say we have a folder, main_path, containing several PDF files with different numbers of pages. I use the function below to loop over all the files and save each page as an image:
library(magick)
library(glue)

main_path <- './'
file_names <- list.files(path = main_path, pattern ='.pdf')
file_paths <- file.path(main_path, file_names)
file_names_no_ext <- tools::file_path_sans_ext(file_names)
max_page <- 10

pdf2plot <- function(file_path, file_names_no_ext){
  pages <- magick::image_read_pdf(file_path)
  print(pages)
  num <- seq(1, max_page, 1)
  # num <- seq(1, nrow(data.frame(pages)), 1)
  for (i in num){
    pages[i] %>% image_write(., path = paste0(glue(main_path, '/plot/', {file_names_no_ext},
                                                   sprintf('_%02d.', i)), format = "png"))
  }
}

mapply(pdf2plot, file_paths, file_names_no_ext)
The problem is that if any file in the folder has fewer pages than max_page, the function fails with Error in magick_image_subset(x, i): subscript out of bounds. For example, if I have a file with 2 pages but set max_page=10, I get this error.
The content of pages:
  format width height colorspace matte filesize density
  <chr>  <int>  <int> <chr>      <lgl>    <int> <chr>
1 PNG     2250   3000 sRGB       TRUE         0 300x300
2 PNG     2250   3000 sRGB       TRUE         0 300x300
3 PNG     2250   3000 sRGB       TRUE         0 300x300
4 PNG     2250   3000 sRGB       TRUE         0 300x300
5 PNG     2250   3000 sRGB       TRUE         0 300x300
6 PNG     2250   3000 sRGB       TRUE         0 300x300
7 PNG     2250   3000 sRGB       TRUE         0 300x300
8 PNG     2250   3000 sRGB       TRUE         0 300x300
9 PNG     2250   3000 sRGB       TRUE         0 300x300
Error in magick_image_subset(x, i) : subscript out of bounds
Called from: magick_image_subset(x, i)
I think there are two possible ways to solve this, but I don't know how to implement either yet: 1. use try-catch, or 2. replace max_page with the total number of pages obtained from magick::image_read_pdf.
Thanks in advance for your help.
If you look at the documentation for ?image_read, you can see that:
All standard base vector methods such as [, [[, c(), as.list(), as.raster(), rev(), length(), and print() can be used to work with magick image objects. Use the standard img[i] syntax to extract a subset of the frames from an image.
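To make the quoted behaviour concrete, here is a minimal sketch that does not need a PDF at all. It assumes two blank frames built with image_blank() are a fair stand-in for a 2-page document; the variable names are only illustrative.

```r
library(magick)

# Stand-in for a 2-page PDF: two blank frames combined with c().
pages <- c(image_blank(100, 100, color = "white"),
           image_blank(100, 100, color = "white"))

n_pages <- length(pages)  # number of frames; for a PDF, the number of pages
first   <- pages[1]       # a valid index, since 1 <= n_pages

# Indexing past length(pages) is what raises "subscript out of bounds".
# tryCatch() turns the error into its message so the script keeps running.
out_of_range <- tryCatch(pages[10], error = function(e) conditionMessage(e))
```

The same length() call works on the object returned by image_read_pdf(), so the loop bound can come from the document itself rather than from a fixed max_page.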
So you can simply use length(pages) to get the number of pages in that document. Here's a simple version of your function using lapply(). I think you could simplify your path handling a lot, but I won't get into that.
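library(magick)
library(glue)

# main_path is taken from the global environment, as in the question
pdf2plot <- function(file_path, file_names_no_ext){
  pages <- magick::image_read_pdf(file_path)
  lapply(
    # seq_along(pages) is 1..length(pages), and empty if there are no pages
    seq_along(pages),
    \(i) image_write(pages[i], path = paste0(glue(main_path, '/plot/', {file_names_no_ext},
                                                  sprintf('_%02d.', i)), format = "png"))
  )
}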
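If you still want an upper limit on exported pages, and want the try-catch idea from the question as well, something along these lines might work. This is only a sketch: pages_to_export(), pdf2plot_capped() and the out_dir argument are hypothetical names introduced here, out_dir is assumed to exist already, and tryCatch() is used only to skip files that cannot be read.

```r
library(magick)

# Hypothetical helper: page indices to export, never more than max_page
# and never more than the document actually has.
pages_to_export <- function(n_pages, max_page = 10) {
  seq_len(min(n_pages, max_page))
}

# Hypothetical variant of pdf2plot(); out_dir must already exist.
pdf2plot_capped <- function(file_path, out_dir, max_page = 10) {
  # Return NULL instead of stopping if the PDF cannot be read.
  pages <- tryCatch(image_read_pdf(file_path), error = function(e) NULL)
  if (is.null(pages)) return(invisible(character(0)))

  stem <- tools::file_path_sans_ext(basename(file_path))
  out_paths <- character(0)
  for (i in pages_to_export(length(pages), max_page)) {
    out <- file.path(out_dir, sprintf("%s_%02d.png", stem, i))
    image_write(pages[i], path = out, format = "png")
    out_paths <- c(out_paths, out)
  }
  invisible(out_paths)
}
```

It could then be called as lapply(file_paths, pdf2plot_capped, out_dir = file.path(main_path, "plot")). Bounding the loop with min() avoids the out-of-bounds index entirely, so tryCatch() is not needed around pages[i].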