Installing pdftotext on Windows (for use with R, 'tm' package)

I am having trouble using the R 'tm' package to read in .pdf files. Specifically, I try to run the following code:

library(tm)
filename = "myfile.pdf"

tmp1 <- readPDF(PdftotextOptions="-layout")
doc <- tmp1(elem=list(uri=filename),language="en",id="id1")
doc[1:15]

...which gives me the error:

Error in readPDF(PdftotextOptions = "-layout") : 
  unused argument (PdftotextOptions = "-layout")

I assume this is because the pdftotext program (part of xpdf, http://www.foolabs.com/xpdf/download.html ) has not been installed correctly on my machine, so R cannot access it.

What are the steps to install xpdf/pdftotext correctly so that the above R code can be executed? (I am aware of similar questions already posted, but they don't address the same issue.)

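PdftotextOptions is not a parameter of readPDF, which is what the "unused argument" error is reporting; the error comes from the function call itself, not from the xpdf installation. readPDF has a control parameter, which expects a list, and the options for pdftotext go in its text element. So the correct use would be: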

if(all(file.exists(Sys.which(c("pdfinfo", "pdftotext"))))) { 
  tmp1 <- readPDF(control = list(text = "-layout"))
  doc <- tmp1(elem=list(uri=filename),language="en",id="id1")
}
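Note that the guard above skips the call silently when either executable is missing. As a minimal sketch of how one might see which tool R cannot find, the following reports the result of Sys.which() for each name. The engine = "xpdf" argument is an assumption about more recent tm versions, where, as far as I know, the default engine is no longer xpdf; drop it if your version of readPDF does not accept it.

```r
# Check which of the xpdf command-line tools R can find on the PATH.
tools <- c("pdfinfo", "pdftotext")
found <- Sys.which(tools)                  # "" for any tool that is not found
missing_tools <- names(found)[!nzchar(found)]

if (length(missing_tools) > 0) {
  message("Not found on PATH: ", paste(missing_tools, collapse = ", "))
} else {
  message("Found: ", paste(found, collapse = ", "))
  if (requireNamespace("tm", quietly = TRUE)) {
    # Assumption: this tm version has an 'engine' argument and an "xpdf" engine.
    reader <- tm::readPDF(engine = "xpdf", control = list(text = "-layout"))
    # doc <- reader(elem = list(uri = filename), language = "en", id = "id1")
  }
}
```

If the message lists pdfinfo or pdftotext as not found, the directory containing the xpdf executables is not on the PATH that R sees.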

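Set the working directory to the folder that contains the xpdf executables:

setwd('C:/xpdf/bin64')                 

It works for me. Note that after changing the working directory, filename has to be given as a full path unless the PDF is in that folder too.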
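An alternative to changing the working directory is to add the xpdf folder to the PATH for the current R session only. This is a sketch, assuming the executables live in C:/xpdf/bin64 as in the answer above; xpdf_dir is a placeholder to adjust for your machine.

```r
# Assumed install location (taken from the setwd() call above); adjust as needed.
xpdf_dir <- normalizePath("C:/xpdf/bin64", mustWork = FALSE)

# Prepend the directory to PATH for this R session only.
old_path <- Sys.getenv("PATH")
Sys.setenv(PATH = paste(xpdf_dir, old_path, sep = .Platform$path.sep))

# If the executables are in xpdf_dir, this should now return their full paths.
Sys.which(c("pdfinfo", "pdftotext"))
```

This leaves the working directory untouched, so relative paths to the PDF files keep working. The change is lost when R is restarted; to make it permanent, add the folder to the PATH in the Windows environment variable settings.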
