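install.packages("pdftools")   # only needed once
library("pdftools")
pdf.file <- "https://eparlib.nic.in/bitstream/123456789/809853/1/pms_16_17_07-02-2019_eng.pdf"
setwd("D:/Assignment 1/")      # folder where the PDF will be saved
download.file(pdf.file, destfile = "speech1.pdf", mode = "wb")  # binary mode so the PDF is not corrupted on Windows
pdf.text <- pdftools::pdf_text("speech1.pdf")  # one string per page
cat(pdf.text[[2]])             # print the text of page 2
typeof(pdf.text)               # returns "character"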
I want to read the text as strings instead of characters, but I could not find a way to do so; it always ends up being read as characters.
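The return value of pdftools::pdf_text is not a list but a character vector with one element per page of the PDF. In R, "character" is the type used for strings, so each element already is a string: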
> library(pdftools)
> x <- pdf_text("bla.pdf")
> typeof(x)
[1] "character"
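To see what this means without downloading a PDF, here is a small base-R sketch. The vector pages is a made-up stand-in for what pdf_text() returns (an assumption for illustration, not real output from speech1.pdf):

```r
# Hypothetical stand-in for the value returned by pdf_text():
# a character vector with one string per page (text made up for illustration).
pages <- c("Page one text\nsecond line", "Page two text")

typeof(pages)    # "character": R's type for strings
is.list(pages)   # FALSE: an atomic vector, not a list
length(pages)    # 2: one element per page
nchar(pages)     # number of characters in each page's string

# [[i]] and [i] both work on a character vector and return a whole page
pages[[2]]
identical(pages[[2]], pages[2])  # TRUE: both give the single string "Page two text"
```

So "character" does not mean single letters: every element of the vector is a complete string.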
Indexing this vector with [[i]] (or [i]) therefore gives you the whole text of page i as one string; it does not split that string into characters. If you want to extract individual characters, use substr:
> substr(x, 3, 3)
[1] "l"
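A slightly broader sketch, again using hypothetical page strings instead of a real PDF: substr() works on every page at once, strsplit(x, "") splits strings into single characters, and paste(collapse = ...) joins all pages into one string if you prefer to work with the document as a whole.

```r
# Hypothetical page strings standing in for pdf_text() output.
pages <- c("Page one text\nsecond line", "Page two text")

# substr() is vectorised: character 3 of every page
substr(pages, 3, 3)          # "g" "g"

# strsplit(x, "") splits each string into single characters;
# it returns a list with one character vector per page
chars <- strsplit(pages, "")
chars[[2]][1:4]              # "P" "a" "g" "e"

# Join all pages into one long string, then split it into lines
full_text <- paste(pages, collapse = "\n")
lines <- strsplit(full_text, "\n", fixed = TRUE)[[1]]
lines                        # "Page one text" "second line" "Page two text"
```

Note that strsplit() returns a list, which is where [[1]] becomes necessary to get at the resulting character vector.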