
Importing PDF files into R

I have multiple PDF files, stored in a local folder, that contain text. I would like to import the text of these files into R. I used the function 'read_dir' from the R package textreadr:

library("textreadr")
Data <- read_dir("<MY PATH>")

The function works well for most files. However, for several files whose names contain special characters (i.e., non-ASCII letters such as 'ć', e.g. 'filenameć.pdf'), it fails with the error message 'The following files failed to read in and were removed: ' …

What can I do?

I tried to rename the files from within R, but that did not work either (probably for the same reason). Renaming would be an acceptable workaround, but I would rather not rename the files manually :)

Follow-Up (only for experts): For several files, I got one of the following error messages (and I have no idea why):

PDF error: Mismatch between font type and embedded font file

or

PDF error: Couldn't find trailer dictionary

Any suggestions or hints on how to solve this issue?

The issue most likely concerns the encoding of the file names. If you absolutely want R to rename the files for you, the function to use is iconv: determine the encoding of the file names, then convert them to UTF-8.
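As a rough illustration of the iconv route, here is a minimal sketch. It makes several assumptions: the file names really are in the encoding passed as `from` (UTF-8 by default here, which may not match your system), and replacing each non-ASCII byte with "_" is acceptable, which is a simpler variant than converting the names to UTF-8. The helper names `to_ascii_names` and `rename_non_ascii_pdfs` are made up for this example and are not part of textreadr.

```r
# Hypothetical helper: replace every byte that is not plain ASCII with "_".
# `from` is an assumption; set it to the encoding the file names are really in.
to_ascii_names <- function(names, from = "UTF-8") {
  iconv(names, from = from, to = "ASCII", sub = "_")
}

# Hypothetical helper: find PDFs in `dir` whose names would change and,
# unless dry_run is TRUE, rename them. Returns the old/new name pairs.
rename_non_ascii_pdfs <- function(dir, from = "UTF-8", dry_run = TRUE) {
  old <- list.files(dir, pattern = "\\.pdf$", ignore.case = TRUE)
  new <- to_ascii_names(old, from = from)

  # Keep only names that were converted and actually differ.
  changed <- !is.na(new) & new != old
  plan <- data.frame(old = old[changed], new = new[changed],
                     stringsAsFactors = FALSE)

  if (!dry_run && nrow(plan) > 0) {
    # file.rename() returns TRUE/FALSE per file; keep that in the result.
    plan$ok <- file.rename(file.path(dir, plan$old),
                           file.path(dir, plan$new))
  }
  plan
}
```

Calling `rename_non_ascii_pdfs("<MY PATH>")` first with the default `dry_run = TRUE` only shows the planned renames. Two different names can map to the same ASCII name, so it is worth checking the `new` column for duplicates before running with `dry_run = FALSE`. If R cannot even list or address the files correctly on your platform, this approach may fail for the same reason your earlier attempt did.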

However, a much better approach would be to rename them with bash from the command line. Can you provide a more complete set of examples?
