
When reading PDF text with readtext, is there a way to make readtext respect columns?

I have a PDF document in landscape format with three columns of text, which I am trying to read into R using readtext(). Instead of reading down each column in order, readtext() reads straight across the page, joining the text from all three columns on each line.

To put it simply: if the first line of each column were just a string of numbers from 1-10 and the second line a string from 11-20, then readtext() reads it in as "1234567891012345678910" rather than as "1234567891011121314..." etc.

Is there a way to tell readtext() to follow the columns when importing?

Best, Daniel

The (current) answer is no. readtext uses the pdftools package to read PDFs, and pdftools does not recognize separate columns. This comes from poppler, the library pdftools relies on to extract the text. See also issue 4 on GitHub. The word positions needed to rebuild the columns are available through pdf_data(), but they are not easy to turn back into column-ordered text.
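As a rough illustration of what working from pdf_data() could look like, here is a minimal sketch. It assumes that pdftools::pdf_data() returns one data frame per page with x, y, width and text columns, that the columns on the page are equally wide, and that words on the same line share the same y value. None of that is guaranteed for a real document. read_columns() and its n_cols and page_width arguments are hypothetical names made up for this example, not part of readtext or pdftools.

```r
# Hypothetical helper: rebuild column-wise reading order for ONE page.
# `words` is a data frame shaped like one element of pdftools::pdf_data():
# it needs the columns x, y, width and text.
# Assumptions: columns are equally wide, and words on one line share a y value.
read_columns <- function(words, n_cols,
                         page_width = max(words$x + words$width)) {
  # Assign every word to a column by slicing the page into equal strips.
  col_width <- page_width / n_cols
  col_id <- pmin(floor(words$x / col_width) + 1, n_cols)

  # Sort by column first, then top to bottom, then left to right.
  ord <- order(col_id, words$y, words$x)
  words <- words[ord, ]
  col_id <- col_id[ord]

  # Glue the words of each (column, line) pair together, keeping sorted order.
  line_key <- paste(col_id, words$y)
  line_key <- factor(line_key, levels = unique(line_key))
  lines <- tapply(words$text, line_key, paste, collapse = " ")

  paste(lines, collapse = "\n")
}

# Toy data standing in for a two-column page (made up, not from a real PDF).
demo <- data.frame(
  x = c(10, 110, 10, 110),
  y = c(10, 10, 20, 20),
  width = 20,
  text = c("a", "c", "b", "d"),
  stringsAsFactors = FALSE
)
cat(read_columns(demo, n_cols = 2), "\n")

# With a real file (the file name here is a placeholder):
# pages <- pdftools::pdf_data("my_file.pdf")
# text  <- vapply(pages, read_columns, character(1), n_cols = 3)
```

For the toy data, the lines of the left column should come out before those of the right column. On a real PDF the equal-width split will probably need adjusting, for example by passing the actual page width or by choosing column boundaries from the gaps in the x values.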
