Strange 1 byte character result with pdftotext from .pdf to .txt

Question

I get a weird result when converting a single PDF with no content to a .txt file.

I am using this PHP code in a foreach for all the files found in the dir. It works ridiculously well with the -raw option if there is text available in the pdf.

system("pdftotext -raw " . escapeshellarg($page_name) . " 2>&1");

However, if there is no content, or the file just contains an image, it produces this character in the .txt file:

(view of line 1 in the .txt file)

I've tried multiple pdftotext-settings, but can't seem to get rid of it.

Is there any way to tackle this with pdftotext?

Some further info: with that character, the file produced is always 1 byte, whereas I'd like it to be listed as 0 bytes in the dir.

(ps. first time use of adding an image. Hope it is clear!)

Answer 1

Because of what I just (finally) found, I will close this one with this answer from @mkl. The key part, which answers this question, is:

More exactly, that Worksheet PDF does not contain text drawing instructions, merely graphics drawing instructions (the results of which look like text).

(from the question "pdfminer pdf2text outputs 'FF'")

So the solution is to read and handle that weird character when working with files that have this kind of content.
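A minimal sketch of that check in PHP, assuming the single byte is a form feed (0x0C), which would match the 'FF' in the title quoted above. The function name truncate_if_blank is hypothetical, not part of pdftotext or PHP. It empties a .txt file that holds nothing but form feeds and whitespace, so it shows up as 0 bytes in the dir:

```php
<?php
// Hypothetical helper: empties a pdftotext output file that contains
// nothing but form feeds (0x0C) and whitespace.
// Assumption: the stray 1-byte character is a form feed.
// Returns true if the file was emptied, false otherwise.
function truncate_if_blank(string $txt_path): bool
{
    $contents = file_get_contents($txt_path);
    if ($contents === false) {
        return false; // unreadable file: leave it alone
    }

    // Strip spaces, tabs, line endings and form feeds from both ends.
    if (trim($contents, " \t\r\n\f") !== '') {
        return false; // real text is present, keep the file as is
    }

    // Writing an empty string truncates the file to 0 bytes.
    return file_put_contents($txt_path, '') !== false;
}
```

It could be called inside the foreach right after the system() call, with the path of the .txt file that pdftotext just wrote. pdftotext also documents a -nopgbrk option that suppresses the form feeds it inserts as page breaks; I have not verified that it removes the byte in this exact case, but it may be worth trying first.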
