Searching for text in a PDF - double results

I have a question about searching for text in the attached PDF file (shared via a Google Drive link). If I search for the text "1500", for example, I get 4 occurrences, but there are only 2 occurrences, both on page 2. Likewise, searching for "musei" finds 2 occurrences, but this text appears only on page 1.

The search parses each page separately and finds the text of the whole document on every single page, which is why I get double results.

Can anyone explain why this happens? Was this PDF file generated in some particular way, unlike other files where text search works correctly?

Thanks a lot

That PDF is indeed special: each page contains the text of both pages. On the first page, the text from the second page lies to the right of the right page border, and on the second page, the text from the first page lies to the left of the left page border. Furthermore, the content belonging to the other page is additionally outside the clip area.

I enlarged the page boxes (media box, crop box, ...) of the first page to the right and of the second page to the left, and then selected all text (Ctrl-A) so that even the text outside the clip area is highlighted. With the enlarged boxes, the other page's text shows up next to each page's regular content.


For text extraction that only extracts the text in the visible areas, you should restrict your text extraction routine to the crop box of the respective page.
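As a rough illustration of that restriction, here is a minimal Python sketch. It assumes your PDF library can report each text chunk together with its position in page coordinates; the `Rect` and `TextChunk` types, the helper functions, and all coordinates below are hypothetical stand-ins rather than the API of any particular library. The sample data mimics the layout described above, with the other page's text placed beyond the page border.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in PDF user space (origin at bottom-left)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, x: float, y: float) -> bool:
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


@dataclass(frozen=True)
class TextChunk:
    """Hypothetical positioned text chunk, as a PDF library might report it."""
    text: str
    x: float  # x coordinate of the chunk's start point
    y: float  # y coordinate of the chunk's baseline


def visible_chunks(chunks: Iterable[TextChunk], crop_box: Rect) -> List[TextChunk]:
    """Keep only the chunks whose start point lies inside the crop box."""
    return [c for c in chunks if crop_box.contains(c.x, c.y)]


def count_occurrences(chunks: Iterable[TextChunk], needle: str) -> int:
    """Count how often `needle` occurs in the given chunks."""
    return sum(c.text.count(needle) for c in chunks)


# Assumed A4-sized crop box for both pages (made-up values).
crop_box = Rect(0, 0, 595, 842)

# Page 1: its own text, plus page 2's text to the right of the right border.
page1 = [
    TextChunk("musei", 100, 700),
    TextChunk("1500", 695, 700),
    TextChunk("1500", 695, 500),
]
# Page 2: its own text, plus page 1's text to the left of the left border.
page2 = [
    TextChunk("musei", -495, 700),
    TextChunk("1500", 100, 700),
    TextChunk("1500", 100, 500),
]

pages = [page1, page2]

# Unrestricted search sees every chunk on every page.
naive_hits = sum(count_occurrences(p, "1500") for p in pages)

# Restricted search only looks at chunks inside each page's crop box.
filtered_hits = sum(
    count_occurrences(visible_chunks(p, crop_box), "1500") for p in pages
)
```

With this sample data the unrestricted search counts "1500" four times, while the crop-box-restricted search counts it twice. Testing only the start point of each chunk is a simplification; a chunk that straddles the crop box border would need a proper rectangle intersection test.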
