
How to verify that a PDF is text-based using iTextSharp?

I need to verify that the PDF report is text-based (and not bitmap-based; however, it could contain some images). I do not need to extract the text, only to verify that it is text-based.

Is there a way to perform such a verification using the iTextSharp library?

Thanks in advance,

Stefan

You can look for text-drawing commands easily enough. The least work on your part would be to try to extract the text and see whether anything comes back. Ideally, you'd know some of the text the report should contain and search for it; a single sentence or phrase is plenty for this sort of test.
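A minimal sketch of the "look for text-drawing commands" idea, written in Java because iTextSharp is a port of the Java iText library. It assumes you already have a page's decoded (decompressed) content stream as bytes; in iText 5 that is what `PdfReader.getPageContent(int)` returns (`GetPageContent` in iTextSharp), but check this against your version. The class and method names are hypothetical. It only reports whether a text-showing operator (`Tj`, `TJ`, `'`, `"`) appears: it does not follow Form XObjects drawn with `Do`, does not skip inline image data, and ignores the text rendering mode, so a scanned page carrying an invisible OCR text layer (rendering mode 3) would still count as text-based.

```java
import java.nio.charset.StandardCharsets;
import java.util.Set;

// Hypothetical helper (not part of iText): scans an already-decoded PDF page
// content stream for text-showing operators. It is a tokenizer sketch, not a
// full PDF parser: inline image data (BI ... ID ... EI) is not skipped and
// Form XObjects drawn with "Do" are not followed.
final class TextOperatorScanner {
    // Operators that paint glyphs: Tj, TJ, ' and ".
    private static final Set<String> TEXT_SHOWING_OPS = Set.of("Tj", "TJ", "'", "\"");

    private static boolean isWhitespace(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\f' || c == '\0';
    }

    private static boolean isDelimiter(char c) {
        return "()<>[]{}/%".indexOf(c) >= 0;
    }

    static boolean hasTextShowingOperator(byte[] content) {
        // ISO-8859-1 maps each byte to exactly one char, so binary bytes survive.
        String s = new String(content, StandardCharsets.ISO_8859_1);
        int n = s.length();
        int i = 0;
        while (i < n) {
            char c = s.charAt(i);
            if (isWhitespace(c)) {
                i++;
            } else if (c == '%') {
                // Comment: skip to end of line.
                while (i < n && s.charAt(i) != '\n' && s.charAt(i) != '\r') i++;
            } else if (c == '(') {
                // Literal string: parentheses may nest; a backslash escapes the next char.
                int depth = 1;
                i++;
                while (i < n && depth > 0) {
                    char d = s.charAt(i);
                    if (d == '\\') i++;
                    else if (d == '(') depth++;
                    else if (d == ')') depth--;
                    i++;
                }
            } else if (c == '<' && i + 1 < n && s.charAt(i + 1) == '<') {
                i += 2; // start of a dictionary
            } else if (c == '<') {
                // Hex string: skip up to and including the closing '>'.
                while (i < n && s.charAt(i) != '>') i++;
                i++;
            } else if (c == '/') {
                // Name object such as /F1: skip it so "/Tj" is not mistaken for an operator.
                i++;
                while (i < n && !isWhitespace(s.charAt(i)) && !isDelimiter(s.charAt(i))) i++;
            } else if (isDelimiter(c)) {
                i++; // '[', ']', '>', '{', '}', ')'
            } else {
                // Regular token: a number, keyword or operator.
                int start = i;
                while (i < n && !isWhitespace(s.charAt(i)) && !isDelimiter(s.charAt(i))) i++;
                if (TEXT_SHOWING_OPS.contains(s.substring(start, i))) return true;
            }
        }
        return false;
    }
}
```

A caller would loop over pages 1 to `reader.getNumberOfPages()` and treat the report as text-based if any page returns `true`. Tokenizing, rather than searching the raw bytes for "Tj", avoids false hits inside strings, names and comments.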

Text extraction with iText is pretty trivial these days; there are lots of examples on Stack Overflow and around the web.
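The extract-and-search check can be sketched the same way. This assumes the per-page text has already been pulled out, for example with iText 5's `PdfTextExtractor.getTextFromPage(reader, pageNumber)` (`PdfTextExtractor.GetTextFromPage` in iTextSharp); the helper class and method names are hypothetical. Whitespace is collapsed before searching because extracted text seldom keeps the original line breaks and spacing.

```java
import java.util.List;

// Hypothetical helpers (not part of iText) for the "extract the text and
// search for a known phrase" check. Assumes the text of each page was already
// extracted, e.g. with iText 5's PdfTextExtractor.getTextFromPage(reader, page).
final class ExtractedTextCheck {
    // Collapse runs of whitespace into a single space, because extracted text
    // rarely keeps the original line breaks and spacing.
    static String normalize(String s) {
        return s.replaceAll("\\s+", " ").trim();
    }

    // True if at least one page produced non-blank text.
    static boolean hasAnyText(List<String> pageTexts) {
        return pageTexts.stream().anyMatch(t -> !t.isBlank());
    }

    // True if the expected phrase occurs in the whitespace-normalized text of
    // all pages joined together, so a phrase split across a page break still matches.
    static boolean containsPhrase(List<String> pageTexts, String phrase) {
        String haystack = normalize(String.join(" ", pageTexts));
        return haystack.contains(normalize(phrase));
    }
}
```

Searching for a phrase you know the report contains is a stronger test than `hasAnyText`, because it fails when some text comes back but does not match what the report should say.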

