简体   繁体   English

使用 iText7 C# 解析/读取 PDF 文档

[英]Parsing/Reading a PDF Document using iText7 C#

I'm trying to upgrade my code by using iText7 libraries.我正在尝试使用 iText7 库来升级我的代码。 Previously I used iTextSharp libraries But looks like iText7 is totally new I tried Reading a pdf Document but facing an exception in between "Pdf Header Not Found".以前我使用 iTextSharp 库但看起来 iText7 是全新的我尝试阅读 pdf 文档但在“Pdf Header Not Found”之间遇到异常。 Here's my code这是我的代码

byte[] bytes = System.Convert.FromBase64String(UploadedFileByes);

MemoryStream memory = new MemoryStream(bytes);
            BinaryReader BRreader = new BinaryReader(memory);
            StringBuilder text = new StringBuilder();


            iText.Kernel.Pdf.PdfReader iTextReader = new iText.Kernel.Pdf.PdfReader(memory);
            iText.Kernel.Pdf.PdfDocument pdfDoc = new iText.Kernel.Pdf.PdfDocument(new iText.Kernel.Pdf.PdfReader(memory));



            int numberofpages = pdfDoc.GetNumberOfPages();
            for (int page = 1; page <= numberofpages; page++) {
                iText.Kernel.Pdf.Canvas.Parser.Listener.ITextExtractionStrategy strategy = new iText.Kernel.Pdf.Canvas.Parser.Listener.SimpleTextExtractionStrategy();
                string currentText = iText.Kernel.Pdf.Canvas.Parser.PdfTextExtractor.GetTextFromPage(pdfDoc.GetPage(page),strategy);
                currentText = Encoding.UTF8.GetString(ASCIIEncoding.Convert(
                    Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(currentText)));
                text.Append(currentText);
            }

What am I doing wrong?我究竟做错了什么?

I got the solution.我得到了解决方案。 I used the pdfreader that i defined instead of creating new one.我使用了我定义的 pdfreader,而不是创建新的。 Here's the code.这是代码。 Hope it would help someone.希望它会帮助某人。

 byte[] bytes = System.Convert.FromBase64String(UploadedFileByes); MemoryStream memory = new MemoryStream(bytes); BinaryReader BRreader = new BinaryReader(memory); StringBuilder text = new StringBuilder(); iText.Kernel.Pdf.PdfReader iTextReader = new iText.Kernel.Pdf.PdfReader(memory); iText.Kernel.Pdf.PdfDocument pdfDoc = new iText.Kernel.Pdf.PdfDocument(iTextReader); int numberofpages = pdfDoc.GetNumberOfPages(); for (int page = 1; page <= numberofpages; page++) { iText.Kernel.Pdf.Canvas.Parser.Listener.ITextExtractionStrategy strategy = new iText.Kernel.Pdf.Canvas.Parser.Listener.SimpleTextExtractionStrategy(); string currentText = iText.Kernel.Pdf.Canvas.Parser.PdfTextExtractor.GetTextFromPage(pdfDoc.GetPage(page),strategy); currentText = Encoding.UTF8.GetString(ASCIIEncoding.Convert( Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(currentText))); text.Append(currentText); }

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM