Parsing/Reading a PDF Document using iText7 C#
I'm trying to upgrade my code to use the iText7 libraries. Previously I used the iTextSharp libraries, but iText7 looks like a completely new API. I tried reading a PDF document, but partway through I get the exception "Pdf Header Not Found". Here's my code:
byte[] bytes = System.Convert.FromBase64String(UploadedFileByes);
MemoryStream memory = new MemoryStream(bytes);
BinaryReader BRreader = new BinaryReader(memory);
StringBuilder text = new StringBuilder();
iText.Kernel.Pdf.PdfReader iTextReader = new iText.Kernel.Pdf.PdfReader(memory);
iText.Kernel.Pdf.PdfDocument pdfDoc = new iText.Kernel.Pdf.PdfDocument(new iText.Kernel.Pdf.PdfReader(memory));
int numberofpages = pdfDoc.GetNumberOfPages();
for (int page = 1; page <= numberofpages; page++) {
iText.Kernel.Pdf.Canvas.Parser.Listener.ITextExtractionStrategy strategy = new iText.Kernel.Pdf.Canvas.Parser.Listener.SimpleTextExtractionStrategy();
string currentText = iText.Kernel.Pdf.Canvas.Parser.PdfTextExtractor.GetTextFromPage(pdfDoc.GetPage(page),strategy);
currentText = Encoding.UTF8.GetString(ASCIIEncoding.Convert(
Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(currentText)));
text.Append(currentText);
}
What am I doing wrong?
I found the solution. I passed the PdfReader that I had already defined to PdfDocument instead of creating a new one from the same stream. Here's the code. Hope it helps someone.
byte[] bytes = System.Convert.FromBase64String(UploadedFileByes);
MemoryStream memory = new MemoryStream(bytes);
BinaryReader BRreader = new BinaryReader(memory);
StringBuilder text = new StringBuilder();
iText.Kernel.Pdf.PdfReader iTextReader = new iText.Kernel.Pdf.PdfReader(memory);
iText.Kernel.Pdf.PdfDocument pdfDoc = new iText.Kernel.Pdf.PdfDocument(iTextReader);
int numberofpages = pdfDoc.GetNumberOfPages();
for (int page = 1; page <= numberofpages; page++) {
    iText.Kernel.Pdf.Canvas.Parser.Listener.ITextExtractionStrategy strategy = new iText.Kernel.Pdf.Canvas.Parser.Listener.SimpleTextExtractionStrategy();
    string currentText = iText.Kernel.Pdf.Canvas.Parser.PdfTextExtractor.GetTextFromPage(pdfDoc.GetPage(page), strategy);
    currentText = Encoding.UTF8.GetString(ASCIIEncoding.Convert(
        Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(currentText)));
    text.Append(currentText);
}
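A likely reason the original version failed, assuming PdfReader reads the stream it is given from the current position through to the end: the first PdfReader had already consumed the MemoryStream, so the second one created on the same stream found no bytes and therefore no PDF header. The sketch below shows only that stream behavior with plain .NET types and does not call iText at all. `StreamPositionDemo` and `ReadToEnd` are hypothetical names standing in for any consumer that reads a stream to its end.

```csharp
using System;
using System.IO;
using System.Text;

public static class StreamPositionDemo
{
    // Hypothetical stand-in for a consumer that reads a stream to its end,
    // starting from the stream's current position.
    public static byte[] ReadToEnd(Stream stream)
    {
        using (var buffer = new MemoryStream())
        {
            stream.CopyTo(buffer);
            return buffer.ToArray();
        }
    }

    public static void Main()
    {
        // Sample bytes only; this is not a valid PDF file.
        byte[] bytes = Encoding.ASCII.GetBytes("%PDF-1.7 sample bytes");
        var memory = new MemoryStream(bytes);

        byte[] first = ReadToEnd(memory);   // reads everything, position is now at the end
        byte[] second = ReadToEnd(memory);  // nothing left to read

        Console.WriteLine(first.Length == bytes.Length); // True
        Console.WriteLine(second.Length);                // 0

        memory.Position = 0;                // rewind before reading again
        byte[] third = ReadToEnd(memory);
        Console.WriteLine(third.Length == bytes.Length); // True
    }
}
```

If that assumption holds, reusing the single reader (as in the code above) or setting `memory.Position = 0` before constructing another reader should both avoid the empty second read.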