
Split PDF file on splittag / cut out special page using iTextSharp

I want to cut out all pages of a PDF file that contain a special string (splittag). So far I have the code below, but it outputs every page of the source PDF. What is wrong with it? I iterate through the pages of the source PDF, check whether the current page contains the splittag, and if so create a new PDF named after the page number. It would be great if someone could help. Thank you!

            iTextSharp.text.PdfReader reader = new iTextSharp.text.PdfReader(textBox3.Text);
            string splittag = textBox2.Text;

            StringBuilder text = new StringBuilder();

            for (int i = 1; i <= reader.NumberOfPages; i++)
            {
                if(PdfTextExtractor.GetTextFromPage(reader, i, new SimpleTextExtractionStrategy()).ToString().Contains(splittag)) ;
                {
                    richTextBox1.Text = PdfTextExtractor.GetTextFromPage(reader, i, new SimpleTextExtractionStrategy());
                    Document document = new Document();
                    PdfCopy copy = new PdfCopy(document, new FileStream(textBox5.Text + "\\" + i + ".pdf", FileMode.Create));
                    document.Open();
                    copy.AddPage(copy.GetImportedPage(reader, i));
                    document.Close();
                }                                        
            }
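
One thing that stands out in the loop above: the `if (...)` line ends with a semicolon. In C#, as in Java, that semicolon is an empty statement and forms the entire body of the `if`, so the braces that follow are an ordinary block that runs on every iteration. That would explain why every page is written. The sketch below reproduces the effect in plain Java with no PDF library; the class name, method names, and sample strings are made up for illustration.

```java
public class StraySemicolonDemo {

    // Mirrors the loop from the question: note the ';' directly after the condition.
    static int countWithStraySemicolon(String[] pageTexts, String splittag) {
        int written = 0;
        for (String pageText : pageTexts) {
            if (pageText.contains(splittag)) ;  // empty statement is the whole if-body
            {
                written++;                      // unconditional block: runs for every page
            }
        }
        return written;
    }

    // Same loop with the semicolon removed: the block is now the if-body.
    static int countFixed(String[] pageTexts, String splittag) {
        int written = 0;
        for (String pageText : pageTexts) {
            if (pageText.contains(splittag)) {
                written++;
            }
        }
        return written;
    }

    public static void main(String[] args) {
        String[] pageTexts = { "intro", "chapter LoremIpsum", "appendix" };
        System.out.println(countWithStraySemicolon(pageTexts, "LoremIpsum")); // prints 3
        System.out.println(countFixed(pageTexts, "LoremIpsum"));              // prints 1
    }
}
```

Removing the semicolon at the end of the `if` line in the question's code should be enough for only the matching pages to be copied.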

I would use the following code:

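import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.canvas.parser.PdfTextExtractor;

public static List<Integer> determineSplits(String fileName) throws FileNotFoundException, IOException
{
   PdfDocument pdfDocument = new PdfDocument(new PdfReader(fileName));
   List<Integer> splitPages = new ArrayList<>();
   // iText page numbers are 1-based.
   for(int i=1;i<=pdfDocument.getNumberOfPages();i++) {
       String pageTxt = PdfTextExtractor.getTextFromPage(pdfDocument.getPage(i));
       if(pageTxt.contains("LoremIpsum"))
       {
           // Record the number of the matching page.
           splitPages.add(i);
       }
   }
   pdfDocument.close();
   return splitPages;
}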

This generates a list of the pages that need to be included. You can then use iText to copy out the pages you want, using

public List<PdfPage> PdfDocument::copyPagesTo(int pageFrom,
                             int pageTo,
                             PdfDocument toDocument,
                             IPdfPageExtraCopier copier)

I am using this code now. It works fine and is simpler.

            FileInfo file = new FileInfo(textBox2.Text);
            // The base name is the same for every page, so compute it once.
            string filename = System.IO.Path.GetFileNameWithoutExtension(file.Name);

            using (PdfReader reader = new PdfReader(textBox2.Text))
            {
                for (int pagenumber = 1; pagenumber <= reader.NumberOfPages; pagenumber++)
                {
                    // Only pages whose text contains the split tag are written out.
                    if(PdfTextExtractor.GetTextFromPage(reader, pagenumber, new SimpleTextExtractionStrategy()).Contains("LoremIpsum"))
                    {
                        // Create the Document only when a page is actually copied.
                        Document document = new Document();
                        PdfCopy copy = new PdfCopy(document, new FileStream(textBox3.Text + "\\" + filename + pagenumber + ".pdf", FileMode.Create));
                        document.Open();
                        copy.AddPage(copy.GetImportedPage(reader, pagenumber));
                        document.Close();
                    }
                }
            }
