Compression of Split PDF Files

How can I compress sliced PDF documents in C#?

I have a PDF document that I am slicing into separate pages. If the original document is 10 MB, the size after slicing increases to 15 MB, so I need to compress the sliced documents. Is there any way to do that? Please help.

public int ExtractPages(string sourcePdfPath, string DestinationFolder)
{
    int p = 0, initialcount = 0;
    try
    {
        iTextSharp.text.Document document;
        iTextSharp.text.pdf.PdfReader reader = new iTextSharp.text.pdf.PdfReader(new iTextSharp.text.pdf.RandomAccessFileOrArray(sourcePdfPath), new ASCIIEncoding().GetBytes(""));

        if (!Directory.Exists(DestinationFolder))
        {
            Directory.CreateDirectory(DestinationFolder);
        }
        else
        {
            DirectoryInfo di = new DirectoryInfo(DestinationFolder);
            initialcount = di.GetFiles("*.pdf", SearchOption.AllDirectories).Length;
        }

        for (p = 1; p <= reader.NumberOfPages; p++)
        {
            using (MemoryStream memoryStream = new MemoryStream())
            {
                document = new iTextSharp.text.Document();
                iTextSharp.text.pdf.PdfWriter writer = iTextSharp.text.pdf.PdfWriter.GetInstance(document, memoryStream);
                writer.SetPdfVersion(iTextSharp.text.pdf.PdfWriter.PDF_VERSION_1_2);
                writer.CompressionLevel = iTextSharp.text.pdf.PdfStream.BEST_COMPRESSION;
                writer.SetFullCompression();
                document.SetPageSize(reader.GetPageSize(p));
                document.NewPage();
                document.Open();
                document.AddDocListener(writer);
                iTextSharp.text.pdf.PdfContentByte cb = writer.DirectContent;
                iTextSharp.text.pdf.PdfImportedPage pageImport = writer.GetImportedPage(reader, p);
                int rot = reader.GetPageRotation(p);
                if (rot == 90 || rot == 270)
                {
                    cb.AddTemplate(pageImport, 0, -1.0F, 1.0F, 0, 0, reader.GetPageSizeWithRotation(p).Height);
                }
                else
                {
                    cb.AddTemplate(pageImport, 1.0F, 0, 0, 1.0F, 0, 0);
                }
                document.Close();
                document.Dispose();
                File.WriteAllBytes(DestinationFolder + "/" + p + ".pdf", memoryStream.ToArray());
            }
        }
        reader.Close();
        reader.Dispose();
    }
    catch
    {
    }
    finally
    {
        GC.Collect();
    }

    if (initialcount > (p - 1))
    {
        for (int k = (p - 1) + 1; k <= initialcount; k++)
        {
            try
            {
                File.Delete(DestinationFolder + "/" + k + ".pdf");
            }
            catch
            {
            }
        }
    }

    return p - 1;
}

First of all, you should not use PdfWriter with GetImportedPage and AddTemplate on its direct content for a task like this. Instead, have a look at the Webified iTextSharp Examples of iText in Action, 2nd Edition.

There you'll find the sample Burst.cs, whose central code is:

PdfReader reader = new PdfReader(pdf);
// loop over all the pages in the original PDF
int n = reader.NumberOfPages;      
for (int i = 0; i < n; i++)
{
    using (MemoryStream ms = new MemoryStream())
    {
        // We'll create as many new PDFs as there are pages
        using (Document document = new Document())
        {
            using (PdfCopy copy = new PdfCopy(document, ms))
            {
                document.Open();
                copy.AddPage(copy.GetImportedPage(reader, i + 1));
            }
        }
        // store ms.ToArray() somewhere
    }
}

(I removed some ZIP file packing those webified samples use.)

As you can see, there is no longer any need to deal with page rotation or anything similar.
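
To connect this with the method in the question, here is a minimal sketch that writes each page straight to a file using the same PdfCopy pattern. The class name PdfSplitter is hypothetical, and the sketch assumes iTextSharp 5.x and an unencrypted source file. It leaves out the question's empty owner password and its cleanup of stale files.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class PdfSplitter // hypothetical helper class, not part of iTextSharp
{
    // Writes page N of sourcePdfPath to "<destinationFolder>/N.pdf"
    // and returns the number of pages written.
    public static int ExtractPages(string sourcePdfPath, string destinationFolder)
    {
        // CreateDirectory does nothing if the folder already exists.
        Directory.CreateDirectory(destinationFolder);

        PdfReader reader = new PdfReader(sourcePdfPath);
        try
        {
            int n = reader.NumberOfPages;
            for (int page = 1; page <= n; page++)
            {
                string target = Path.Combine(destinationFolder, page + ".pdf");
                using (FileStream fs = new FileStream(target, FileMode.Create))
                using (Document document = new Document())
                using (PdfCopy copy = new PdfCopy(document, fs))
                {
                    document.Open();
                    // PdfCopy carries over the page as it is, including its rotation.
                    copy.AddPage(copy.GetImportedPage(reader, page));
                }
            }
            return n;
        }
        finally
        {
            reader.Close();
        }
    }
}
```

Unlike the original method, this version lets exceptions propagate instead of swallowing them in an empty catch block, so a failed split does not go unnoticed.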

With all that said, the sum of the sizes of the individual files will very likely be larger than the size of the original file. After all, resources can be shared in the original file. E.g., a font used on all pages only needs to be embedded once there, while after splitting it has to be embedded in every individual document containing a page that uses it.
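
The effect of shared resources can be illustrated with some simple arithmetic. The figures below are purely hypothetical (they are not measured from any real file) and only show how a single shared font multiplies once every page becomes its own document.

```csharp
using System;

public static class SplitSizeEstimate // hypothetical illustration only
{
    public static void Main()
    {
        // Assumed figures, chosen only to illustrate the effect.
        int pages = 10;
        double contentPerPageMb = 0.9; // page-specific content
        double sharedFontMb = 0.5;     // one font used on every page

        // In the original file the font is embedded once.
        double originalMb = pages * contentPerPageMb + sharedFontMb;

        // After splitting, every single-page file embeds its own copy.
        double splitTotalMb = pages * (contentPerPageMb + sharedFontMb);

        Console.WriteLine("original: {0:F1} MB, split total: {1:F1} MB",
                          originalMb, splitTotalMb);
    }
}
```

Stream compression does not remove this kind of duplication, because each split file must remain a complete document on its own.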

PS: If keeping meta information is important, you might want to use PdfReader.selectPages and PdfStamper instead. For this I only have Java code:

for (int i = 1; i <= TEST_FILE_PAGES; i++)
{
    FileOutputStream fos = new FileOutputStream(String.format("%03d.pdf", i));
    PdfReader reader = new PdfReader(TEST_FILE);
    reader.selectPages(Collections.singletonList(i));
    PdfStamper stamper = new PdfStamper(reader, fos);
    stamper.close();
    fos.close();
}

This keeps the PDF meta information and might therefore be more appropriate, depending on your requirements. It is much slower, though: each page export manipulates the PdfReader contents, so the source has to be re-read before the next page can be exported.
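
A C# counterpart of the Java loop above might look roughly like this. It is an untested sketch that assumes iTextSharp 5.x, where the equivalents are PdfReader.SelectPages (called here with a page-range string) and PdfStamper. The class and method names are hypothetical.

```csharp
using System.IO;
using iTextSharp.text.pdf;

public static class PdfStamperSplitter // hypothetical helper class
{
    public static void SplitKeepingMetadata(string sourcePdfPath, string destinationFolder)
    {
        Directory.CreateDirectory(destinationFolder);

        // Determine the page count once, up front.
        int pageCount;
        PdfReader counter = new PdfReader(sourcePdfPath);
        try { pageCount = counter.NumberOfPages; }
        finally { counter.Close(); }

        for (int i = 1; i <= pageCount; i++)
        {
            // SelectPages modifies the reader, so a fresh one is needed per page.
            PdfReader reader = new PdfReader(sourcePdfPath);
            try
            {
                reader.SelectPages(i.ToString()); // keep only page i
                string target = Path.Combine(destinationFolder, i.ToString("000") + ".pdf");
                using (FileStream fs = new FileStream(target, FileMode.Create))
                {
                    PdfStamper stamper = new PdfStamper(reader, fs);
                    stamper.Close(); // writes the reduced document to fs
                }
            }
            finally
            {
                reader.Close();
            }
        }
    }
}
```

As in the Java version, the source is re-read for every page, which is where the extra running time comes from.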
