
How to get page content height using pdfbox

Is it possible to get the height of the page content using PDFBox? I think I have tried everything, but each box (PDRectangle) returns the full height of the page: 842. At first I thought this was because of the page number placed at the bottom of the page, but when I opened the PDF in Illustrator, the whole content was inside a compound element that did not extend to the full page height. So if Illustrator can treat it as a separate element and calculate its height, I guess this should also be possible in PDFBox.

Sample page:

[Image: sketch of the sample page]

In general

The PDF specification allows a PDF to declare several page boundaries (cf. this answer). Apart from those, content boundaries can only be derived from the page content itself, e.g. from

  • Form XObjects:

    A form XObject is a PDF content stream that is a self-contained description of any sequence of graphics objects (including path objects, text objects, and sampled images). A form XObject may be painted multiple times—either on several pages or at several locations on the same page—and produces the same results each time, subject only to the graphics state at the time it is invoked.

  • Clipping Paths:

    The graphics state shall contain a current clipping path that limits the regions of the page affected by painting operators. The closed subpaths of this path shall define the area that can be painted. Marks falling inside this area shall be applied to the page; those falling outside it shall not be.

  • ...

To find either of them, one has to parse the page content, look for the appropriate operations, and calculate the resulting boundaries.
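As a rough illustration of what "parse the page content and look for the appropriate operations" means, here is a minimal sketch in plain Java (no PDFBox). It scans an already decoded content stream for a rectangle (operator re) that is turned into a clipping path (W or W*) before the path is ended. The class and method names are made up for this example. It assumes whitespace-separated tokens, an identity transformation matrix, rectangle-only paths, and no graphics state stack, so it is not a substitute for a real content stream parser.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFBox: finds rectangles used as clipping paths.
final class ClipRectScanner {
    // Operators that paint or discard the current path and thereby end it.
    private static final Set<String> PATH_END_OPS = new HashSet<>(
            Arrays.asList("S", "s", "f", "F", "f*", "B", "B*", "b", "b*", "n"));
    private static final Pattern NUMBER = Pattern.compile("[+-]?(?:\\d+\\.?\\d*|\\.\\d+)");

    static List<Rectangle2D> findClipRectangles(String contentStream) {
        List<Rectangle2D> clips = new ArrayList<>();
        Deque<Double> operands = new ArrayDeque<>();
        Rectangle2D.Double currentRect = null; // simplification: only the latest rectangle is tracked
        boolean clipPending = false;

        for (String token : contentStream.trim().split("\\s+")) {
            if (NUMBER.matcher(token).matches()) {
                operands.addLast(Double.parseDouble(token));
                continue;
            }
            if (token.equals("re") && operands.size() >= 4) {
                double h = operands.removeLast();
                double w = operands.removeLast();
                double y = operands.removeLast();
                double x = operands.removeLast();
                currentRect = new Rectangle2D.Double();
                // Width and height may be negative in PDF, so normalize via the diagonal.
                currentRect.setFrameFromDiagonal(x, y, x + w, y + h);
            } else if (token.equals("W") || token.equals("W*")) {
                clipPending = currentRect != null;
            } else if (PATH_END_OPS.contains(token)) {
                // The clip takes effect once the path is ended.
                if (clipPending) {
                    clips.add(currentRect);
                }
                currentRect = null;
                clipPending = false;
            }
            operands.clear(); // operands belong to the operator just handled
        }
        return clips;
    }

    public static void main(String[] args) {
        // Made-up stream: one clipping rectangle, then one rectangle that is only filled.
        String stream = "q 28.32 660.22 537.6 153.46 re W n 0 0 1 rg 30 700 100 50 re f Q";
        for (Rectangle2D clip : findClipRectangles(stream)) {
            System.out.println("Clip from y=" + clip.getMinY() + " to y=" + clip.getMaxY());
        }
    }
}
```

In the made-up stream only the first rectangle is reported, because the second one is filled without a preceding W.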

In the OP's case

Each of your sample PDFs explicitly defines only one page boundary, the MediaBox. Thus, all of the other PDF page boundaries (CropBox, BleedBox, TrimBox, ArtBox) default to it. So it is no wonder that in your attempts

    each (PDRectangle) returns full height of the page: 842
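The defaulting rule can be sketched in plain Java. This is only a model of the rule, not PDFBox code: the class name, the method name, and the map standing in for the page dictionary are made up, the page width of 595 is an assumption (only the height of 842 is known from the question), and inheritance of boxes from parent page tree nodes is ignored.

```java
import java.awt.geom.Rectangle2D;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of PDFBox: resolves a page box with its default.
final class PageBoxDefaults {
    static Rectangle2D resolve(Map<String, Rectangle2D> pageEntries, String boxName) {
        Rectangle2D box = pageEntries.get(boxName);
        if (box != null) {
            return box;
        }
        if ("MediaBox".equals(boxName)) {
            throw new IllegalArgumentException("MediaBox is required");
        }
        // CropBox falls back to MediaBox; BleedBox, TrimBox and ArtBox fall back to CropBox.
        return resolve(pageEntries, "CropBox".equals(boxName) ? "MediaBox" : "CropBox");
    }

    public static void main(String[] args) {
        Map<String, Rectangle2D> page = new HashMap<>();
        // Only the MediaBox is present; the width of 595 is an assumption.
        page.put("MediaBox", new Rectangle2D.Double(0, 0, 595, 842));
        for (String name : new String[] {"MediaBox", "CropBox", "BleedBox", "TrimBox", "ArtBox"}) {
            System.out.println(name + " height: " + resolve(page, name).getHeight());
        }
    }
}
```

With only a MediaBox present, every box resolves to the same rectangle and therefore reports a height of 842.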

Neither of the two sample PDFs contains form XObjects, but both make use of clipping paths.

  • In case of test-pdf4.pdf:

     Start at: 28.31999969482422, 813.6799926757812
     Line to: 565.9199829101562, 813.6799926757812
     Line to: 565.9199829101562, 660.2196655273438
     Line to: 28.31999969482422, 660.2196655273438
     Line to: 28.31999969482422, 813.6799926757812

    (This might match the sketch in your question.)

  • In case of test-pdf5.pdf:

     Start at: 23.0, 34.0
     Line to: 572.0, 34.0
     Line to: 572.0, -751.0
     Line to: 23.0, -751.0
     Line to: 23.0, 34.0

    and

     Start at: 23.0, 819.0
     Line to: 572.0, 819.0
     Line to: 572.0, 34.0
     Line to: 23.0, 34.0
     Line to: 23.0, 819.0

Because the first of these paths matches the sketch, I assume that Illustrator treats everything drawn while a non-trivial clipping path is in effect as a compound element, with the clipping path as its border.
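To turn such a clipping path into a content height, one can take its bounding box and, where the path reaches beyond the page, intersect it with the page area. The following is a minimal sketch using only java.awt.geom. The class and method names are made up, the coordinates are the values listed above (rounded), and the page rectangle of 595 by 842 is an assumption, since only the height is known from the question.

```java
import java.awt.geom.Path2D;
import java.awt.geom.Rectangle2D;

// Hypothetical helper: derives a content height from the vertices of a clipping path.
final class ClipPathHeight {
    /** Bounding box of the closed polygon given as x0, y0, x1, y1, ... */
    static Rectangle2D boundsOf(double... coords) {
        Path2D.Double path = new Path2D.Double();
        path.moveTo(coords[0], coords[1]);
        for (int i = 2; i + 1 < coords.length; i += 2) {
            path.lineTo(coords[i], coords[i + 1]);
        }
        path.closePath();
        return path.getBounds2D();
    }

    /** The part of the clip area that actually lies on the page. */
    static Rectangle2D visiblePart(Rectangle2D clip, Rectangle2D page) {
        return clip.createIntersection(page);
    }

    public static void main(String[] args) {
        Rectangle2D page = new Rectangle2D.Double(0, 0, 595, 842); // assumed MediaBox

        // Clipping path found in test-pdf4.pdf
        Rectangle2D pdf4 = boundsOf(28.32, 813.68, 565.92, 813.68,
                                    565.92, 660.2197, 28.32, 660.2197);
        System.out.println("test-pdf4 height: " + visiblePart(pdf4, page).getHeight());

        // First clipping path found in test-pdf5.pdf; most of it lies below the page
        Rectangle2D pdf5 = boundsOf(23, 34, 572, 34, 572, -751, 23, -751);
        System.out.println("test-pdf5 raw height: " + pdf5.getHeight());
        System.out.println("test-pdf5 height on page: " + visiblePart(pdf5, page).getHeight());
    }
}
```

For test-pdf4.pdf this yields a height of roughly 153.46 units. The first test-pdf5.pdf path is 785 units high but lies mostly below the page, so only a strip of 34 units remains after intersecting it with the assumed page area.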

Finding clipping paths with PDFBox

I used PDFBox to find the clipping paths mentioned above. I used the current SNAPSHOT of version 2.0.0, which is under development at the time of writing, because the required APIs are much improved compared to the current release version 1.8.8.

I extended PDFGraphicsStreamEngine to a ClipPathFinder class:

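// Package names as in the released PDFBox 2.0.x versions
import java.awt.geom.Point2D;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.pdfbox.contentstream.PDFGraphicsStreamEngine;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.graphics.image.PDImage;

public class ClipPathFinder extends PDFGraphicsStreamEngine implements Iterable<Path>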
{
    public ClipPathFinder(PDPage page)
    {
        super(page);
    }

    // Entry point: processes the page and collects its clipping paths
    public void findClipPaths() throws IOException
    {
        processPage(getPage());
    }

    //
    // PDFGraphicsStreamEngine overrides
    //

    @Override
    public void appendRectangle(Point2D p0, Point2D p1, Point2D p2, Point2D p3) throws IOException
    {
        startPathIfNecessary();
        currentPath.appendRectangle(toFloat(p0), toFloat(p1), toFloat(p2), toFloat(p3));
    }

    @Override
    public void drawImage(PDImage pdImage) throws IOException { }

    @Override
    public void clip(int windingRule) throws IOException
    {
        currentPath.complete(windingRule);
        paths.add(currentPath);
        currentPath = null;
    }

    @Override
    public void moveTo(float x, float y) throws IOException
    {
        startPathIfNecessary();
        currentPath.moveTo(x, y);
    }

    @Override
    public void lineTo(float x, float y) throws IOException
    {
        currentPath.lineTo(x, y);
    }

    @Override
    public void curveTo(float x1, float y1, float x2, float y2, float x3, float y3) throws IOException
    {
        currentPath.curveTo(x1, y1, x2, y2, x3, y3);
    }

    @Override
    public Point2D.Float getCurrentPoint() throws IOException
    {
        return currentPath.getCurrentPoint();
    }

    @Override
    public void closePath() throws IOException
    {
        currentPath.closePath();
    }

    @Override
    public void endPath() throws IOException
    {
        currentPath = null;
    }

    @Override
    public void strokePath() throws IOException
    {
        currentPath = null;
    }

    @Override
    public void fillPath(int windingRule) throws IOException
    {
        currentPath = null;
    }

    @Override
    public void fillAndStrokePath(int windingRule) throws IOException
    {
        currentPath = null;
    }

    @Override
    public void shadingFill(COSName shadingName) throws IOException
    {
        currentPath = null;
    }

    void startPathIfNecessary()
    {
        if (currentPath == null)
            currentPath = new Path();
    }

    Point2D.Float toFloat(Point2D p)
    {
        if (p == null || (p instanceof Point2D.Float))
        {
            return (Point2D.Float)p;
        }
        return new Point2D.Float((float)p.getX(), (float)p.getY());
    }

    //
    // Iterable<Path> implementation
    //
    public Iterator<Path> iterator()
    {
        return paths.iterator();
    }

    Path currentPath = null;
    final List<Path> paths = new ArrayList<Path>();
}

It uses this helper class to represent paths:

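import java.awt.geom.Point2D;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class Path implements Iterable<Path.SubPath>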
{
    public static class Segment
    {
        Segment(Point2D.Float start, Point2D.Float end)
        {
            this.start = start;
            this.end = end;
        }

        public Point2D.Float getStart()
        {
            return start;
        }

        public Point2D.Float getEnd()
        {
            return end;
        }

        final Point2D.Float start, end; 
    }

    public class SubPath implements Iterable<Segment>
    {
        public class Line extends Segment
        {
            Line(Point2D.Float start, Point2D.Float end)
            {
                super(start, end);
            }

            //
            // Object override
            //
            @Override
            public String toString()
            {
                StringBuilder builder = new StringBuilder();
                builder.append("    Line to: ")
                       .append(end.getX())
                       .append(", ")
                       .append(end.getY())
                       .append('\n');
                return builder.toString();
            }
        }

        public class Curve extends Segment
        {
            Curve(Point2D.Float start, Point2D.Float control1, Point2D.Float control2, Point2D.Float end)
            {
                super(start, end);
                this.control1 = control1;
                this.control2 = control2;
            }

            public Point2D getControl1()
            {
                return control1;
            }

            public Point2D getControl2()
            {
                return control2;
            }

            //
            // Object override
            //
            @Override
            public String toString()
            {
                StringBuilder builder = new StringBuilder();
                builder.append("    Curve to: ")
                       .append(end.getX())
                       .append(", ")
                       .append(end.getY())
                       .append(" with Control1: ")
                       .append(control1.getX())
                       .append(", ")
                       .append(control1.getY())
                       .append(" and Control2: ")
                       .append(control2.getX())
                       .append(", ")
                       .append(control2.getY())
                       .append('\n');
                return builder.toString();
            }

            final Point2D control1, control2; 
        }

        SubPath(Point2D.Float start)
        {
            this.start = start;
            currentPoint = start;
        }

        public Point2D getStart()
        {
            return start;
        }

        void lineTo(float x, float y)
        {
            Point2D.Float end = new Point2D.Float(x, y);
            segments.add(new Line(currentPoint, end));
            currentPoint = end;
        }

        void curveTo(float x1, float y1, float x2, float y2, float x3, float y3)
        {
            Point2D.Float control1 = new Point2D.Float(x1, y1);
            Point2D.Float control2 = new Point2D.Float(x2, y2);
            Point2D.Float end = new Point2D.Float(x3, y3);
            segments.add(new Curve(currentPoint, control1, control2, end));
            currentPoint = end;
        }

        void closePath()
        {
            closed = true;
            currentPoint = start;
        }

        //
        // Iterable<Segment> implementation
        //
        public Iterator<Segment> iterator()
        {
            return segments.iterator();
        }

        //
        // Object override
        //
        @Override
        public String toString()
        {
            StringBuilder builder = new StringBuilder();
            builder.append("  {\n    Start at: ")
                   .append(start.getX())
                   .append(", ")
                   .append(start.getY())
                   .append('\n');
            for (Segment segment : segments)
                builder.append(segment);
            if (closed)
                builder.append("    Closed\n");
            builder.append("  }\n");
            return builder.toString();
        }

        boolean closed = false;
        final Point2D.Float start;
        final List<Segment> segments = new ArrayList<Path.Segment>();
    }

    public class Rectangle extends SubPath
    {
        Rectangle(Point2D.Float p0, Point2D.Float p1, Point2D.Float p2, Point2D.Float p3)
        {
            super(p0);
            lineTo((float)p1.getX(), (float)p1.getY());
            lineTo((float)p2.getX(), (float)p2.getY());
            lineTo((float)p3.getX(), (float)p3.getY());
            closePath();
        }

        //
        // Object override
        //
        @Override
        public String toString()
        {
            StringBuilder builder = new StringBuilder();
            builder.append("  {\n    Rectangle\n    Start at: ")
                   .append(start.getX())
                   .append(", ")
                   .append(start.getY())
                   .append('\n');
            for (Segment segment : segments)
                builder.append(segment);
            if (closed)
                builder.append("    Closed\n");
            builder.append("  }\n");
            return builder.toString();
        }
    }

    public int getWindingRule()
    {
        return windingRule;
    }

    void complete(int windingRule)
    {
        finishSubPath();
        this.windingRule = windingRule;
    }

    void appendRectangle(Point2D.Float p0, Point2D.Float p1, Point2D.Float p2, Point2D.Float p3) throws IOException
    {
        finishSubPath();
        currentSubPath = new Rectangle(p0, p1, p2, p3);
        finishSubPath();
    }

    void moveTo(float x, float y) throws IOException
    {
        finishSubPath();
        currentSubPath = new SubPath(new Point2D.Float(x, y));
    }

    void lineTo(float x, float y) throws IOException
    {
        currentSubPath.lineTo(x, y);
    }

    void curveTo(float x1, float y1, float x2, float y2, float x3, float y3) throws IOException
    {
        currentSubPath.curveTo(x1, y1, x2, y2, x3, y3);
    }

    Point2D.Float getCurrentPoint() throws IOException
    {
        return currentPoint;
    }

    void closePath() throws IOException
    {
        currentSubPath.closePath();
        finishSubPath();
    }

    void finishSubPath()
    {
        if (currentSubPath != null)
        {
            subPaths.add(currentSubPath);
            currentSubPath = null;
        }
    }

    //
    // Iterable<Path.SubPath> implementation
    //
    public Iterator<SubPath> iterator()
    {
        return subPaths.iterator();
    }

    //
    // Object override
    //
    @Override
    public String toString()
    {
        StringBuilder builder = new StringBuilder();
        builder.append("{\n  Winding: ")
               .append(windingRule)
               .append('\n');
        for (SubPath subPath : subPaths)
            builder.append(subPath);
        builder.append("}\n");
        return builder.toString();
    }

    Point2D.Float currentPoint = null;
    SubPath currentSubPath = null;
    int windingRule = -1;
    final List<SubPath> subPaths = new ArrayList<Path.SubPath>();
}

The class ClipPathFinder is used like this:

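// PDFRESOURCE (a File or InputStream) and PAGENUMBER (zero-based index) are placeholders.
// The single-argument load(...) is the form available in released 2.0.x versions.
PDDocument document = PDDocument.load(PDFRESOURCE);
try
{
    PDPage page = document.getPage(PAGENUMBER);
    ClipPathFinder finder = new ClipPathFinder(page);
    finder.findClipPaths();

    for (Path path : finder)
    {
        System.out.println(path);
    }
}
finally
{
    // Close the document even if processing the page fails
    document.close();
}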
