How to tell if PDF text has been rotated

I'm trying to pull text from a PDF using iText7. I'm using the IEventListener to get all the parts of the page, though some of the text is rotated. I can find examples of how to insert rotated text into a PDF, but nothing about how to tell whether a given text segment is rotated.

Can anyone help?

public void EventOccurred(IEventData data, EventType type)
{
    PdfPart part = null;

    switch (type)
    {
        case EventType.BEGIN_TEXT:
            break;
        case EventType.RENDER_TEXT:
            part = new PdfTextPart(PageNumber, data as TextRenderInfo);
            Parts.Add(part);
            break;
        case EventType.END_TEXT:
            break;
        case EventType.RENDER_IMAGE:
            var imageData = data as ImageRenderInfo;
            //this.HandleImage(imageData);
            break;
        case EventType.RENDER_PATH:
            part = new PdfLinePart(PageNumber, data as PathRenderInfo);
            Parts.Add(part);
            break;
        case EventType.CLIP_PATH_CHANGED:
            break;
        default:
            break;
    }
}

public PdfTextPart(Int32 pageNumber, TextRenderInfo info) : base(pageNumber)
{
    Text = info.GetText();

    var font = info.GetFont().GetFontProgram().GetFontNames();
    Font = font.GetFontName();

    if (font.IsItalic()) { this.IsItalic = true; }
    if (font.IsBold()) { this.IsBold = true; }
    if (font.IsUnderline()) { this.IsUnderline = true; }
}

TextRenderInfo has a baseline. This baseline is a LineSegment and as such has a start point and an end point. Now you merely have to determine the angle of the line between those two points.

I.e., for a TextRenderInfo info:

LineSegment baseline = info.GetBaseline();
Vector startPoint = baseline.GetStartPoint();
Vector endPoint = baseline.GetEndPoint();
Vector direction = endPoint.Subtract(startPoint);
double angle = Math.Atan2(direction.Get(Vector.I2), direction.Get(Vector.I1));

The result is in radians.
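To turn that angle into a yes/no answer, it can help to convert it to degrees and compare it against a small tolerance, since extracted coordinates are rarely exact. The following is a minimal sketch that uses only System; the class name BaselineAngle, its method names, and the 0.5 degree default tolerance are assumptions made for illustration, not part of iText.

```csharp
using System;

// Hypothetical helper (not an iText type): classifies a baseline direction vector.
public static class BaselineAngle
{
    // Angle of the vector (dx, dy) in degrees, normalized to the range [0, 360).
    public static double ToDegrees(double dx, double dy)
    {
        double degrees = Math.Atan2(dy, dx) * 180.0 / Math.PI;
        return (degrees % 360.0 + 360.0) % 360.0;
    }

    // True when the direction deviates from horizontal left-to-right
    // by more than the given tolerance (an assumed default of 0.5 degrees).
    public static bool IsRotated(double dx, double dy, double toleranceDegrees = 0.5)
    {
        double degrees = ToDegrees(dx, dy);
        double distanceFromZero = Math.Min(degrees, 360.0 - degrees);
        return distanceFromZero > toleranceDegrees;
    }
}
```

With the direction vector computed above, this could be called as BaselineAngle.IsRotated(direction.Get(Vector.I1), direction.Get(Vector.I2)).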

You may additionally have to take into account the page rotation, which (if I recall correctly) is not factored into the coordinates above.
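If the page rotation does need to be combined with the baseline angle, one possible approach is sketched below. It assumes the baseline angle is measured counterclockwise in degrees (as Atan2 yields) and that the page rotation is the page's /Rotate value, which the PDF specification defines as clockwise degrees; in iText7 that value would presumably come from PdfPage.GetRotation(). The class name DisplayedTextAngle and its method are hypothetical, and whether the subtraction is needed at all depends on the caveat above being correct.

```csharp
using System;

// Hypothetical helper (not an iText type): combines a baseline angle with page rotation.
public static class DisplayedTextAngle
{
    // baselineDegrees: counterclockwise angle of the text baseline in page coordinates.
    // pageRotationDegrees: clockwise page rotation (assumed to be the /Rotate entry).
    // Returns the counterclockwise angle as displayed, normalized to [0, 360).
    public static double Compute(double baselineDegrees, int pageRotationDegrees)
    {
        double displayed = baselineDegrees - pageRotationDegrees;
        return (displayed % 360.0 + 360.0) % 360.0;
    }
}
```

For example, text with a horizontal baseline on a page rotated 90 degrees clockwise would be displayed running top to bottom, which this sketch reports as 270 degrees.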
