
c# itext7/itextsharp: how to find the coordinates of a particular term in a PDF file?

I am using itext7/itextsharp in C#.

How can I find the coordinates of every occurrence of a particular word?

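One way to find the position of a search term (or, more generally, a search expression) is the iText 7 RegexBasedLocationExtractionStrategy. It lets you search a page for all matches of a regular expression and returns those matches, each with the exactly matched text and its position on the page: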

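// Namespaces used by this snippet
using System;
using System.Globalization;
using iText.Kernel.Geom;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Canvas.Parser;
using iText.Kernel.Pdf.Canvas.Parser.Listener;

PdfDocument pdfDocument = ...

for (int page = 1; page <= pdfDocument.GetNumberOfPages(); page++)
{
    Console.WriteLine("Page {0}", page);
    // Create a fresh strategy for each page and run that page's content through it
    RegexBasedLocationExtractionStrategy strategy = new RegexBasedLocationExtractionStrategy(SEARCH_EXPRESSION);
    new PdfCanvasProcessor(strategy).ProcessPageContent(pdfDocument.GetPage(page));
    foreach (IPdfTextLocation location in strategy.GetResultantLocations())
    {
        if (location != null)
        {
            // Print the matched text, the lower-left corner of its rectangle, and width × height
            Rectangle rect = location.GetRectangle();
            Console.WriteLine(String.Format(CultureInfo.InvariantCulture, " - '{0}' at ({1}, {2}), {3}\u00d7{4}", location.GetText(), rect.GetX(), rect.GetY(), rect.GetWidth(), rect.GetHeight()));
        }
    }
}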

For example, consider the PDF ENaB 20180317.pdf originally shared by the OP of this question.

[Images: ENaB 20180317_Page_1.png, ENaB 20180317_Page_2.png]

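In these tables, "SAN XXX" occurs multiple times with different values of XXX. Applying the code above to that file, with a regular expression for these terms, results in: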

Page 1
 - 'SAN IGNACIO' at (183.6127, 81.85992), 45.49265×9.500404
 - 'SAN CERNIN' at (260.1665, 203.9058), 42.94177×9.500397
 - 'SAN IGNACIO' at (183.6058, 244.58), 45.49265×9.500397
 - 'SAN CERNIN' at (239.0477, 244.58), 42.94247×9.500397
 - 'SAN JORGE' at (392.0537, 298.8239), 40.27756×9.500397
 - 'SAN CERNIN' at (183.6128, 407.3039), 42.93692×9.500397
 - 'SAN IGNACIO' at (183.6058, 434.42), 45.49265×9.500397
 - 'SAN IGNACIO' at (392.0432, 434.42), 45.48703×9.500397
Page 2
 - 'SAN ADRIAN' at (279.3961, 136.1134), 42.57495×9.500397
 - 'SAN CERNIN' at (183.6475, 149.6715), 42.93692×9.500397
 - 'SAN CERNIN' at (392.0876, 149.6715), 42.94247×9.500397
 - 'SAN CERNIN' at (183.6127, 231.0199), 42.9418×9.500397
 - 'SAN CERNIN' at (392.0528, 231.0199), 42.94247×9.500397
 - 'SAN IGNACIO' at (308.1654, 312.3896), 45.48703×9.500397
 - 'SAN ADRIAN' at (472.0908, 339.5058), 42.57428×9.500397
 - 'SAN CERNIN' at (263.1662, 380.18), 42.94247×9.500397
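
The snippet above leaves out how the document is opened and what SEARCH_EXPRESSION contains. The following is a minimal sketch of one way to wrap the same loop in a reusable method, assuming the iText 7 package is referenced. The class name, the FindMatches helper, the command-line handling, and the default pattern "SAN [A-Z]+" are all hypothetical choices for illustration; the expression that actually produced the output above is not shown in this post.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using iText.Kernel.Geom;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Canvas.Parser;
using iText.Kernel.Pdf.Canvas.Parser.Listener;

class FindTermCoordinates
{
    // Hypothetical helper: collects every regex match in the file
    // together with its page number and bounding rectangle.
    static List<(int Page, string Text, Rectangle Box)> FindMatches(string pdfPath, string regex)
    {
        var matches = new List<(int Page, string Text, Rectangle Box)>();
        PdfDocument pdfDocument = new PdfDocument(new PdfReader(pdfPath));
        try
        {
            for (int page = 1; page <= pdfDocument.GetNumberOfPages(); page++)
            {
                // Same steps as in the answer: one strategy per page,
                // fed by a canvas processor that parses the page content.
                var strategy = new RegexBasedLocationExtractionStrategy(regex);
                new PdfCanvasProcessor(strategy).ProcessPageContent(pdfDocument.GetPage(page));

                foreach (IPdfTextLocation location in strategy.GetResultantLocations())
                {
                    if (location == null) continue;
                    matches.Add((page, location.GetText(), location.GetRectangle()));
                }
            }
        }
        finally
        {
            // Release the underlying file even if parsing throws.
            pdfDocument.Close();
        }
        return matches;
    }

    static void Main(string[] args)
    {
        if (args.Length < 1)
        {
            Console.Error.WriteLine("Usage: FindTermCoordinates <file.pdf> [regex]");
            return;
        }

        // Assumption: illustrative default pattern, not taken from the original answer.
        string regex = args.Length > 1 ? args[1] : "SAN [A-Z]+";

        foreach (var m in FindMatches(args[0], regex))
        {
            Console.WriteLine(string.Format(CultureInfo.InvariantCulture,
                "Page {0}: '{1}' at ({2}, {3}), {4}\u00d7{5}",
                m.Page, m.Text, m.Box.GetX(), m.Box.GetY(), m.Box.GetWidth(), m.Box.GetHeight()));
        }
    }
}
```

In the iText Rectangle class, GetX() and GetY() give the lower-left corner of the rectangle, so the y values grow upward from the bottom of the page. Returning a list instead of printing directly makes it easier to reuse the positions, for example to mark the matches in a later step.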
