C# iText 7 / iTextSharp: how to find the coordinates of a particular term in a PDF file?

I am using iText 7 / iTextSharp in C#.

How can I find the coordinates of every occurrence of a particular word in a PDF?

Finding the position of search terms (more generally, search expressions) is the job of the iText 7 RegexBasedLocationExtractionStrategy. It searches a page for all matches of a regular expression and returns each match together with the exact matched text and its location on the page:

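using System;
using System.Globalization;
using iText.Kernel.Geom;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Canvas.Parser;
using iText.Kernel.Pdf.Canvas.Parser.Listener;

PdfDocument pdfDocument = ...

// iText page numbers start at 1.
for (int page = 1; page <= pdfDocument.GetNumberOfPages(); page++)
{
    Console.WriteLine("Page {0}", page);
    // SEARCH_EXPRESSION is the regular expression to look for; use a fresh strategy for each page.
    RegexBasedLocationExtractionStrategy strategy = new RegexBasedLocationExtractionStrategy(SEARCH_EXPRESSION);
    new PdfCanvasProcessor(strategy).ProcessPageContent(pdfDocument.GetPage(page));
    foreach (IPdfTextLocation location in strategy.GetResultantLocations())
    {
        if (location != null)
        {
            // Bounding rectangle of the match: GetX()/GetY() give its lower-left corner.
            Rectangle rect = location.GetRectangle();
            Console.WriteLine(String.Format(CultureInfo.InvariantCulture, " - '{0}' at ({1}, {2}), {3}\u00d7{4}", location.GetText(), rect.GetX(), rect.GetY(), rect.GetWidth(), rect.GetHeight()));
        }
    }
}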
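If the locations are needed for further processing rather than console output, the same loop can collect them into a list. The following is only a minimal sketch: the TermHit and TermLocator types and the FindAll method are hypothetical names introduced for illustration and are not part of iText. It assumes the caller supplies an open PdfDocument and the search expression.

```csharp
using System.Collections.Generic;
using iText.Kernel.Geom;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Canvas.Parser;
using iText.Kernel.Pdf.Canvas.Parser.Listener;

// Hypothetical helper type (not part of iText): one match of the search expression.
public sealed class TermHit
{
    public int Page { get; set; }        // 1-based page number
    public string Text { get; set; }     // the exact matched text
    public Rectangle Area { get; set; }  // bounding rectangle of the match
}

// Hypothetical helper class (not part of iText).
public static class TermLocator
{
    // Collects every match of searchExpression in the document, page by page.
    public static List<TermHit> FindAll(PdfDocument pdfDocument, string searchExpression)
    {
        var hits = new List<TermHit>();
        for (int page = 1; page <= pdfDocument.GetNumberOfPages(); page++)
        {
            // A fresh strategy per page keeps matches from different pages apart.
            var strategy = new RegexBasedLocationExtractionStrategy(searchExpression);
            new PdfCanvasProcessor(strategy).ProcessPageContent(pdfDocument.GetPage(page));

            foreach (IPdfTextLocation location in strategy.GetResultantLocations())
            {
                if (location == null)
                {
                    continue;
                }
                hits.Add(new TermHit
                {
                    Page = page,
                    Text = location.GetText(),
                    Area = location.GetRectangle()
                });
            }
        }
        return hits;
    }
}
```

A caller could then group the returned hits by Text or Page, for example to highlight every occurrence of one term.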

For example, consider the PDF ENaB 20180317.pdf, originally shared by the OP of this question:

[Images: pages 1 and 2 of ENaB 20180317.pdf]

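There are multiple occurrences of "SAN XXX", with different XXX, in those tables. Applying the code above to that file with a regular expression matching these terms results in the following output, where each line shows the matched text, the lower-left corner (x, y) of its bounding rectangle, and the rectangle's width×height: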

Page 1
 - 'SAN IGNACIO' at (183.6127, 81.85992), 45.49265×9.500404
 - 'SAN CERNIN' at (260.1665, 203.9058), 42.94177×9.500397
 - 'SAN IGNACIO' at (183.6058, 244.58), 45.49265×9.500397
 - 'SAN CERNIN' at (239.0477, 244.58), 42.94247×9.500397
 - 'SAN JORGE' at (392.0537, 298.8239), 40.27756×9.500397
 - 'SAN CERNIN' at (183.6128, 407.3039), 42.93692×9.500397
 - 'SAN IGNACIO' at (183.6058, 434.42), 45.49265×9.500397
 - 'SAN IGNACIO' at (392.0432, 434.42), 45.48703×9.500397
Page 2
 - 'SAN ADRIAN' at (279.3961, 136.1134), 42.57495×9.500397
 - 'SAN CERNIN' at (183.6475, 149.6715), 42.93692×9.500397
 - 'SAN CERNIN' at (392.0876, 149.6715), 42.94247×9.500397
 - 'SAN CERNIN' at (183.6127, 231.0199), 42.9418×9.500397
 - 'SAN CERNIN' at (392.0528, 231.0199), 42.94247×9.500397
 - 'SAN IGNACIO' at (308.1654, 312.3896), 45.48703×9.500397
 - 'SAN ADRIAN' at (472.0908, 339.5058), 42.57428×9.500397
 - 'SAN CERNIN' at (263.1662, 380.18), 42.94247×9.500397
