I am using iText 7 / iTextSharp in C#.
How can I find the coordinates of multiple occurrences of a particular word?
Finding the position of search terms (more generally, search expressions) is the job of the iText 7 RegexBasedLocationExtractionStrategy. It lets you search a page for all matches of a regular expression and returns each match with the exact matched text and its location on the page:
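using System;
using System.Globalization;
using iText.Kernel.Geom;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Canvas.Parser;
using iText.Kernel.Pdf.Canvas.Parser.Listener;

PdfDocument pdfDocument = ...
for (int page = 1; page <= pdfDocument.GetNumberOfPages(); page++)
{
    Console.WriteLine("Page {0}", page);
    // Use a fresh strategy per page so that the results are reported page by page.
    RegexBasedLocationExtractionStrategy strategy = new RegexBasedLocationExtractionStrategy(SEARCH_EXPRESSION);
    new PdfCanvasProcessor(strategy).ProcessPageContent(pdfDocument.GetPage(page));
    foreach (IPdfTextLocation location in strategy.GetResultantLocations())
    {
        if (location != null)
        {
            // Print the matched text, the origin (x, y) of its rectangle, and the rectangle's width × height.
            Rectangle rect = location.GetRectangle();
            Console.WriteLine(String.Format(CultureInfo.InvariantCulture, " - '{0}' at ({1}, {2}), {3}\u00d7{4}", location.GetText(), rect.GetX(), rect.GetY(), rect.GetWidth(), rect.GetHeight()));
        }
    }
}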
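The code above leaves SEARCH_EXPRESSION undefined. If the goal is to find one particular word rather than a general pattern, the expression could be built roughly as follows. This is only a sketch: SearchExpressions and ForWord are hypothetical names, not part of iText, and it assumes that SEARCH_EXPRESSION is a string holding a .NET regular expression.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SearchExpressions
{
    // Hypothetical helper (not part of iText): builds a regular expression
    // that matches the given word literally and only as a whole word.
    public static string ForWord(string word)
    {
        // Regex.Escape neutralizes characters that are special in a regex,
        // such as '.' or '('. \b anchors the match at word boundaries.
        return @"\b" + Regex.Escape(word) + @"\b";
    }
}
```

With this, SearchExpressions.ForWord("SAN") would match the standalone word "SAN" but not the "SAN" inside "SANTA". Note that \b only behaves as expected when the word starts and ends with a letter, digit, or underscore; for other search terms the boundary anchors would have to be dropped or replaced.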
For example, consider the PDF ENaB 20180317.pdf, originally shared by the OP of this question.
Its tables contain multiple occurrences of "SAN XXX" with different XXX. Applying the code above to that file with a regular expression that matches these occurrences results in:
Page 1
- 'SAN IGNACIO' at (183.6127, 81.85992), 45.49265×9.500404
- 'SAN CERNIN' at (260.1665, 203.9058), 42.94177×9.500397
- 'SAN IGNACIO' at (183.6058, 244.58), 45.49265×9.500397
- 'SAN CERNIN' at (239.0477, 244.58), 42.94247×9.500397
- 'SAN JORGE' at (392.0537, 298.8239), 40.27756×9.500397
- 'SAN CERNIN' at (183.6128, 407.3039), 42.93692×9.500397
- 'SAN IGNACIO' at (183.6058, 434.42), 45.49265×9.500397
- 'SAN IGNACIO' at (392.0432, 434.42), 45.48703×9.500397
Page 2
- 'SAN ADRIAN' at (279.3961, 136.1134), 42.57495×9.500397
- 'SAN CERNIN' at (183.6475, 149.6715), 42.93692×9.500397
- 'SAN CERNIN' at (392.0876, 149.6715), 42.94247×9.500397
- 'SAN CERNIN' at (183.6127, 231.0199), 42.9418×9.500397
- 'SAN CERNIN' at (392.0528, 231.0199), 42.94247×9.500397
- 'SAN IGNACIO' at (308.1654, 312.3896), 45.48703×9.500397
- 'SAN ADRIAN' at (472.0908, 339.5058), 42.57428×9.500397
- 'SAN CERNIN' at (263.1662, 380.18), 42.94247×9.500397