I extract a string from a PDF, and from that string I need to get a list of tracking numbers.
My extracted string looks like this, where "more text" stands for the rest of the extracted string:
more text...__FREIGHT: 0.00__SALES TAX: 0.00 __602256510000; 602256510002; 602256500001; TRACKING...more text
I locate the tracking numbers in the string by matching on "TRACKING". Here is my Regex:
((?<TrackingNumber>[a-zA-Z0-9]+);\s)+TRACKING
Here's the problem:
After execution, the group "TrackingNumber" contains only the last tracking number, but I need the group "TrackingNumber" to have 3 matches, one for each tracking number (without the trailing ";" or space).
You may try the lookahead-based regex below. Each tracking number is returned as a separate match:
(?:;\s|_)(?<TrackingNumber>[a-zA-Z0-9]+)(?=.*?;\s*TRACKING)
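As a rough sketch of how the regex above might be used, the snippet below runs it with Regex.Matches and reads the named group from every match. The variable names and the short sample input are assumptions made for illustration. On the longer sample string from the question, the "_" alternative would also appear to pick up words such as FREIGHT and SALES, because they follow an underscore and "; TRACKING" occurs later in the string, so the input may need trimming first.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Short sample input (an assumption), same shape as the tail of the question's string.
string input = "__602256510000; 602256510002; 602256500001; TRACKING ";

var rx = new Regex(@"(?:;\s|_)(?<TrackingNumber>[a-zA-Z0-9]+)(?=.*?;\s*TRACKING)");

// Each tracking number is its own match, so read the named group from every match.
var numbers = new List<string>();
foreach (Match m in rx.Matches(input))
    numbers.Add(m.Groups["TrackingNumber"].Value);

foreach (string n in numbers)
    Console.WriteLine(n);
```

With this input the loop prints the three tracking numbers, one per line.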
The way it's done in .NET is to use capture collections.
Edit: note that you may want to make the tracking characters optional,
[a-zA-Z0-9]*
in case there is a missing/blank number mid-stream.
This will continue capturing.
(Example: 602256510000; 602256510002;; 602256500001; TRACKING)
# (?:(?<TrackingNumber>[a-zA-Z0-9]+);\s)+TRACKING
(?:
(?<TrackingNumber> [a-zA-Z0-9]+ ) #_(1)
; \s
)+
TRACKING
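The expanded form above relies on whitespace and "#" comments being ignored inside the pattern. A minimal sketch of how that could be run in .NET, assuming RegexOptions.IgnorePatternWhitespace is what the layout is meant for (the variable names are illustrative only):

```csharp
using System;
using System.Text.RegularExpressions;

// Sample string from the question.
string text = "more text...__FREIGHT: 0.00__SALES TAX: 0.00 __602256510000; 602256510002; 602256500001; TRACKING...more text";

// Same pattern as the one-line version, spread out with comments.
string pattern = @"
    (?:
        (?<TrackingNumber> [a-zA-Z0-9]+ )   # (1) one tracking number
        ; \s                                # separator after each number
    )+
    TRACKING";

var rxTrack = new Regex(pattern, RegexOptions.IgnorePatternWhitespace);
Match match = rxTrack.Match(text);

// Captures holds every repetition of the group, not only the last one.
CaptureCollection captures = match.Groups["TrackingNumber"].Captures;
foreach (Capture c in captures)
    Console.WriteLine(c.Value);
```

The "\s" escape still matches real whitespace in the input; only literal spaces and line breaks in the pattern text are skipped.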
C#:
using System;
using System.Text.RegularExpressions;

string pdf = "__602256510000; 602256510002; 602256500001; TRACKING ";
Regex RxTrack = new Regex(@"(?:(?<TrackingNumber>[a-zA-Z0-9]+);\s)+TRACKING");
Match trackMatch = RxTrack.Match( pdf );
if ( trackMatch.Success )
{
    // The group matches once per repetition; Captures keeps all of them, not just the last.
    CaptureCollection cc = trackMatch.Groups["TrackingNumber"].Captures;
    for (int i = 0; i < cc.Count; i++)
        Console.WriteLine("[{0}] = {1}", i, cc[i].Value);
}
Output:
[0] = 602256510000
[1] = 602256510002
[2] = 602256500001
This regular expression works:
(?<TrackingNumber>[\d]+)(?=;)
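One thing to keep in mind with this shorter pattern is that it is not tied to the TRACKING marker, so any run of digits directly followed by ";" is returned. The sketch below illustrates this with a made-up input (the "INVOICE 98765;" prefix is hypothetical and not from the question):

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical input: an unrelated number followed by ';' appears before the tracking numbers.
string sample = "INVOICE 98765; __602256510000; 602256510002; TRACKING";

var rxDigits = new Regex(@"(?<TrackingNumber>[\d]+)(?=;)");

// Collect the named group from every match.
var found = new List<string>();
foreach (Match m in rxDigits.Matches(sample))
    found.Add(m.Groups["TrackingNumber"].Value);

foreach (string f in found)
    Console.WriteLine(f);
```

Here 98765 is returned along with the two tracking numbers, so this pattern seems best suited to input where only tracking numbers are followed by ";".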
I think this may help you.
(?<TrackingNumber>[0-9]+)(?=.*?;\sTRACKING)
For a better understanding, see: Regular Expression Lookahead