I've got a few thousand lines of text to extract a particular measurement from. The lines are always in the same format:
'0980 - 14'3 - Plough Yard - London EC2A 3'
'0981 - 14'3 - Waterson St - London E2 8'
'0982 - 14'3 - Union Walk - London E2 8'
'0983 - 14'3 - Union Walk - London E2 8'
'0984 - 14'3 - Hare Row - London E2 9'
'0985 - 14'3 - Sharratt St - London SE15 1'
'0986 - 14'3 - Rolt St - London SE8 5'
'0987 - 14'3 - Edward St - London SE8 5'
Because my knowledge of regex is so poor, the only thing I've come up with is this:
\-(.*?)\-
As those of you with a better head for these patterns can see, it will also match the text between the other hyphens. All I need is the 14'3 part. I also can't guarantee how large the numbers on the far left will get; they could reach the hundreds of thousands.
Update: Apparently my pattern string does work after all. The site(s) I was using to build and test it were at fault. Many thanks for all your help!
Try this regex.
^.*?\-(.*?)\-
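This regex anchors at the start of the line and captures, in group 1, only the second field on the line, i.e. the content between the first pair of - characters. Note that the captured text includes the spaces around the value.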
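As a rough illustration of the pattern above, here is a minimal sketch in Python's re module (the thread does not name a language, so Python is an assumption, and the helper name second_field is made up for this example). It applies the pattern one line at a time and strips the surrounding spaces from the captured group.

```python
import re

# Two sample lines copied from the question.
lines = [
    "'0980 - 14'3 - Plough Yard - London EC2A 3'",
    "'0985 - 14'3 - Sharratt St - London SE15 1'",
]

# The hyphen needs no backslash outside a character class.
pattern = re.compile(r"^.*?-(.*?)-")

def second_field(line):
    """Hypothetical helper: text between the first two hyphens, or None."""
    m = pattern.match(line)
    return m.group(1).strip() if m else None

measurements = [second_field(line) for line in lines]
print(measurements)
```

Matching line by line avoids having to think about how ^ behaves on multi-line input.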
You can be very specific to very general.
This regex is fairly specific:
^'\d+\s+-\s+(\d\d'\d)
This is very general:
(\d+'\d+)
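To make the trade-off concrete, the sketch below runs both patterns in Python (an assumed language choice). The second input line is hypothetical, invented only to show what happens when another field happens to look like a measurement; it is not from the question's data.

```python
import re

specific = re.compile(r"^'\d+\s+-\s+(\d\d'\d)")
general = re.compile(r"(\d+'\d+)")

real_line = "'0980 - 14'3 - Plough Yard - London EC2A 3'"
# Hypothetical line: the street field also contains digits, a quote, digits.
tricky_line = "'0990 - 14'3 - Unit 2'1 Yard - London E2 8'"

# The specific pattern only ever looks at the second field.
specific_hits = [specific.search(l).group(1) for l in (real_line, tricky_line)]

# The general pattern reports every digits'digits run anywhere in the line.
general_hits = [general.findall(l) for l in (real_line, tricky_line)]

print(specific_hits)
print(general_hits)
```

The flip side is that the specific pattern, as written, expects exactly two digits before the quote and one after, so it would need loosening if the measurements vary in width.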
How about:
- (\d+'\d+) -
This will match every 14'3 and capture it in group 1.
You could try this regex also,
^'[0-9]+\s*-\s*([^ ]*)
Explanation:
^'[0-9]+    matches the opening quote and the leading number ('0980)
\s*-\s*     matches the first hyphen and any whitespace around it
([^ ]*)     captures everything up to the next space (14'3)
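Since [0-9]+ accepts a leading number of any length, this pattern should cope with the IDs growing into the hundreds of thousands. A small Python sketch (Python is an assumption, and the third input line is a made-up example with a six-digit ID) that scans a whole block of text at once:

```python
import re

text = "\n".join([
    "'0980 - 14'3 - Plough Yard - London EC2A 3'",
    "'0981 - 14'3 - Waterson St - London E2 8'",
    # Hypothetical line with a six-digit leading number.
    "'123456 - 9'11 - Example Rd - London E1 1'",
])

# re.MULTILINE makes ^ match at the start of every line,
# not only at the start of the whole text.
pattern = re.compile(r"^'[0-9]+\s*-\s*([^ ]*)", re.MULTILINE)

# With a single capture group, findall returns the group contents.
values = pattern.findall(text)
print(values)
```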
I wanted to point out that your pattern works as is in the .NET regular expression engine without any other options. Here's a demonstration (I've removed the unnecessary backslashes):
var input = @"'0980 - 14'3 - Plough Yard - London EC2A 3'
'0981 - 14'3 - Waterson St - London E2 8'
'0982 - 14'3 - Union Walk - London E2 8'
'0983 - 14'3 - Union Walk - London E2 8'
'0984 - 14'3 - Hare Row - London E2 9'
'0985 - 14'3 - Sharratt St - London SE15 1'
'0986 - 14'3 - Rolt St - London SE8 5'
'0987 - 14'3 - Edward St - London SE8 5'";
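// Requires: using System; using System.Text.RegularExpressions;
foreach (Match m in Regex.Matches(input, "-(.*?)-"))
{
    // Group 1 holds the text between the first pair of hyphens on each line.
    Console.WriteLine(m.Groups[1].Value);
}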
This is because . matches any character except newlines (unless you use 'Single-line' mode to make it also match newlines). As long as none of the lines in your string has another - after London …, it will only match the substring between the first pair of -.
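The same behaviour can be observed outside .NET. The sketch below uses Python's re module as a stand-in (an assumption, not the engine discussed above), where re.DOTALL plays the role of 'Single-line' mode. It only aims to show why letting . cross newlines changes what gets matched.

```python
import re

text = (
    "'0980 - 14'3 - Plough Yard - London EC2A 3'\n"
    "'0981 - 14'3 - Waterson St - London E2 8'"
)

# Default: . stops at the newline, so the third hyphen on a line
# can never pair up with a hyphen on the next line.
per_line = re.findall(r"-(.*?)-", text)

# DOTALL: . also matches the newline, so the third hyphen of line 1
# pairs with the first hyphen of line 2 and the pairing goes out of step.
across_lines = re.findall(r"-(.*?)-", text, re.DOTALL)

print(per_line)
print(across_lines)
```

In the default mode the result holds one entry per line; with DOTALL the second line's measurement is lost because its first hyphen has already been consumed.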
However, for something relatively simple like this, you can use Split instead:
foreach (var line in input.Split('\n'))
{
    // Split into at most 3 parts; index 1 is the text between the first two hyphens.
    Console.WriteLine(line.Split(new[] { '-' }, 3)[1]);
}
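For comparison, a rough Python equivalent of the split approach might look like this (the function name measurement is hypothetical). Note that Python's maxsplit counts the number of splits, so 2 here corresponds to the limit of 3 parts in the C# call.

```python
text = (
    "'0980 - 14'3 - Plough Yard - London EC2A 3'\n"
    "'0981 - 14'3 - Waterson St - London E2 8'"
)

def measurement(line):
    """Hypothetical helper: second hyphen-separated field, trimmed."""
    # At most 2 splits gives at most 3 parts: id, measurement, rest.
    parts = line.split("-", 2)
    return parts[1].strip() if len(parts) == 3 else None

results = [measurement(line) for line in text.splitlines()]
print(results)
```

Capping the number of splits keeps any hyphens inside the street or postcode fields from mattering.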