
Matching a string between known characters

I've got a few thousand lines of text from which I need to extract a particular measurement. The lines are always in the same format:

'0980 - 14'3 - Plough Yard - London EC2A 3'
'0981 - 14'3 - Waterson St - London E2 8'
'0982 - 14'3 - Union Walk - London E2 8'
'0983 - 14'3 - Union Walk - London E2 8'
'0984 - 14'3 - Hare Row - London E2 9'
'0985 - 14'3 - Sharratt St - London SE15 1'
'0986 - 14'3 - Rolt St - London SE8 5'
'0987 - 14'3 - Edward St - London SE8 5'

Because my knowledge of regex is so poor, the only thing I've come up with is this:

\-(.*?)\-

As those of you with a far better head for these patterns can see, this will also match the other hyphen-delimited parts. All I need is the 14'3 part. I also can't guarantee how large the numbers on the far left will get; they could reach the hundreds of thousands.

Update: Apparently my pattern does work after all. The site(s) I was using to build and test it were at fault. Many thanks for all your help!

Try this regex.

^.*?\-(.*?)\-

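Because it is anchored to the start of the line, this regex captures only the text between the first and second - (the second field of the line) in a capture group. Note that the captured text includes the spaces around 14'3.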

http://rubular.com/r/wAxtbQT4wb
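As a rough illustration of the anchored pattern, here is a minimal Python sketch. The question does not name a language, so Python and the helper name `second_field` are assumptions made for the example; the sample lines are copied from the question.

```python
import re

# Sample lines copied from the question.
lines = [
    "'0980 - 14'3 - Plough Yard - London EC2A 3'",
    "'0985 - 14'3 - Sharratt St - London SE15 1'",
]

# Anchored at the start of the line, so only the first pair of hyphens is used.
pattern = re.compile(r"^.*?-(.*?)-")

def second_field(line):
    """Return the text between the first and second hyphen, stripped, or None."""
    m = pattern.search(line)
    return m.group(1).strip() if m else None

results = [second_field(line) for line in lines]
print(results)
```

The `strip()` call is there because the raw capture still carries the spaces on either side of the measurement.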

You can be anywhere from very specific to very general.

This regex is fairly specific:

^'\d+\s+-\s+(\d\d'\d)


This is very general:

(\d+'\d+)

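To make the trade-off concrete, here is a small Python sketch comparing the two patterns. The second input line is hypothetical (a measurement with one digit before the quote and two after, which does not occur in the question's data), and the helper `extract` is likewise only for illustration.

```python
import re

specific = re.compile(r"^'\d+\s+-\s+(\d\d'\d)")  # exactly two digits, ', one digit
general = re.compile(r"(\d+'\d+)")               # any digits, ', any digits

sample = "'0980 - 14'3 - Plough Yard - London EC2A 3'"
# Hypothetical line, not from the question: differently shaped measurement.
odd = "'123456 - 9'11 - Example Rd - London N1 1'"

def extract(pattern, line):
    """Return group 1 of the first match, or None if the pattern does not match."""
    m = pattern.search(line)
    return m.group(1) if m else None

print(extract(specific, sample), extract(general, sample))
print(extract(specific, odd), extract(general, odd))
```

The specific pattern rejects anything that deviates from the expected shape, while the general one accepts any digits-quote-digits run wherever it first appears in the line.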

How about:

- (\d+'\d+) - 

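This will match every 14'3. The surrounding hyphens and spaces are part of the match, and the measurement itself ends up in group 1.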
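A quick way to try this over the whole text at once, sketched in Python (an assumed language choice): `re.findall` returns only the contents of group 1 when the pattern has a single group.

```python
import re

# A few lines from the question, joined into one block of text.
text = "\n".join([
    "'0980 - 14'3 - Plough Yard - London EC2A 3'",
    "'0981 - 14'3 - Waterson St - London E2 8'",
    "'0982 - 14'3 - Union Walk - London E2 8'",
])

# With exactly one capture group, findall yields the group's text per match.
measurements = re.findall(r"- (\d+'\d+) - ", text)
print(measurements)
```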

You could also try this regex:

^'[0-9]+\s*-\s*([^ ]*)


Explanation:

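    '0980 - 14'3 - Plough Yard - London EC2A 3'

^'[0-9]+    the opening quote followed by the leading number ('0980)
\s*-\s*     the first hyphen, with optional whitespace on either side
([^ ]*)     group 1: everything up to the next space (14'3)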
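Since `[0-9]+` accepts any number of digits, the length of the leading number should not matter. The Python sketch below checks that under the assumption that the text is processed as one multi-line string. The six-digit identifier on the second line is hypothetical, made up to mimic the "hundreds of thousands" case from the question.

```python
import re

text = "\n".join([
    "'0980 - 14'3 - Plough Yard - London EC2A 3'",
    # Hypothetical identifier in the hundreds of thousands.
    "'250000 - 14'3 - Hare Row - London E2 9'",
])

# re.MULTILINE makes ^ match at the start of every line, not just the string.
pattern = re.compile(r"^'[0-9]+\s*-\s*([^ ]*)", re.MULTILINE)

values = [m.group(1) for m in pattern.finditer(text)]
print(values)
```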

I wanted to point out that your pattern works as is in the .NET regular expression engine without any other options. Here's a demonstration (I've removed the unnecessary backslashes):

var input = @"'0980 - 14'3 - Plough Yard - London EC2A 3'
'0981 - 14'3 - Waterson St - London E2 8'
'0982 - 14'3 - Union Walk - London E2 8'
'0983 - 14'3 - Union Walk - London E2 8'
'0984 - 14'3 - Hare Row - London E2 9'
'0985 - 14'3 - Sharratt St - London SE15 1'
'0986 - 14'3 - Rolt St - London SE8 5'
'0987 - 14'3 - Edward St - London SE8 5'";

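// Each line yields one match; group 1 is the text between the first two hyphens.
foreach (Match m in Regex.Matches(input, "-(.*?)-"))
{
    // Trim() drops the spaces that surround the value inside the hyphens.
    Console.WriteLine(m.Groups[1].Value.Trim());
}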

This works because . matches any character except a newline (unless you use Single-line mode, RegexOptions.Singleline, to make it match newlines as well). As long as none of the lines in your string has another - after London … , the pattern will only match the substring between the first pair of - on each line.
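The same behaviour can be reproduced outside .NET. The sketch below uses Python's `re` module as a stand-in (an assumption for illustration; `re.DOTALL` plays the role of Single-line mode) and shows how letting `.` cross newlines produces the unwanted extra matches.

```python
import re

text = (
    "'0980 - 14'3 - Plough Yard - London EC2A 3'\n"
    "'0981 - 14'3 - Waterson St - London E2 8'"
)

# Default: '.' stops at the newline, so the third hyphen on a line
# can never pair up with the first hyphen on the next line.
default = re.findall(r"-(.*?)-", text)

# DOTALL: '.' also matches '\n', so matches can span two lines.
dotall = re.findall(r"-(.*?)-", text, flags=re.DOTALL)

print(default)
print(dotall)
```

In default mode there is one match per line; with `re.DOTALL` the second match runs from the end of the first line into the start of the second.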

However, for something relatively simple like this, you can use Split instead:

foreach (var line in input.Split('\n'))
{
    // Split into at most 3 parts; index 1 is the field between the first two hyphens.
    Console.WriteLine(line.Split(new[] { '-' }, 3)[1].Trim());
}
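For comparison, the split-based approach looks much the same in other languages. Here is a minimal Python sketch (an assumed language choice, with a hypothetical helper name `measurement`) that limits the split so everything after the second hyphen stays in one piece.

```python
# Two lines from the question, as one multi-line string.
text = (
    "'0986 - 14'3 - Rolt St - London SE8 5'\n"
    "'0987 - 14'3 - Edward St - London SE8 5'"
)

def measurement(line):
    """Return the second hyphen-delimited field of a line, without padding."""
    # maxsplit=2 gives at most three parts; index 1 is the measurement.
    return line.split("-", 2)[1].strip()

fields = [measurement(line) for line in text.splitlines()]
print(fields)
```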


 