Match until the next literal expression

Question

Suppose I have this string to match:

Name XXXXAddress XXXXX XXXX City XXXXX

Here each run of X stands for an unspecified number of characters: letters, symbols, digits, spaces, or newlines. I usually match it like this:

Name (.*?)Address (.*?) (.*?) City (.*?)

But as you can see, there is a space character between the third group and the City literal. So if the address captured by the second group also contains one or more spaces, the engine will do this:

For example, if the address is "pushkina road 10", the second group will contain only "pushkina".
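A minimal sketch of that behavior, written in Ruby only because it is easy to run (the lazy quantifiers behave the same way in .NET for this case). The sample input is hypothetical: "John", "12345" and "Moscow" are made-up values, with "12345" standing in for the space-free third field.

```ruby
# Hypothetical sample input; "12345" stands in for the space-free third field.
input = "Name John Address pushkina road 10 12345 City Moscow"

# The pattern from the question, unchanged.
lazy = /Name (.*?)Address (.*?) (.*?) City (.*?)/

lazy_groups = lazy.match(input).captures
p lazy_groups
# => ["John ", "pushkina", "road 10 12345", ""]
```

The second group stops at the first space it can, so the rest of the address spills into the third group. The last group is empty because a lazy quantifier with nothing after it matches as little as possible.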

That's not wrong, but it's not what I need. I want to tell the engine to give priority to the sequence of characters nearest to City, so that any spaces in the first or second block are skipped over instead of being treated as the separator.

Is this possible? I use the .NET flavor.

Answer 1

I tested this regex in Ruby using Rubular, and I expect it to work in .NET too.

^Name ((?:.(?!Address))*) Address ((?:.(?!\S+ City))*) (\S+) City (.*)$

And I used the following test string:

Name XX X X Address XX X X X YYYYYY City Z Z Z Z

Assumption: the sequence of Y's doesn't contain any spaces.

The sub-expression ((?:.(?!Address))*) matches any sequence of characters in which no character is immediately followed by Address. The sub-expression ((?:.(?!\S+ City))*) does the same, provided no character is immediately followed by \S+ City, i.e. a sequence of non-space characters followed by a space and City. Both sub-expressions use the negative lookahead operator (?!...).
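As a rough check of the explanation above, the sketch below runs the regex in Ruby against the test string from this answer and against a hypothetical address line (the values "John", "12345" and "Moscow" are invented for illustration). It has not been run under .NET here.

```ruby
# The regex from this answer, as a Ruby literal.
pattern = /^Name ((?:.(?!Address))*) Address ((?:.(?!\S+ City))*) (\S+) City (.*)$/

# Test string used in the answer.
test_groups = pattern.match("Name XX X X Address XX X X X YYYYYY City Z Z Z Z").captures
p test_groups
# => ["XX X X", "XX X X X", "YYYYYY", "Z Z Z Z"]

# Hypothetical input: the address contains spaces, the field before City does not.
addr_groups = pattern.match("Name John Address pushkina road 10 12345 City Moscow").captures
p addr_groups
# => ["John", "pushkina road 10", "12345", "Moscow"]
```

The second group now keeps the whole address, because it only stops at the space that is followed by a space-free token and then City.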
