Regex for multiline string pattern

I am writing a regex for a multiline string pattern, but it does not work. This is my input:

FXP/R,U

1.NWAMNKPA/UGONMA D 2.NWAMNKPA/AMAJINDI O
3.NWAMNKPA/AMAJINDI NA 4.NWAMNKPA/ADAUGOAMAJI C
5.NWAMNKPA/CHINAZAEKPERE N

Regular expression:

(FXP\\S{3,20})|(\\r\\s{3}.\\S+(.+))

But it does not match this line:

3.NWAMNKPA/AMAJINDI NA 4.NWAMNKPA/ADAUGOAMAJI C

It matches only these two lines:

1.NWAMNKPA/UGONMA D 2.NWAMNKPA/AMAJINDI O
5.NWAMNKPA/CHINAZAEKPERE N

Desired output:

  1. NWAMNKPA/UGONMA D
  2. NWAMNKPA/AMAJINDI O
  3. NWAMNKPA/AMAJINDI NA
  4. NWAMNKPA/ADAUGOAMAJI C
  5. NWAMNKPA/CHINAZAEKPERE N

You can look into RegexOptions.Multiline (and the other options). ( http://msdn.microsoft.com/en-us/library/yd1hzczs.aspx )

I would advise you to use String.Split() instead and validate one line at a time. Regular expressions are hard to read, and there is no need to match a pattern across several lines. Working line by line makes your code easier to understand.
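A minimal sketch of that line-by-line approach follows. The class name `LineByLine`, the method `ParseEntries`, and the entry pattern are illustrative assumptions, not something from the question; the pattern assumes every entry looks like `number.SURNAME/GIVEN CODE`, as in the sample input.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LineByLine
{
    // Assumed shape of one entry: digits, a dot, SURNAME/GIVEN, a space, a short code.
    private static readonly Regex Entry = new Regex(@"(\d+)\.(\S+/\S+ \S+)");

    public static List<string> ParseEntries(string input)
    {
        var result = new List<string>();
        // Split on Windows and Unix line endings and skip blank lines.
        var lines = input.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
        foreach (var line in lines)
        {
            // A line may hold several entries; the header line holds none.
            foreach (Match m in Entry.Matches(line))
            {
                result.Add(m.Groups[1].Value + ". " + m.Groups[2].Value);
            }
        }
        return result;
    }
}
```

Because each line is handled on its own, the pattern never has to care about `\r` or `\n`, and the header line `FXP/R,U` simply produces no matches.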

I don't think your regular expression is doing what you think it's doing. The first part is OK, but the second part, \\r\\s{3}.\\S+(.+) , looks for a carriage return, followed by exactly three whitespace characters, followed by any one character other than a newline (whitespace or not), followed by one or more non-whitespace characters, followed by one or more characters, which you capture.

There are a number of issues with this. First of all, not all text has carriage returns ( \\r ) - checking for a newline ( \\n ) instead is much safer. Even if your text does have \\r , there's almost certainly going to be a \\n afterwards (Windows ends lines with \\r\\n ). The \\n might be absorbed into the \\s{3} , depending on your data, though.

Secondly, + is a greedy operator. That means the first + in \\S+(.+) will match everything it can: all non-whitespace characters up to the next whitespace. Only after reaching that whitespace will the (.+) start capturing, so the first character it captures will be whitespace. Alternatively, if there is no whitespace left on the line, the \\S+ will "give back" one character so that the .+ has something to match, in which case the capture will simply be the last character of the line.
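To see the greedy behaviour in isolation, here is a small sketch that applies only the `\S+(.+)` fragment (written as a C# verbatim string) to two inputs. The class `GreedyDemo` and its method `Capture` are hypothetical names used for illustration.

```csharp
using System.Text.RegularExpressions;

public static class GreedyDemo
{
    // Returns what group 1 of \S+(.+) captures, or an empty string if nothing matches.
    public static string Capture(string text)
    {
        Match m = Regex.Match(text, @"\S+(.+)");
        return m.Success ? m.Groups[1].Value : string.Empty;
    }
}
```

`GreedyDemo.Capture("NWAMNKPA/UGONMA D")` returns `" D"` (with the leading space), because `\S+` consumes the whole name first. `GreedyDemo.Capture("ABCDE")` returns `"E"`, because `\S+` has to give back one character for `.+` to match at all.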

All things considered, I think you're going to be much better off with something simpler, like this:

Regex.Split(myData, @"(?=\d)").Where(s => !string.IsNullOrEmpty(s))

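That will split your data every time the next character is a digit. Note that the first piece will be the header ( FXP/R,U ), and each piece keeps its trailing whitespace or line break, so you will probably want to skip the header and trim the rest.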
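Put together, the split approach might look like the sketch below. It is a variation on the one-liner rather than the answer's exact code: the added lookbehind `(?<!\d)` and the `\d+\.` lookahead are my assumptions, so that a two-digit number such as `10.` is not split between its digits. The names `SplitEntries` and `Parse` are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SplitEntries
{
    public static string[] Parse(string input)
    {
        // Split in front of each "<number>." token; the lookbehind stops a
        // split from happening in the middle of a multi-digit number.
        return Regex.Split(input, @"(?<!\d)(?=\d+\.)")
            // Remove trailing spaces and line breaks from each piece.
            .Select(s => s.Trim())
            // Drop the header ("FXP/R,U") and any empty pieces.
            .Where(s => s.Length > 0 && char.IsDigit(s[0]))
            // Turn "1.NAME" into "1. NAME" to match the desired output.
            .Select(s => Regex.Replace(s, @"^(\d+)\.\s*", "$1. "))
            .ToArray();
    }
}
```

For the sample input, `SplitEntries.Parse(myData)` should give five strings, from `1. NWAMNKPA/UGONMA D` to `5. NWAMNKPA/CHINAZAEKPERE N`. A name that itself contains a digit followed by a dot would still be split, so this only fits data shaped like the sample.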
