
VB.NET: split a string with regex

I need to split a string with a regex according to several rules. With the help of earlier posts here I got part of the way, but I don't know how to finish it.

The input string (intentionally messy) is:

Berlin "New York"Madrid 'Frankfurt Am Main' Quebec Łódź   München Seattle,Milano

The splitting code is:

Dim subStrings() As String = Regex.Split(myText, """([^""]*)""|,| ")

The result is:

0)  
1)  
2)Berlin  
3)  
4)New York  
5)Madrid  
6)'Frankfurt  
7)Am  
8)Main'  
9)Quebec  
10)Łódź  
11)  
12)  
13)München  
14)Seattle  
15)Milano  

In short, the string should be split into an array on a space and/or a comma, and quoted terms, in either single or double quotes, should be treated as a single word. A term in single quotes should be handled exactly like a term in double quotes, so 'Frankfurt Am Main' (currently split across items 6 to 8) should become one word, just as "New York" (item 4) already is. I would also like the regex to keep empty matches out of the subStrings() array. The ideal result for the example above is:

0)Berlin  
1)New York  
2)Madrid  
3)Frankfurt Am Main  
4)Quebec  
5)Łódź  
6)München  
7)Seattle  
8)Milano  

Can anyone help me with a regex for this specific case?

Instead of splitting, you can extract the strings with Regex.Matches and the following regex:

"([^"]*)"|'([^']*)'|([^,\s]+)

See the regex demo.

Details

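  • "([^"]*)" - a ", then Group 1 matching any 0+ chars other than ", and then a closing "
  • | - or
  • '([^']*)' - a ', then Group 2 matching any 0+ chars other than ', and then a closing '
  • | - or
  • ([^,\s]+) - Group 3: any 1+ chars other than a comma and whitespace

Only one of the three groups takes part in any given match; the other two are empty, which is why the code below concatenates all three group values.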
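To see the three alternatives at work, here is a small console sketch that reports, for each match, which group captured the term. The module name `GroupDemo` and the helper `WhichGroup` are hypothetical names chosen for illustration; the sketch only relies on `Group.Success`, which is `False` for a group that did not take part in the match.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module GroupDemo
    ' Same pattern as in the answer: Group 1 = body of a "..." term,
    ' Group 2 = body of a '...' term, Group 3 = an unquoted run without commas/whitespace.
    Private ReadOnly TermPattern As New Regex("""([^""]*)""|'([^']*)'|([^,\s]+)")

    ' Returns 1, 2 or 3 for the group that took part in the match (0 if none did).
    Public Function WhichGroup(m As Match) As Integer
        For i As Integer = 1 To 3
            If m.Groups(i).Success Then Return i
        Next
        Return 0
    End Function

    Sub Main()
        Dim text = "Berlin ""New York""Madrid 'Frankfurt Am Main'"
        For Each m As Match In TermPattern.Matches(text)
            Dim g = WhichGroup(m)
            ' Full match text, the group that matched, and that group's value.
            Console.WriteLine("{0} -> Group {1}: {2}", m.Value, g, m.Groups(g).Value)
        Next
    End Sub
End Module
```

For this sample text it reports Group 3 for Berlin and Madrid, Group 1 for New York and Group 2 for Frankfurt Am Main, and the group value never includes the quotes.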

VB.NET code snippet:

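' At the top of the file:
Imports System.Linq
Imports System.Text.RegularExpressions

Dim text = "Berlin ""New York""Madrid 'Frankfurt Am Main' Quebec Łódź   München Seattle,Milano"
Dim pattern As String = """([^""]*)""|'([^']*)'|([^,\s]+)"
' Exactly one group takes part in each match and the others return "",
' so concatenating the three values gives the term without its quotes.
Dim matches() As String = Regex.Matches(text, pattern) _
          .Cast(Of Match)() _
          .Select(Function(m) m.Groups(1).Value & m.Groups(2).Value & m.Groups(3).Value) _
          .ToArray()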

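Results: matches contains the nine items from the desired output in the question (Berlin, New York, Madrid, Frankfurt Am Main, Quebec, Łódź, München, Seattle, Milano), with the surrounding quotes removed.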
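A variation worth knowing, which is not part of the answer above: .NET allows the same group name to be reused in several alternatives, so all three branches can capture into one named group and the concatenation of Groups 1 to 3 disappears. A minimal sketch, where `term`, `ExtractTerms` and `NamedGroupVariant` are hypothetical names:

```vbnet
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions

Module NamedGroupVariant
    ' All three alternatives capture into the same (hypothetical) group name "term";
    ' .NET maps a repeated group name to a single group.
    Private ReadOnly TermPattern As New Regex("""(?<term>[^""]*)""|'(?<term>[^']*)'|(?<term>[^,\s]+)")

    ' Returns the terms without their surrounding quotes.
    Public Function ExtractTerms(input As String) As String()
        Return TermPattern.Matches(input) _
            .Cast(Of Match)() _
            .Select(Function(m) m.Groups("term").Value) _
            .ToArray()
    End Function

    Sub Main()
        Dim text = "Berlin ""New York""Madrid 'Frankfurt Am Main' Quebec Łódź   München Seattle,Milano"
        Dim terms = ExtractTerms(text)
        ' Print in the same "index)value" format used in the question.
        For i As Integer = 0 To terms.Length - 1
            Console.WriteLine("{0}){1}", i, terms(i))
        Next
    End Sub
End Module
```

Since only one branch can match at a time, m.Groups("term").Value always holds the text captured by the branch that matched. The trade-off is that the group alone no longer tells you whether the term was quoted.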

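The same result can be obtained with Regex.Split. The quoted terms are still captured, and Regex.Split includes captured group text in the array it returns, while runs of commas and whitespace act as plain delimiters. Empty entries, left behind where two matches of the pattern are adjacent, are then filtered out: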

pattern = """([^""]*)""|'([^']*)'|[,\s]+"
' Named parts rather than matches so it can sit in the same scope as the snippet above.
Dim parts() As String = Regex.Split(text, pattern).Where(Function(s) Not String.IsNullOrWhiteSpace(s)).ToArray()

See the regex demo.
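To see why the Where filter is needed, the sketch below prints the raw Regex.Split output next to the filtered array. Regex.Split drops the delimiter text but adds the text of any capturing group that took part in a match, and two matches that touch (for example the space right before "New York" and the quoted term itself) leave an empty string between them. `SplitDemo`, `RawPieces` and `SplitTerms` are hypothetical names.

```vbnet
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions

Module SplitDemo
    ' The Split pattern from the answer: quoted terms are captured,
    ' runs of commas/whitespace are plain separators.
    Private Const SplitPattern As String = """([^""]*)""|'([^']*)'|[,\s]+"

    ' Raw Regex.Split output: separator text is dropped, captured group text is kept,
    ' and adjacent matches leave empty strings behind.
    Public Function RawPieces(input As String) As String()
        Return Regex.Split(input, SplitPattern)
    End Function

    ' Same as the answer's one-liner: drop the empty/whitespace-only pieces.
    Public Function SplitTerms(input As String) As String()
        Return RawPieces(input).Where(Function(s) Not String.IsNullOrWhiteSpace(s)).ToArray()
    End Function

    Sub Main()
        Dim text = "Berlin ""New York""Madrid"
        Console.WriteLine("Raw:      [" & String.Join("][", RawPieces(text)) & "]")
        Console.WriteLine("Filtered: [" & String.Join("][", SplitTerms(text)) & "]")
    End Sub
End Module
```

One side effect of IsNullOrWhiteSpace: a quoted term that contains only spaces, or an empty pair of quotes, is filtered out as well, whereas the Regex.Matches version returns it as a blank or empty item.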

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM