VB.NET: Split string with regex
I have a requirement to split a string with a regex according to several rules. I got part of the way with the help of previous posts here, but I don't know how to do it completely.
The input string (intentionally written ugly) is:
Berlin "New York"Madrid 'Frankfurt Am Main' Quebec Łódź München Seattle,Milano
The splitting code is:
Dim subStrings() As String = Regex.Split(myText, """([^""]*)""|,| ")
The result of this is:
0)
1)
2)Berlin
3)
4)New York
5)Madrid
6)'Frankfurt
7)Am
8)Main'
9)Quebec
10)Łódź
11)
12)
13)München
14)Seattle
15)Milano
In short, the string should be split into an array on " " (space) and/or the "," character, and on single or double quotes. Quoted terms should be treated as a single word, and a term in single quotes (starting at item 6 above) should be treated the same as a term in double quotes. That way 'Frankfurt Am Main' becomes "one word", just like "New York" at item 4. I would also like the regex to keep empty matches out of the subStrings() array. The ideal result for the given example is:
0)Berlin
1)New York
2)Madrid
3)Frankfurt Am Main
4)Quebec
5)Łódź
6)München
7)Seattle
8)Milano
So, if anyone knows how to solve this particular regex, please help.
You may extract the strings by using Regex.Matches with the following regex:
"([^"]*)"|'([^']*)'|([^,\s]+)
See the regex demo.
Details
- "([^"]*)" - a ", then Group 1 matching any 0+ chars other than ", and then a closing "
- | - or
- '([^']*)' - a ', then Group 2 matching any 0+ chars other than ', and then a closing '
- | - or
- ([^,\s]+) - Group 3: any 1+ chars other than , and whitespace

VB.NET code snippet:
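' Imports go at the top of the source file; the statements below go inside a method.
Imports System.Linq
Imports System.Text.RegularExpressions

Dim text = "Berlin ""New York""Madrid 'Frankfurt Am Main' Quebec Łódź München Seattle,Milano"
Dim pattern As String = """([^""]*)""|'([^']*)'|([^,\s]+)"
' Only one of the three groups takes part in each match; the other two have Value = "",
' so concatenating all three yields the token without its surrounding quotes.
Dim matches() As String = Regex.Matches(text, pattern) _
    .Cast(Of Match)() _
    .Select(Function(m) m.Groups(1).Value & m.Groups(2).Value & m.Groups(3).Value) _
    .ToArray()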
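Results: matches contains exactly the nine items listed as the ideal result in the question (Berlin, New York, Madrid, Frankfurt Am Main, Quebec, Łódź, München, Seattle, Milano).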
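For a quick check, here is a minimal self-contained console-program sketch that wraps the Regex.Matches approach above in a helper function. The module name TokenizerDemo and the function name Tokenize are hypothetical, chosen only for illustration; the pattern and the group concatenation are the ones from the answer.

```vbnet
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions

' Hypothetical demo module; the names TokenizerDemo and Tokenize are illustrative only.
Public Module TokenizerDemo

    ' Same pattern as in the answer: a "double-quoted" term (Group 1),
    ' a 'single-quoted' term (Group 2), or a run of chars that are
    ' neither a comma nor whitespace (Group 3).
    Private Const TokenPattern As String = """([^""]*)""|'([^']*)'|([^,\s]+)"

    Public Function Tokenize(text As String) As String()
        ' Exactly one group participates in each match; the others have Value = "",
        ' so concatenating all three yields the token without its quotes.
        Return Regex.Matches(text, TokenPattern) _
            .Cast(Of Match)() _
            .Select(Function(m) m.Groups(1).Value & m.Groups(2).Value & m.Groups(3).Value) _
            .ToArray()
    End Function

    Public Sub Main()
        Dim text = "Berlin ""New York""Madrid 'Frankfurt Am Main' Quebec Łódź München Seattle,Milano"
        Dim tokens As String() = Tokenize(text)
        ' Print in the same "index)value" style used in the question.
        For i As Integer = 0 To tokens.Length - 1
            Console.WriteLine($"{i}){tokens(i)}")
        Next
    End Sub

End Module
```

One design note: an empty pair of quotes in the input ("" or '') still matches Group 1 or Group 2, so it would produce an empty token; if that matters, add a Where filter like the one in the Regex.Split variant.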
The same result can be obtained with the following Regex.Split approach:
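' Regex.Split also adds the captured quoted text (Groups 1 and 2) to the result;
' empty and whitespace-only entries are then filtered out.
Dim splitPattern As String = """([^""]*)""|'([^']*)'|[,\s]+"
Dim parts() As String = Regex.Split(text, splitPattern).Where(Function(s) Not String.IsNullOrWhiteSpace(s)).ToArray()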
See the regex demo.
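To see why the Where filter is needed in the Regex.Split variant, the following sketch returns both the raw Regex.Split array and the filtered one. The module name SplitDemo and the functions RawSplit and SplitTokens are hypothetical. Because the pattern contains capturing groups, Regex.Split inserts the captured quoted text into the result, and wherever two delimiter matches are adjacent (for example a space directly followed by a quoted term), the empty gap between them shows up as an empty string.

```vbnet
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions

' Hypothetical demo module; the names SplitDemo, RawSplit and SplitTokens are illustrative only.
Public Module SplitDemo

    ' Quoted terms are matched as delimiters but captured (Groups 1 and 2);
    ' runs of commas/whitespace are plain, non-captured delimiters.
    Private Const SplitPattern As String = """([^""]*)""|'([^']*)'|[,\s]+"

    ' Raw Regex.Split output: captured text is included in the array, and the
    ' gap between two adjacent delimiter matches becomes an empty string.
    Public Function RawSplit(text As String) As String()
        Return Regex.Split(text, SplitPattern)
    End Function

    ' Filtered output, as in the answer: drop empty/whitespace-only entries.
    Public Function SplitTokens(text As String) As String()
        Return RawSplit(text).Where(Function(s) Not String.IsNullOrWhiteSpace(s)).ToArray()
    End Function

    Public Sub Main()
        Dim text = "Berlin ""New York""Madrid 'Frankfurt Am Main' Quebec Łódź München Seattle,Milano"
        Console.WriteLine("Raw entries: " & RawSplit(text).Length)
        Console.WriteLine("Filtered entries: " & SplitTokens(text).Length)
        For Each token As String In SplitTokens(text)
            Console.WriteLine(token)
        Next
    End Sub

End Module
```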