VB.NET: Split string with regex

I have a requirement to split a string with a regex according to several rules. With the help of previous posts here I got part of the way, but I don't know how to finish it.

The input string (intentionally written ugly) is:

Berlin "New York"Madrid 'Frankfurt Am Main' Quebec Łódź   München Seattle,Milano

The splitting code is:

Dim subStrings() As String = Regex.Split(myText, """([^""]*)""|,| ")

The result is:

0)  
1)  
2)Berlin  
3)  
4)New York  
5)Madrid  
6)'Frankfurt  
7)Am  
8)Main'  
9)Quebec  
10)Łódź  
11)  
12)  
13)München  
14)Seattle  
15)Milano  

In short, the string should be split into an array on a space and/or a comma, and each single- or double-quoted term should be treated as a single word. This means that a term in single quotes (item 6 above) should be handled the same way as a term in double quotes, so 'Frankfurt Am Main' becomes one word, just like "New York" at item 4. I would also like the regex to keep empty matches out of the subStrings() array. The ideal result for the given example would be:

0)Berlin  
1)New York  
2)Madrid  
3)Frankfurt Am Main  
4)Quebec  
5)Łódź  
6)München  
7)Seattle  
8)Milano  

So, could someone please show me how to write this regex?

You may extract the strings by using Regex.Matches with the following regex:

"([^"]*)"|'([^']*)'|([^,\s]+)

See the regex demo.

Details

  • "([^"]*)" - a ", then Group 1 matching any 0+ chars other than ", and then a "
  • | - or
  • '([^']*)' - a ', then Group 2 matching any 0+ chars other than ', and then a '
  • | - or
  • ([^,\s]+) - Group 3: any 1+ chars other than , and whitespace
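To see the three alternatives at work, the following sketch reports which capture group produced each token. The module name GroupDemo and the helper DescribeMatches are made up for illustration; only Regex.Matches and the pattern itself come from the answer.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module GroupDemo
    ' Hypothetical helper: returns "group N: value" for every match of the pattern.
    Function DescribeMatches(input As String) As List(Of String)
        Dim pattern As String = """([^""]*)""|'([^']*)'|([^,\s]+)"
        Dim entries As New List(Of String)
        For Each m As Match In Regex.Matches(input, pattern)
            ' Only one alternative can match at a time, so exactly one group succeeds.
            For i As Integer = 1 To 3
                If m.Groups(i).Success Then
                    entries.Add("group " & i.ToString() & ": " & m.Groups(i).Value)
                End If
            Next
        Next
        Return entries
    End Function

    Sub Main()
        For Each entry As String In DescribeMatches("Berlin ""New York"" 'Frankfurt Am Main',Milano")
            Console.WriteLine(entry)
        Next
    End Sub
End Module
```

For this shortened input the sketch should print "group 3: Berlin", "group 1: New York", "group 2: Frankfurt Am Main" and "group 3: Milano". Because only one group succeeds per match, concatenating the three group values yields the token without its quotes.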

VB.NET code snippet:

Imports System.Linq
Imports System.Text.RegularExpressions

' Quoted terms land in Group 1 or 2, bare words in Group 3; only one group is non-empty per match.
Dim text = "Berlin ""New York""Madrid 'Frankfurt Am Main' Quebec Łódź   München Seattle,Milano"
Dim pattern As String = """([^""]*)""|'([^']*)'|([^,\s]+)"
Dim matches() As String = Regex.Matches(text, pattern) _
          .Cast(Of Match)() _
          .Select(Function(m) m.Groups(1).Value & m.Groups(2).Value & m.Groups(3).Value) _
          .ToArray()

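Results: the array holds the nine items of the ideal list above (Berlin, New York, Madrid, Frankfurt Am Main, Quebec, Łódź, München, Seattle, Milano).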

The same can be obtained with the following Regex.Split approach:

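' Captured quoted text is kept by Regex.Split; bare commas and whitespace are discarded.
pattern = """([^""]*)""|'([^']*)'|[,\s]+"
' A distinct name avoids redeclaring matches() from the previous snippet.
Dim splitResult() As String = Regex.Split(text, pattern).Where(Function(s) Not String.IsNullOrWhiteSpace(s)).ToArray()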

See the regex demo.
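The Where filter matters because Regex.Split also returns the empty strings left between adjacent delimiters, for example between a space and the opening quote that follows it. A minimal sketch of the approach as a reusable function, where the names SplitDemo and SplitTerms are hypothetical and empty quoted terms are assumed to be disposable:

```vbnet
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions

Module SplitDemo
    ' Hypothetical helper wrapping the Regex.Split approach.
    Function SplitTerms(input As String) As String()
        ' Quoted text sits in a capturing group, so Split keeps it in the output;
        ' runs of commas and whitespace are plain delimiters and are dropped.
        Dim pattern As String = """([^""]*)""|'([^']*)'|[,\s]+"
        Dim raw() As String = Regex.Split(input, pattern)
        ' Adjacent delimiters leave empty entries behind, so filter them out.
        Return raw.Where(Function(s) Not String.IsNullOrWhiteSpace(s)).ToArray()
    End Function

    Sub Main()
        Dim sample As String = "Berlin ""New York""Madrid 'Frankfurt Am Main' Quebec Łódź   München Seattle,Milano"
        Console.WriteLine(String.Join("|", SplitTerms(sample)))
    End Sub
End Module
```

With the sample input this should print Berlin|New York|Madrid|Frankfurt Am Main|Quebec|Łódź|München|Seattle|Milano. One difference from the Regex.Matches version: an empty quoted term such as "" is removed by the filter here, whereas Regex.Matches would return it as an empty string.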


 