
Regular Expression - How To Find Words and Quoted Phrases

I want to take a string such as the following:

Guiness Harp "Holy Moses"

and, in C# or VB, get a match set of:

Guiness
Harp
Holy Moses

Essentially, the string is split on spaces, except that words enclosed in quotes are treated as a single phrase.

Thanks, Kevin

If you don't have any (escaped or doubled) quotes inside your quoted strings, you could search for

 "[^"]*"|\S+

However, the quotes will be part of the match. The regex can be extended to also handle quotes inside quoted strings if necessary.
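As a rough sketch of such an extension (in Python, and assuming quotes inside a phrase are escaped with a backslash, which the question doesn't specify), the quoted branch can accept either an ordinary character or a backslash followed by any character. The capture groups also keep the surrounding quotes out of the result. The name `split_terms` is made up for this example.

```python
import re

# Quoted branch: ordinary characters or backslash-escaped characters between
# double quotes (captured without the quotes). Bare branch: a run of non-spaces.
TOKEN = re.compile(r'"((?:[^"\\]|\\.)*)"|(\S+)')

def split_terms(text):
    """Split text into words and quoted phrases (backslash escapes assumed)."""
    terms = []
    for m in TOKEN.finditer(text):
        if m.group(1) is not None:
            # Quoted phrase: drop the backslash in front of each escaped character.
            terms.append(re.sub(r'\\(.)', r'\1', m.group(1)))
        else:
            terms.append(m.group(2))
    return terms

print(split_terms('Guiness Harp "Holy Moses"'))
```

If your input doubles quotes instead of escaping them, the quoted branch would need to change accordingly.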

Another (and in this case preferable) possibility would be to use a csv parser.

For example (Python):

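import csv

# Treat each line as space-delimited fields; double quotes group words into one field.
with open('test.txt', newline='') as f:
    reader = csv.reader(f, delimiter=' ', quotechar='"')
    for row in reader:
        print(row)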
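If the text is already in a string rather than a file, the same idea should still work, since `csv.reader` accepts any iterable of lines. A minimal sketch (the helper name `split_terms` is just for illustration):

```python
import csv

def split_terms(text):
    """Split one line of text on spaces, keeping quoted phrases together."""
    # csv.reader takes any iterable of lines, so a one-element list is enough.
    reader = csv.reader([text], delimiter=' ', quotechar='"')
    return next(reader)

print(split_terms('Guiness Harp "Holy Moses"'))
```

Note that with a single space as the delimiter, two consecutive spaces produce an empty field, so you may want to normalize whitespace first.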

Here's another approach:

string s0 = @"Guiness Harp ""Holy Moses""";
Regex r = new Regex(@"""(?<FIELD>[^""]*)""|(?<FIELD>\S+)");
foreach (Match m in r.Matches(s0))
{
  Console.WriteLine(m.Groups["FIELD"].Value);
}

This takes advantage of the fact that .NET regexes let you reuse group names within the same regex. Very few regex flavors allow that, and of those only Perl 6 (since renamed Raku) is as flexible about it as .NET.
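For comparison, here is roughly how the same idea might look in a flavor without reusable group names. As far as I know, Python's standard `re` module rejects a repeated group name, so this sketch gives each branch its own name (`quoted` and `bare` are names I picked) and reads whichever one matched:

```python
import re

# Each alternative gets its own group name because the name can't be reused.
FIELD = re.compile(r'"(?P<quoted>[^"]*)"|(?P<bare>\S+)')

def fields(text):
    """Return words and quoted phrases, without the surrounding quotes."""
    # m.lastgroup is the name of the group that took part in this match.
    return [m.group(m.lastgroup) for m in FIELD.finditer(text)]

print(fields('Guiness Harp "Holy Moses"'))
```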

Regular expressions generally can't count (for example, they can't check that delimiters are balanced), which makes delimiter parsing difficult.

I would use a parser rather than regular expressions for this.
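To give an idea of what I mean, here is a minimal hand-written tokenizer, sketched in Python. It is only an illustration under simple assumptions: double quotes toggle "inside a phrase" mode, there is no escaping, and an unterminated quote is treated as an error. The function name `tokenize` is arbitrary.

```python
def tokenize(text):
    """Split text on whitespace, treating "quoted phrases" as single tokens."""
    tokens = []
    current = []
    in_quotes = False
    has_token = False  # lets an empty quoted phrase "" still count as a token
    for ch in text:
        if ch == '"':
            # A quote toggles phrase mode and is not kept in the token.
            in_quotes = not in_quotes
            has_token = True
        elif ch.isspace() and not in_quotes:
            # Whitespace outside quotes ends the current token, if there is one.
            if has_token:
                tokens.append(''.join(current))
                current = []
                has_token = False
        else:
            current.append(ch)
            has_token = True
    if in_quotes:
        raise ValueError('unterminated quote')
    if has_token:
        tokens.append(''.join(current))
    return tokens

print(tokenize('Guiness Harp "Holy Moses"'))
```

Unlike the regex, this reports an unbalanced quote instead of silently splitting the rest of the string into words.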

If the parsing is this simple, you may be able to match with the quotes included and then trim the leading and trailing quotes from each match.

string text = "Guiness Harp \"Holy Moses\"";
string pattern = @"""[^""]*""|\S+";

MatchCollection matches = Regex.Matches( text, pattern );
foreach( Match match in matches )
{
    string value = match.Value.Trim( '"' );
    Console.Out.WriteLine( value );
}

However, this implementation isn't very flexible. I'd only use something like this in an internal tool, or in code you don't mind throwing away.
