
Regex to match comma-separated words with final "and" clause

I need a regular expression that matches the individual words or phrases in an English-language list of things, in one of these forms:

  1. "Some words"
    would match "Some words"
  2. "Some words and some other words"
    would match "Some words" and "some other words"
  3. "Some words, more words and some other words"
    would match "Some words", "more words", and "some other words"
  4. "Some words, more words, and some other words"
    would match "Some words", "more words", and "some other words"

In other words, the regex should let me identify each phrase in an English-language list of phrases, where the phrases are separated by commas, except that the final phrase is introduced by "and", which may or may not be preceded by a comma.

Getting the comma-separated matches is easy:

[^,]+

but I can't figure out how to deal with the optional final "and" separator (without a preceding comma).
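To make the problem concrete, here is a minimal sketch of what the comma-only pattern returns. The names CommaOnlyDemo and MatchCommaRuns are hypothetical, chosen only for this illustration, and the Trim call is an added convenience that is not part of the pattern itself:

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class CommaOnlyDemo
{
    // Returns every run of non-comma characters, with surrounding whitespace trimmed.
    // Hypothetical helper: it only demonstrates the limitation of [^,]+.
    public static string[] MatchCommaRuns(string input)
    {
        return Regex.Matches(input, @"[^,]+")
                    .Cast<Match>()
                    .Select(m => m.Value.Trim())
                    .ToArray();
    }
}
```

Calling CommaOnlyDemo.MatchCommaRuns("Some words, more words and some other words") yields two items, "Some words" and "more words and some other words", so the last two phrases stay glued together.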

One way to do this is to split the string on "and" (optionally preceded by a comma) or on a comma:

using System;
using System.Text.RegularExpressions;

string[] inp = new string[] {
    "Some words",
    "Some words and some other words",
    "Some words, more words and some other words",
    "Some words, more words, and some other words"
};
foreach (string s in inp) {
    // Split on "and" (preceded by a comma or by whitespace) or on a bare comma.
    string[] phrases = Regex.Split(s, @"(?:,\s*|\s+)and\s+|,\s*");
    Console.WriteLine(string.Join("\n", phrases));
}

Output:

Some words
Some words
some other words
Some words
more words
some other words
Some words
more words
some other words

Demo on ideone
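As a quick check of how this split pattern treats words that merely contain "and", here is a small sketch. The helper AndSplitCheck.Split and the sample sentence are assumptions made up for this illustration; only the pattern comes from the answer above:

```csharp
using System.Text.RegularExpressions;

public static class AndSplitCheck
{
    // Same pattern as in the answer: "and" counts as a separator only when it is
    // preceded by a comma or whitespace and followed by whitespace, so the "and"
    // inside a word such as "sand" is left alone.
    public static string[] Split(string input)
    {
        return Regex.Split(input, @"(?:,\s*|\s+)and\s+|,\s*");
    }
}
```

AndSplitCheck.Split("bread, sand and gravel") returns "bread", "sand" and "gravel": the comma and the standalone "and" are consumed as separators, while "sand" survives intact.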

You can use the following pattern in Regex.Split:

\s*(?:(?:,\s*)?\band\s+|,\s*)

See the regex demo.

Details:

  • \s* - zero or more whitespace characters
  • (?:(?:,\s*)?\band\s+|,\s*) - one of the two alternatives:
    • (?:,\s*)?\band\s+ - an optional comma followed by zero or more whitespace characters, then the whole word "and" followed by one or more whitespace characters
    • | - or
    • ,\s* - a comma followed by zero or more whitespace characters.
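The breakdown above can also be written into the pattern itself. The following is a sketch, assuming RegexOptions.IgnorePatternWhitespace so that the pattern can be spread over several lines with # comments; the class name VerboseSplitter and the method SplitPhrases are hypothetical names used only for this illustration:

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class VerboseSplitter
{
    // The same separator pattern, laid out one piece per line.
    // In this mode, literal whitespace in the pattern is ignored and
    // everything after # up to the end of the line is a comment.
    private static readonly Regex Separator = new Regex(@"
        \s*                 # zero or more whitespace chars before the separator
        (?:
            (?:,\s*)?       # an optional comma plus trailing whitespace
            \band\s+        # the whole word 'and' followed by whitespace
          |                 # or
            ,\s*            # a plain comma plus trailing whitespace
        )",
        RegexOptions.IgnorePatternWhitespace);

    // Splits a list sentence into phrases and drops empty entries.
    public static string[] SplitPhrases(string input)
    {
        return Separator.Split(input)
                        .Where(p => !string.IsNullOrWhiteSpace(p))
                        .ToArray();
    }
}
```

VerboseSplitter.SplitPhrases("Some words, more words, and some other words") returns the three phrases "Some words", "more words" and "some other words". Only non-capturing groups are used, so Regex.Split does not add the separators to the result.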

See the C# demo:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public class Test
{
    public static void Main()
    {
        var texts = new List<string> { 
            "Some words",
            "Some words and some other words",
            "Some words, more words and some other words",
            "Some words, more words, and some other words" 
        };
        var pattern = @"\s*(?:(?:,\s*)?\band\s+|,\s*)";
        foreach (var text in texts) 
        {
            var result = Regex.Split(text, pattern).Where(x => !String.IsNullOrWhiteSpace(x)).ToList();
            Console.WriteLine("'{0}' => ['{1}']", text, string.Join("', '", result));
        }
    }
}

Output:

'Some words' => ['Some words']
'Some words and some other words' => ['Some words', 'some other words']
'Some words, more words and some other words' => ['Some words', 'more words', 'some other words']
'Some words, more words, and some other words' => ['Some words', 'more words', 'some other words']

You can try this pattern, which only matches the literal sample phrases from the question:

(?:some|Some|more)\s(?:[a-z]+)?\s?words

Hope it will help you!
