
Regex to match comma-separated words with final "and" clause

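I need a regular expression that matches the words or phrases in an English-language list that takes one of the following forms:

  1. "Some words"
    would match "Some words"
  2. "Some words and some other words"
    would match "Some words" and "some other words"
  3. "Some words, more words and some other words"
    would match "Some words", "more words" and "some other words"
  4. "Some words, more words, and some other words"
    would match "Some words", "more words" and "some other words"

In other words, the regex should let me identify each phrase in an English-language list of phrases, where all phrases except the last (if there are more than two) are separated by commas, and the final "and" may or may not be preceded by a comma.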

Getting the comma-separated matches is easy:

[^,]+

But I can't figure out how to handle the optional final "and" separator (the one with no preceding comma).
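To make the gap concrete, here is a minimal sketch of what the comma-only pattern yields on the third form. The class and method names (`CommaOnlyDemo`, `CommaChunks`) are hypothetical and exist only for this illustration; the trimming step is also an assumption, added so the leading spaces do not obscure the result.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CommaOnlyDemo
{
    // Hypothetical helper: returns every run of non-comma characters,
    // trimmed of surrounding whitespace.
    public static string[] CommaChunks(string text)
    {
        return Regex.Matches(text, @"[^,]+")
                    .Cast<Match>()
                    .Select(m => m.Value.Trim())
                    .ToArray();
    }

    public static void Main()
    {
        // Only the comma acts as a separator here, so the last two
        // phrases come back as a single chunk.
        foreach (var chunk in CommaChunks("Some words, more words and some other words"))
        {
            Console.WriteLine(chunk);
        }
        // Prints:
        // Some words
        // more words and some other words
    }
}
```

The second chunk still contains "and", which is exactly the part the pattern cannot separate.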

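One approach is to split the string on "and" (optionally preceded by a comma) or on a comma:

using System;
using System.Text.RegularExpressions;

string[] inp = new string[] {
    "Some words",
    "Some words and some other words",
    "Some words, more words and some other words",
    "Some words, more words, and some other words"
};
foreach (string s in inp) {
    // Split on "and" preceded by a comma or whitespace, or on a bare comma.
    string[] phrases = Regex.Split(s, @"(?:,\s*|\s+)and\s+|,\s*");
    Console.WriteLine(string.Join("\n", phrases));
}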

Output:

Some words
Some words
some other words
Some words
more words
some other words
Some words
more words
some other words

Demo on ideone

You can use the following pattern in Regex.Split:

\s*(?:(?:,\s*)?\band\s+|,\s*)

See the regex demo

Details

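  • \s* - zero or more whitespace characters
  • (?:(?:,\s*)?\band\s+|,\s*) - either of two alternatives:
    • (?:,\s*)?\band\s+ - an optional sequence of a comma and zero or more whitespace characters, then the whole word and followed by one or more whitespace characters
    • | - or
    • ,\s* - a comma and zero or more whitespace characters.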
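The word boundary in the first alternative matters when a phrase contains "and" inside a longer word. The following is a minimal sketch of that effect, not part of the original answer: the class name `BoundaryDemo`, the helper `SplitPhrases`, the sample input, and the comparison pattern without \b are all assumptions made for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BoundaryDemo
{
    // The pattern described above.
    public const string WithBoundary = @"\s*(?:(?:,\s*)?\band\s+|,\s*)";

    // The same pattern with \b removed, for comparison only.
    public const string WithoutBoundary = @"\s*(?:(?:,\s*)?and\s+|,\s*)";

    // Hypothetical helper: splits a list of phrases with the given pattern.
    public static string[] SplitPhrases(string text, string pattern)
    {
        return Regex.Split(text, pattern);
    }

    public static void Main()
    {
        // Sample input chosen so that "and" also appears inside "sand" and "handy".
        const string input = "sand dunes, handy tools and maps";

        // With \b, only the standalone word "and" acts as a separator.
        Console.WriteLine(string.Join(" | ", SplitPhrases(input, WithBoundary)));
        // Prints: sand dunes | handy tools | maps

        // Without \b, the "and " inside "sand dunes" is treated as a
        // separator too, so the first phrase is broken apart.
        Console.WriteLine(string.Join(" | ", SplitPhrases(input, WithoutBoundary)));
    }
}
```

Because every group in the pattern is non-capturing, Regex.Split returns only the phrases and never the separators themselves.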

See the C# demo:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public class Test
{
    public static void Main()
    {
        var texts = new List<string> { 
            "Some words",
            "Some words and some other words",
            "Some words, more words and some other words",
            "Some words, more words, and some other words" 
        };
        var pattern = @"\s*(?:(?:,\s*)?\band\s+|,\s*)";
        foreach (var text in texts) 
        {
            var result = Regex.Split(text, pattern).Where(x => !String.IsNullOrWhiteSpace(x)).ToList();
            Console.WriteLine("'{0}' => ['{1}']", text, string.Join("', '", result));
        }
    }
}

Output:

'Some words' => ['Some words']
'Some words and some other words' => ['Some words', 'some other words']
'Some words, more words and some other words' => ['Some words', 'more words', 'some other words']
'Some words, more words, and some other words' => ['Some words', 'more words', 'some other words']

You can try

[Some|some|more]+\s(?:[a-z]+)?\s?words

Hope this helps!

