Regex to match comma-separated words with final "and" clause
I need a regular expression that matches the words or phrases in an English-language list that takes one of the following forms:
Some words
Some words and some other words
Some words, more words and some other words
Some words, more words, and some other words
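In other words, the regex should let me identify each phrase in an English list of phrases, where all phrases except the last (if there are more than two) are separated by commas, and the final "and" may or may not be preceded by a comma.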
Getting the comma-separated matches is easy:
[^,]+
But I don't know how to handle the optional final "and" separator when it has no preceding comma.
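To make the gap concrete, here is a minimal sketch of what the comma-only pattern returns. The class and method names (CommaOnlyDemo, MatchCommaSeparated) are made up for illustration, and the trimming of each match is an added assumption, not part of the pattern itself.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CommaOnlyDemo
{
    // Hypothetical helper: returns every run of non-comma characters, trimmed.
    public static string[] MatchCommaSeparated(string input)
    {
        return Regex.Matches(input, @"[^,]+")
                    .Cast<Match>()
                    .Select(m => m.Value.Trim())
                    .ToArray();
    }

    public static void Main()
    {
        var parts = MatchCommaSeparated("Some words, more words and some other words");
        // The last two phrases stay glued together because no comma separates them.
        Console.WriteLine(string.Join(" | ", parts));
        // prints: Some words | more words and some other words
    }
}
```

The second match still contains "and", which is exactly the case the pattern needs to cover.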
One approach is to split the string on "and" (optionally preceded by a comma) or on a comma:
string[] inp = new string[] {
    "Some words",
    "Some words and some other words",
    "Some words, more words and some other words",
    "Some words, more words, and some other words"
};
foreach (string s in inp) {
    string[] phrases = Regex.Split(s, @"(?:,\s*|\s+)and\s+|,\s*");
    Console.WriteLine(string.Join("\n", phrases));
}
Output:
Some words
Some words
some other words
Some words
more words
some other words
Some words
more words
some other words
You can use the following pattern with Regex.Split:
\s*(?:(?:,\s*)?\band\s+|,\s*)
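See the regex demo.
Details:
\s* - zero or more whitespace characters
(?:(?:,\s*)?\band\s+|,\s*) - either of two alternatives:
    (?:,\s*)?\band\s+ - an optional comma followed by zero or more whitespace characters, then the whole word "and" followed by one or more whitespace characters
    | - or
    ,\s* - a comma followed by zero or more whitespace characters.
See the C# demo: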
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public class Test
{
    public static void Main()
    {
        var texts = new List<string> {
            "Some words",
            "Some words and some other words",
            "Some words, more words and some other words",
            "Some words, more words, and some other words"
        };
        var pattern = @"\s*(?:(?:,\s*)?\band\s+|,\s*)";
        foreach (var text in texts)
        {
            var result = Regex.Split(text, pattern).Where(x => !String.IsNullOrWhiteSpace(x)).ToList();
            Console.WriteLine("'{0}' => ['{1}']", text, string.Join("', '", result));
        }
    }
}
Output:
'Some words' => ['Some words']
'Some words and some other words' => ['Some words', 'some other words']
'Some words, more words and some other words' => ['Some words', 'more words', 'some other words']
'Some words, more words, and some other words' => ['Some words', 'more words', 'some other words']
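The role of \b in the pattern can be checked with a phrase that contains "and" inside a word. The sketch below compares the pattern from this answer with a variant that drops \b; the variant, the sample text "Holland and Belgium", and the names WordBoundaryDemo and SplitPhrases are assumptions made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordBoundaryDemo
{
    // Hypothetical helper: splits with the given pattern and drops empty pieces,
    // the same filtering used in the demo above.
    public static string[] SplitPhrases(string text, string pattern)
    {
        return Regex.Split(text, pattern)
                    .Where(x => !string.IsNullOrWhiteSpace(x))
                    .ToArray();
    }

    public static void Main()
    {
        const string withBoundary = @"\s*(?:(?:,\s*)?\band\s+|,\s*)";
        // Illustrative variant only: the same pattern with \b removed.
        const string withoutBoundary = @"\s*(?:(?:,\s*)?and\s+|,\s*)";

        var text = "Holland and Belgium";

        // \b stops "and" from matching at the end of "Holland".
        Console.WriteLine(string.Join(" | ", SplitPhrases(text, withBoundary)));
        // prints: Holland | Belgium

        // Without \b the trailing "and" of "Holland" is treated as a separator.
        Console.WriteLine(string.Join(" | ", SplitPhrases(text, withoutBoundary)));
        // prints: Holl | Belgium
    }
}
```

No \b is needed after "and" because the \s+ that follows already requires the word to end there.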
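You could try:
(?:Some|some|more)\s(?:[a-z]+)?\s?words
Note that this uses a group rather than a character class for the alternatives, and it only matches phrases built from these specific sample words.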
Hope this helps!