
Regex to match comma-separated words with final "and" clause

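I need a regex that matches the words or phrases in an English-language list that takes one of the following forms:

  1. “Some words”
    would match “Some words”
  2. “Some words and some other words”
    would match “Some words” and “some other words”
  3. “Some words, more words and some other words”
    would match “Some words”, “more words” and “some other words”
  4. “Some words, more words, and some other words”
    would match “Some words”, “more words” and “some other words”

In other words, the regex should let me identify each phrase in an English list of phrases, where all phrases except the last (when there are more than two) are separated by commas, and the final "and" may or may not be preceded by a comma.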

Getting the comma-separated matches is easy:

[^,]+

But I can't figure out how to handle the optional final "and" separator (the one with no preceding comma).
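To make the gap concrete, here is a minimal sketch of what the comma-only pattern returns for the third form. The class and method names (CommaOnlyDemo, MatchCommaSeparated) are made up for illustration, and the trimming is an added convenience, not part of the pattern itself.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CommaOnlyDemo
{
    // Hypothetical helper: returns every run of non-comma characters,
    // trimmed of surrounding whitespace.
    public static string[] MatchCommaSeparated(string input)
    {
        return Regex.Matches(input, @"[^,]+")
                    .Cast<Match>()
                    .Select(m => m.Value.Trim())
                    .ToArray();
    }

    public static void Main()
    {
        var parts = MatchCommaSeparated("Some words, more words and some other words");
        foreach (var part in parts)
        {
            Console.WriteLine("[" + part + "]");
        }
        // Prints:
        // [Some words]
        // [more words and some other words]
    }
}
```

The last two phrases stay glued together because nothing in [^,]+ treats "and" as a separator.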

One approach is to split the string on "and" (optionally preceded by a comma) or on a comma:

using System;
using System.Text.RegularExpressions;

string[] inp = new string[] {
    "Some words",
    "Some words and some other words",
    "Some words, more words and some other words",
    "Some words, more words, and some other words"
};
foreach (string s in inp) {
    // Split on "and" preceded by a comma or whitespace, or on a bare comma.
    string[] phrases = Regex.Split(s, @"(?:,\s*|\s+)and\s+|,\s*");
    Console.WriteLine(string.Join("\n", phrases));
}

Output:

Some words
Some words
some other words
Some words
more words
some other words
Some words
more words
some other words

Demo on ideone

You can use the following pattern with Regex.Split:

\s*(?:(?:,\s*)?\band\s+|,\s*)

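See the regex demo

Details

  • \s* - zero or more whitespace characters
  • (?:(?:,\s*)?\band\s+|,\s*) - either of the two alternatives:
    • (?:,\s*)?\band\s+ - an optional sequence of a comma and zero or more whitespace characters, then the word and (which must start at a word boundary) followed by one or more whitespace characters
    • | - or
    • ,\s* - a comma and zero or more whitespace characters.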
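As a rough check of how the \b on the left and the \s+ on the right keep "and" from matching inside longer words, the sketch below runs the same pattern over a made-up input. The phrases and the names BoundaryDemo and SplitPhrases are assumptions chosen for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BoundaryDemo
{
    // Same pattern as above: optional comma + "and" as its own word, or a bare comma.
    public const string Pattern = @"\s*(?:(?:,\s*)?\band\s+|,\s*)";

    // Hypothetical helper that simply forwards to Regex.Split.
    public static string[] SplitPhrases(string input)
    {
        return Regex.Split(input, Pattern);
    }

    public static void Main()
    {
        // "Sandy" and "brand" fail the \b check before "and";
        // "android" passes \b but fails the \s+ that must follow "and".
        var phrases = SplitPhrases("Sandy beaches, android phones and brand names");
        Console.WriteLine(string.Join(" | ", phrases));
        // Prints: Sandy beaches | android phones | brand names
    }
}
```

A phrase that itself contains a standalone "and" (for example "salt and pepper") would still be split, since the pattern cannot tell a separator from a conjunction inside a phrase.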

See the C# demo:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public class Test
{
    public static void Main()
    {
        var texts = new List<string> { 
            "Some words",
            "Some words and some other words",
            "Some words, more words and some other words",
            "Some words, more words, and some other words" 
        };
        var pattern = @"\s*(?:(?:,\s*)?\band\s+|,\s*)";
        foreach (var text in texts) 
        {
            var result = Regex.Split(text, pattern).Where(x => !String.IsNullOrWhiteSpace(x)).ToList();
            Console.WriteLine("'{0}' => ['{1}']", text, string.Join("', '", result));
        }
    }
}

Output:

'Some words' => ['Some words']
'Some words and some other words' => ['Some words', 'some other words']
'Some words, more words and some other words' => ['Some words', 'more words', 'some other words']
'Some words, more words, and some other words' => ['Some words', 'more words', 'some other words']
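The Where(...) filter in the demo matters when a separator sits at the very start or end of the input, because Regex.Split then returns an empty string at that edge. A small sketch, assuming an input with a trailing comma (the input and the names EmptyEntryDemo, SplitRaw and SplitFiltered are illustrative only):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class EmptyEntryDemo
{
    private const string Pattern = @"\s*(?:(?:,\s*)?\band\s+|,\s*)";

    // Raw split: keeps whatever Regex.Split returns, including empty strings.
    public static string[] SplitRaw(string input)
    {
        return Regex.Split(input, Pattern);
    }

    // Filtered split: drops empty or whitespace-only entries, as in the demo above.
    public static string[] SplitFiltered(string input)
    {
        return SplitRaw(input).Where(x => !string.IsNullOrWhiteSpace(x)).ToArray();
    }

    public static void Main()
    {
        const string input = "Some words, more words,";
        Console.WriteLine(SplitRaw(input).Length);      // 3 (the last entry is "")
        Console.WriteLine(SplitFiltered(input).Length); // 2
    }
}
```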

You can try:

(?:Some|some|more)\s(?:[a-z]+)?\s?words

Hope this helps!
