How do I regex match each word inside backticks?

Question

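I'm trying to get a result for each word inside backticks. For example, if I have text like this:

some description `match these_words th_is_wor` or `THIS_WOR thi_sqw` a `word_snake`

I want the search results to be:

match
these_words
th_is_wor
THIS_WOR
thi_sqw
word_snake

Essentially, I'm trying to get every "word" between each pair of backticks, where a word is one or more English letters or underscore characters.

I currently have the following regex, which seems to match all of the text between each pair of backticks: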

/(?<=`)(\b([^`\]|\w|_)*\b)(?=`)/gi

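This uses a positive lookbehind to find text that comes after a ` character: (?<=`)

followed by a capture group of any number of things, where each thing is not a `, not a \, a word character, or an _ character, all within word boundaries: (\b([^`\]|\w|_)*\b)

followed by a positive lookahead for another ` character, to make sure we are enclosed in backticks.

This sort of works, but it captures all of the text between the backticks rather than each individual word. That would require further processing, which I'd like to avoid. My current regex results are:

match these_words th_is_wor
THIS_WOR thi_sqw
word_snake

It would be great if there were a general formula for getting each word inside backticks or quotes. Thanks!

Note: if the answer can be written in C#, that would be much appreciated, but it isn't required, since I can convert it myself if needed.

Edit: Thanks to Mr. إين from Ben Awad's Discord server for the quickest response. Here is the solution he came up with. Thanks also to everyone who replied to my post, you're all awesome!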

using System;
using System.Text.RegularExpressions;
class Program {
  static void Main(string[] args) {
    string backtickSentence = "i want to `match these_words th_is_wor` or `THIS_WOR thi_sqw` a `word_snake`";
    string backtickPattern = @"(?<=^[^`]*(?:`[^`]*`[^`]*)*`(?:[^`]* )*)\w+";
    string quoteSentence = "some other \"words in a \" sentence be \"gettin me tripped_up AllUp inHere\"";
    string quotePattern = "(?<=^[^\"]*(?:\"[^\"]*\"[^\"]*)*\"(?:[^\"]* )*)\\w+";
    // Call Matches method without specifying any options.
    try {
      foreach (Match match in Regex.Matches(backtickSentence, backtickPattern, RegexOptions.None, TimeSpan.FromSeconds(1)))
        Console.WriteLine("Found '{0}' at position {1}", match.Value, match.Index);

      Console.WriteLine();
      foreach (Match match in Regex.Matches(quoteSentence, quotePattern, RegexOptions.None, TimeSpan.FromSeconds(1)))
        Console.WriteLine("Found '{0}' at position {1}", match.Value, match.Index);
    }
    catch (RegexMatchTimeoutException) {} // Do Nothing: Assume that timeout represents no match.

    Console.WriteLine();
    // Call Matches method for case-insensitive matching.
    try {
      foreach (Match match in Regex.Matches(backtickSentence, backtickPattern, RegexOptions.IgnoreCase))
        Console.WriteLine("Found '{0}' at position {1}", match.Value, match.Index);

      Console.WriteLine();
      foreach (Match match in Regex.Matches(quoteSentence, quotePattern, RegexOptions.IgnoreCase))
        Console.WriteLine("Found '{0}' at position {1}", match.Value, match.Index);
    }
    catch (RegexMatchTimeoutException) {}
  }
}

His explanation is below; you can also paste his regex into regexr.com for more information.

var NOT_BACKTICK = @"[^`]*";
var WORD = @"(\w+)";

var START = $@"^{NOT_BACKTICK}"; // match anything before the first backtick
var INSIDE_BACKTICKS = $@"`{NOT_BACKTICK}`"; // match a pair of backticks
var ODD_NUM_BACKTICKS_BEFORE = $@"{START}({INSIDE_BACKTICKS}{NOT_BACKTICK})*`"; // match anything before the first backtick, then any amount of paired backticks with anything afterwards, then a single opening backtick

var CONDITION = $@"(?<={ODD_NUM_BACKTICKS_BEFORE})";
var CONDITION_TRUE = $@"(?: *{WORD})"; // match any spaces then a word
var CONDITION_FALSE = $@"(?:(?<={ODD_NUM_BACKTICKS_BEFORE}{NOT_BACKTICK} ){WORD})"; // match up to an opening backtick, then anything up to a space before the current word


// uses conditional matching
// see https://learn.microsoft.com/en-us/dotnet/standard/base-types/alternation-constructs-in-regular-expressions#Conditional_Expr
var pattern = $@"(?{CONDITION}{CONDITION_TRUE}|{CONDITION_FALSE})";

// refined backtick pattern
string backtickPattern = @"(?<=^[^`]*(?:`[^`]*`[^`]*)*`(?:[^`]* )*)\w+";
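
Since the backtick and quote patterns differ only in the delimiter character, the refined pattern can be built from a single template. The following is a minimal sketch, not part of the original solution: the names DelimitedWords, BuildPattern and Find are hypothetical, and it assumes the delimiter is a backtick or a double quote, as in the examples above (a `]` delimiter would break the character class).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class DelimitedWords
{
    // Hypothetical helper: fills the delimiter slots of the refined pattern.
    public static string BuildPattern(char delimiter)
    {
        string d = Regex.Escape(delimiter.ToString()); // the delimiter as a literal
        string notD = "[^" + d + "]*";                 // any run of non-delimiter characters
        // The lookbehind requires an odd number of delimiters before the word,
        // and the word must directly follow the opening delimiter or a space.
        return "(?<=^" + notD + "(?:" + d + notD + d + notD + ")*" + d
             + "(?:" + notD + " )*)\\w+";
    }

    // Returns every word found between pairs of the given delimiter.
    // The timeout mirrors the original code; it can throw RegexMatchTimeoutException.
    public static List<string> Find(string text, char delimiter)
    {
        return Regex.Matches(text, BuildPattern(delimiter), RegexOptions.None, TimeSpan.FromSeconds(1))
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToList();
    }
}

class Program
{
    static void Main()
    {
        string backtickSentence = "i want to `match these_words th_is_wor` or `THIS_WOR thi_sqw` a `word_snake`";
        string quoteSentence = "some other \"words in a \" sentence be \"gettin me tripped_up AllUp inHere\"";

        // prints: match,these_words,th_is_wor,THIS_WOR,thi_sqw,word_snake
        Console.WriteLine(string.Join(",", DelimitedWords.Find(backtickSentence, '`')));
        // prints: words,in,a,gettin,me,tripped_up,AllUp,inHere
        Console.WriteLine(string.Join(",", DelimitedWords.Find(quoteSentence, '"')));
    }
}
```

For a backtick, BuildPattern produces exactly the refined pattern shown above. Note that a word inside the delimiters is only matched when it directly follows the opening delimiter or a space, so in `a-b` only a would be found.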

Answer 1

In C# you can use the Group.Captures property and then read the values of the capture group.

Note that \w also matches _

`(?:[\p{Zs}\t]*(\w+)[\p{Zs}\t]*)+`

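Explanation

` matches a literal backtick
(?: opens a non-capturing group that is repeated as a whole
- [\p{Zs}\t]* matches optional spaces or tabs
- (\w+) is capture group 1 and matches 1 or more word characters
- [\p{Zs}\t]* matches optional spaces or tabs
)+ closes the non-capturing group and repeats it 1 or more times
` matches a literal backtick

See the .NET regex demo and the C# demo.

For example: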

using System;
using System.Linq;
using System.Text.RegularExpressions;

string s = @"some description ` match these_words th_is_wor ` or `THIS_WOR thi_sqw` a `word_snake`";
string pattern = @"`(?:[\p{Zs}\t]*(\w+)[\p{Zs}\t]*)+`";
foreach (Match m in Regex.Matches(s, pattern))
{
    // Group 1 is repeated, so every word it matched is kept in Captures.
    string[] result = m.Groups[1].Captures.Select(c => c.Value).ToArray();
    Console.WriteLine(String.Join(',', result));
}

Output

match,these_words,th_is_wor
THIS_WOR,thi_sqw
word_snake
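
The output above is grouped per backtick pair, while the question asks for one flat list of words. As a hedged sketch (the CaptureFlattener class and its Words method are hypothetical names, not part of the answer), the captures of all matches can be flattened with SelectMany:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class CaptureFlattener
{
    // Same pattern as in the answer; group 1 is captured once per word.
    const string Pattern = @"`(?:[\p{Zs}\t]*(\w+)[\p{Zs}\t]*)+`";

    // Collects the group 1 captures of every match into a single list.
    public static List<string> Words(string text)
    {
        return Regex.Matches(text, Pattern)
                    .Cast<Match>()
                    .SelectMany(m => m.Groups[1].Captures.Cast<Capture>())
                    .Select(c => c.Value)
                    .ToList();
    }
}

class Program
{
    static void Main()
    {
        string s = @"some description ` match these_words th_is_wor ` or `THIS_WOR thi_sqw` a `word_snake`";
        // Prints one word per line, in order of appearance.
        foreach (string word in CaptureFlattener.Words(s))
            Console.WriteLine(word);
    }
}
```

The explicit Cast calls are there so the sketch does not depend on the collections implementing the generic IEnumerable interface.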

Answer 2

To chain matches onto a backtick that is followed by a word boundary, you can use the \G anchor:

 (?:\G(?!^)[^\w`]+|`\b)(\w+)

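`\b sets the starting point of the chain
\G(?!^)[^\w`]+ continues where the previous match ended (the negative lookahead prevents \G from matching at the start of the string) and consumes characters that are neither word characters nor backticks
(\w+) captures each word into the first group (.NET demo)

See this demo at regex101, or a .NET variant without a capture group at regexstorm.
This pattern does not check for the closing backtick (that would need another lookahead).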
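
To illustrate the last point, here is a rough sketch that runs the chained pattern with and without a trailing lookahead. The lookahead (?=[^`]*`) is an assumption about what "another lookahead" could look like, and ChainedWords, Chain, ChainClosed and Find are hypothetical names. It only requires that some backtick follows the word; it does not prove that the backticks are correctly paired.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class ChainedWords
{
    // The answer's pattern: a chain starts at a backtick followed by a word
    // and continues from the end of the previous match via \G.
    public const string Chain = @"(?:\G(?!^)[^\w`]+|`\b)(\w+)";

    // Assumed variant: the word must be followed, somewhere later, by a backtick.
    public const string ChainClosed = Chain + @"(?=[^`]*`)";

    // Returns the group 1 value of every match.
    public static List<string> Find(string text, string pattern)
    {
        return Regex.Matches(text, pattern)
                    .Cast<Match>()
                    .Select(m => m.Groups[1].Value)
                    .ToList();
    }
}

class Program
{
    static void Main()
    {
        string closed = "some description `match these_words th_is_wor` or `THIS_WOR thi_sqw` a `word_snake`";
        string unclosed = "`open_word never closed";

        // prints: match,these_words,th_is_wor,THIS_WOR,thi_sqw,word_snake
        Console.WriteLine(string.Join(",", ChainedWords.Find(closed, ChainedWords.ChainClosed)));
        // prints: open_word,never,closed (no closing backtick is required)
        Console.WriteLine(string.Join(",", ChainedWords.Find(unclosed, ChainedWords.Chain)));
        // prints: 0 (the lookahead rejects words with no backtick after them)
        Console.WriteLine(ChainedWords.Find(unclosed, ChainedWords.ChainClosed).Count);
    }
}
```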

Solution 1
1 Accepted 2022-09-18 09:28:04

Solution 2
1 2022-09-18 10:11:48
