
Use C# to reparse a string already containing asterisk character replacements

I received a very helpful response to a previous question I raised here:

Use C# to surround phrases in a string with asterisk characters from a dictionary of phrases

I am now posting a follow-up question about a specific issue.

The basic premise of my original question was that I have an array of words and phrases such as the following:

  • Flour
  • Wheat Flour
  • Nut
  • Nuts

I process a string of text such as the following:

"Salt, Water, Wheat Flour, Palm Oil, Nuts, Tree Nuts"

My goal is to end up with a string that looks as follows, i.e. the words and phrases from the dictionary are surrounded with asterisk characters, with the longest phrase given priority:

"Salt, Water, *Wheat Flour*, Palm Oil, *Nuts*, Tree *Nuts*"

The above is achievable by using the following Regex pattern, kindly provided by Dmitry Bychenko:

  string pattern = @"\b(?<!\*)(?:" + string.Join("|", words
    .Distinct()
    .OrderByDescending(chunk => chunk.Length)
    .Select(chunk => Regex.Escape(chunk))) + @")(?!\*)\b";
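For reference, here is a minimal, self-contained sketch of how this pattern can be applied to the sample text. The class name `SinglePassDemo` and the helper `Mark` are hypothetical names chosen for illustration; the `Regex.Replace` call wraps each match in asterisks.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SinglePassDemo
{
    // Hypothetical helper: surrounds every dictionary phrase with asterisks.
    public static string Mark(string text, string[] words)
    {
        // Longest phrases come first so "Wheat Flour" wins over "Flour".
        // The lookarounds skip phrases that already touch an asterisk.
        string pattern = @"\b(?<!\*)(?:" + string.Join("|", words
            .Distinct()
            .OrderByDescending(chunk => chunk.Length)
            .Select(chunk => Regex.Escape(chunk))) + @")(?!\*)\b";

        return Regex.Replace(text, pattern, m => "*" + m.Value + "*");
    }

    public static void Main()
    {
        string[] words = { "Flour", "Wheat Flour", "Nut", "Nuts" };
        string text = "Salt, Water, Wheat Flour, Palm Oil, Nuts, Tree Nuts";

        // Prints: Salt, Water, *Wheat Flour*, Palm Oil, *Nuts*, Tree *Nuts*
        Console.WriteLine(Mark(text, words));
    }
}
```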

I have a specific question about the case where the string I am dealing with has already been processed.

Imagine I have a string that has already been processed, such as the following:

"Salt, Water, *Wheat Flour*, Palm Oil, *Nuts*, Tree *Nuts*"

If the array of words I want to replace within the above string now contains a more specific phrase such as "Tree Nuts", is there a regular expression that can detect that the following phrase should be replaced?

"Tree *Nuts*"

That is, this section of the string should be updated to the following:

"*Tree Nuts*"
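To illustrate the problem, the sketch below simply re-runs the original pattern over the already-processed string with "Tree Nuts" added to the word list. The class name `ReparseProblemDemo` and the helper `Mark` are hypothetical. Because the literal text `Tree Nuts` never appears (an asterisk sits between the two words) and every other phrase already touches an asterisk, the string appears to come back unchanged.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class ReparseProblemDemo
{
    // Hypothetical helper: the same single-pass replacement as before.
    public static string Mark(string text, string[] words)
    {
        string pattern = @"\b(?<!\*)(?:" + string.Join("|", words
            .Distinct()
            .OrderByDescending(chunk => chunk.Length)
            .Select(chunk => Regex.Escape(chunk))) + @")(?!\*)\b";

        return Regex.Replace(text, pattern, m => "*" + m.Value + "*");
    }

    public static void Main()
    {
        // "Tree Nuts" is the newly added, more specific phrase.
        string[] words = { "Flour", "Wheat Flour", "Nut", "Nuts", "Tree Nuts" };
        string processed = "Salt, Water, *Wheat Flour*, Palm Oil, *Nuts*, Tree *Nuts*";

        string result = Mark(processed, words);

        // Prints: True (no phrase matched, so "Tree *Nuts*" was not upgraded)
        Console.WriteLine(result == processed);
    }
}
```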

As a quick solution, I suggest implementing a two-stage replacement.

First, let's remove the "erroneous" * characters, i.e. let's turn any *word* back into word:

  // Requires: using System.Linq; using System.Text.RegularExpressions;
  string[] words = new string[] {
    "Flour",
    "Wheat Flour",
    "Nut",
    "Nuts",
    "Tree Nuts"
  };

  string removePattern = @"(?:" + string.Join("|", words
    .Distinct()
    .OrderByDescending(chunk => chunk.Length)
    .Select(chunk => $@"\*{Regex.Escape(chunk)}\*")) + @")";

So, given text that contains *, we can clear it:

  string text = "Salt, Water, *Wheat Flour*, Palm Oil, *Nuts*, Tree *Nuts*";

  // unwanted * removed: 
  // "Salt, Water, Wheat Flour, Palm Oil, Nuts, Tree Nuts" 
  string cleared = Regex.Replace(text, removePattern, m => m.Value.Trim('*'));

Then (second stage) it is business as usual:

  string pattern = @"\b(?<!\*)(?:" + string.Join("|", words
    .Distinct()
    .OrderByDescending(chunk => chunk.Length)
    .Select(chunk => Regex.Escape(chunk))) + @")(?!\*)\b";

  string result = Regex.Replace(cleared, pattern, m => "*" + m.Value + "*");
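Putting the two stages together, a minimal runnable sketch might look like the following. The class name `TwoStageDemo` and the helper `Remark` are hypothetical names introduced here; the patterns themselves are the ones shown above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TwoStageDemo
{
    // Hypothetical helper: strips existing markers, then marks phrases again.
    public static string Remark(string text, string[] words)
    {
        var ordered = words
            .Distinct()
            .OrderByDescending(chunk => chunk.Length)
            .ToArray();

        // Stage 1: turn any *phrase* from the dictionary back into phrase.
        string removePattern = @"(?:" + string.Join("|", ordered
            .Select(chunk => $@"\*{Regex.Escape(chunk)}\*")) + @")";
        string cleared = Regex.Replace(text, removePattern, m => m.Value.Trim('*'));

        // Stage 2: the usual longest-first replacement on the cleared text.
        string pattern = @"\b(?<!\*)(?:" + string.Join("|", ordered
            .Select(chunk => Regex.Escape(chunk))) + @")(?!\*)\b";
        return Regex.Replace(cleared, pattern, m => "*" + m.Value + "*");
    }

    public static void Main()
    {
        string[] words = { "Flour", "Wheat Flour", "Nut", "Nuts", "Tree Nuts" };
        string text = "Salt, Water, *Wheat Flour*, Palm Oil, *Nuts*, Tree *Nuts*";

        // Prints: Salt, Water, *Wheat Flour*, Palm Oil, *Nuts*, *Tree Nuts*
        Console.WriteLine(Remark(text, words));
    }
}
```

With this sample data, calling `Remark` again on its own output should give the same string, since stage 1 first strips the markers that stage 2 then puts back. Note that stage 1 only removes asterisks around phrases that are in the current word list, so markers around phrases that have since been dropped from the list would be left in place.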
