Use C# to reparse a string already containing asterisk character replacements
I received a very helpful response to a previous question raised here.
Use C# to surround phrases in a string with asterisk characters from a dictionary of phrases
I am now posting a follow-up question about one specific issue.
The basic premise of my original query was that I have an array of words and phrases such as the following.
After processing a string of text such as the following.
"Salt, Water, Wheat Flour, Palm Oil, Nuts, Tree Nuts"
My goal is to have a string that looks as follows (i.e. the words and phrases from the dictionary are surrounded with asterisk characters, with the longest phrase given priority).
"Salt, Water, *Wheat Flour*, Palm Oil, *Nuts*, Tree *Nuts*"
The above is achievable by using the following Regex pattern kindly provided by Dmitry Bychenko.
string pattern = @"\b(?<!\*)(?:" + string.Join("|", words
.Distinct()
.OrderByDescending(chunk => chunk.Length)
.Select(chunk => Regex.Escape(chunk))) + @")(?!\*)\b";
I have a specific question about the case where the string I am dealing with has already been processed.
Imagine I have a string that has already been processed, such as the following.
"Salt, Water, *Wheat Flour*, Palm Oil, *Nuts*, Tree *Nuts*"
If the array of words I want to replace within the above string now contains a more specific phrase such as "Tree Nuts", is there a Regex pattern that can detect that the following phrase should be replaced?
"Tree *Nuts*"
i.e. this section of the string should be updated to the following.
"*Tree Nuts*"
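To make the problem concrete, here is a minimal sketch showing that re-running the original single-pass pattern on the already-processed string changes nothing. The word list is borrowed from the answer, and the class and method names (ReparseProblemDemo, BuildPattern, Mark) are hypothetical helpers introduced only for this illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class ReparseProblemDemo
{
    // Builds the single-pass pattern from the question for a given word list.
    public static string BuildPattern(string[] words) =>
        @"\b(?<!\*)(?:" + string.Join("|", words
            .Distinct()
            .OrderByDescending(chunk => chunk.Length)
            .Select(chunk => Regex.Escape(chunk))) + @")(?!\*)\b";

    // Surrounds every match with asterisks.
    public static string Mark(string text, string[] words) =>
        Regex.Replace(text, BuildPattern(words), m => "*" + m.Value + "*");

    public static void Main()
    {
        string[] words = { "Flour", "Wheat Flour", "Nut", "Nuts", "Tree Nuts" };
        string processed = "Salt, Water, *Wheat Flour*, Palm Oil, *Nuts*, Tree *Nuts*";

        // "Tree Nuts" cannot match "Tree *Nuts*" because of the asterisk in the
        // middle, and the lookarounds reject any word that already touches an asterisk.
        Console.WriteLine(Mark(processed, words) == processed); // True
    }
}
```

The literal alternative "Tree Nuts" never lines up with "Tree *Nuts*", so the existing asterisks have to be dealt with before (or while) the new phrase is matched.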
As a quick solution, I suggest implementing a two-stage replacement.
First, let's remove the "erroneous" *, i.e. turn any *word* into word:
string[] words = new string[] {
"Flour",
"Wheat Flour",
"Nut",
"Nuts",
"Tree Nuts"
};
string removePattern = @"(?:" + string.Join("|", words
.Distinct()
.OrderByDescending(chunk => chunk.Length)
.Select(chunk => $@"\*{Regex.Escape(chunk)}\*")) + @")";
So, given text with *, we can clear it:
string text = "Salt, Water, *Wheat Flour*, Palm Oil, *Nuts*, Tree *Nuts*";
// unwanted * removed:
// "Salt, Water, Wheat Flour, Palm Oil, Nuts, Tree Nuts"
string cleared = Regex.Replace(text, removePattern, m => m.Value.Trim('*'));
Then (second stage) business as usual:
string pattern = @"\b(?<!\*)(?:" + string.Join("|", words
.Distinct()
.OrderByDescending(chunk => chunk.Length)
.Select(chunk => Regex.Escape(chunk))) + @")(?!\*)\b";
string result = Regex.Replace(cleared, pattern, m => "*" + m.Value + "*");
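Putting the two stages together, a self-contained sketch might look like the following. The class and method names (AsteriskReparser, Reparse) are hypothetical and not part of the original answer; the patterns are the ones shown above, and the guard for an empty word list is an added assumption, since an empty alternation would otherwise match at every position.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class AsteriskReparser
{
    public static string Reparse(string text, string[] words)
    {
        // Longest phrases first, so "Tree Nuts" is tried before "Nuts".
        string[] escaped = words
            .Distinct()
            .OrderByDescending(chunk => chunk.Length)
            .Select(chunk => Regex.Escape(chunk))
            .ToArray();

        // Assumption: with nothing to mark, return the text untouched.
        if (escaped.Length == 0)
            return text;

        // Stage 1: strip asterisks around any known word, *word* -> word.
        string removePattern =
            "(?:" + string.Join("|", escaped.Select(w => @"\*" + w + @"\*")) + ")";
        string cleared = Regex.Replace(text, removePattern, m => m.Value.Trim('*'));

        // Stage 2: mark the words again, now including the newer, longer phrases.
        string pattern = @"\b(?<!\*)(?:" + string.Join("|", escaped) + @")(?!\*)\b";
        return Regex.Replace(cleared, pattern, m => "*" + m.Value + "*");
    }

    public static void Main()
    {
        string[] words = { "Flour", "Wheat Flour", "Nut", "Nuts", "Tree Nuts" };
        string text = "Salt, Water, *Wheat Flour*, Palm Oil, *Nuts*, Tree *Nuts*";

        // Prints: Salt, Water, *Wheat Flour*, Palm Oil, *Nuts*, *Tree Nuts*
        Console.WriteLine(Reparse(text, words));
    }
}
```

Note that stage 1 only strips asterisks around entries of the current word list, so a phrase that was marked earlier but has since been dropped from the list keeps its asterisks.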