
Regex: replace the nth occurrence in a string and add custom HTML tags

I have a list of words ("this", "be able", "it") that I want to find inside a paragraph and replace while preserving their capitalization.

Given this paragraph:

This is my text and this is why I want to match it! As this is just a text, I would like to be able to solve it. This is the final phrase of this paragraph.

"this" is found 5 times, and if I decide to replace the 4th one ("This"), I still want to keep the capital T. As you can see, this is not really a replacement but more of an insertion problem, since the actual change would be from This to <strong>This</strong>.

So my final paragraph would be:

This is my text and this is why I want to match it! As this is just a text, I would like to be able to solve it. <strong>This</strong> is the final phrase of this paragraph.

My code so far:

    List<string> words = new List<string>(new string[] { "this", "be able", "it"});
    var paragraph = "This is my text and this is why I want to match it! As this is just a text, I would like to be able to solve it. This is the final phrase of this paragraph.";
    //List<string> 
    for (int w = 0; w < words.Count; w++)
    {
        var foudItems = Regex.Matches(paragraph, @"\b" + words[w] + "\\b", RegexOptions.IgnoreCase);

        if (foudItems.Count != 0)
        {
            Random rnd = new Random();
            int rndWord = rnd.Next(0, foudItems.Count);
            Regex.Replace(paragraph, @"\b" + words[w] + "\\b", "<strong>" + foudItems[rndWord] + "</strong>");
            Console.WriteLine(paragraph);
        }

        //Regex.Replace()
        Console.WriteLine(foudItems[0] + " " + foudItems[1]);
    }

The main problem is that I don't know how to replace only the nth match using regex. Another issue is that my approach feels overcomplicated, so I'm open to new suggestions.

If you want to replace the nth occurrence of something, you can use a MatchEvaluator delegate that tracks the current occurrence index and returns the matched value unmodified unless the index is the one you want to replace. To track the current index, capture a local variable:

int occurrenceToReplace = 4;
int index = 0;
MatchEvaluator evaluator = m => (++index == occurrenceToReplace)
    ? $"<strong>{m.Value}</strong>"
    : m.Value;

text = Regex.Replace(text, @"\bthis\b", evaluator, RegexOptions.IgnoreCase);

Now back to your problem: you can write a method that wraps the nth occurrence of a given word in an HTML tag:

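private static string MakeStrong(string text, string word, int occurrence)
{
    // Counts matches as Regex.Replace visits them; occurrence is 1-based.
    int index = 0;
    MatchEvaluator evaluator = m => (++index == occurrence)
         ? $"<strong>{m.Value}</strong>"
         : m.Value;
    // Regex.Escape keeps words containing regex metacharacters from changing the pattern.
    return Regex.Replace(text, $@"\b{Regex.Escape(word)}\b", evaluator, RegexOptions.IgnoreCase);
}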
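
To check the idea against the question's own example, here is a minimal, self-contained sketch. The class name NthOccurrenceDemo and the method name WrapNth are made up for this illustration; the method follows the same logic as MakeStrong above and is repeated only so the sample compiles on its own.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NthOccurrenceDemo
{
    // Hypothetical helper: wraps the 1-based nth whole-word match in <strong> tags.
    public static string WrapNth(string text, string word, int occurrence)
    {
        int index = 0; // captured by the lambda; incremented once per match
        return Regex.Replace(
            text,
            $@"\b{Regex.Escape(word)}\b",
            m => ++index == occurrence ? $"<strong>{m.Value}</strong>" : m.Value,
            RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        var paragraph = "This is my text and this is why I want to match it! " +
                        "As this is just a text, I would like to be able to solve it. " +
                        "This is the final phrase of this paragraph.";

        // The 4th case-insensitive match of "this" is the "This" that starts the last sentence.
        Console.WriteLine(WrapNth(paragraph, "this", 4));
    }
}
```

Because the replacement reuses m.Value, the original capitalization is kept without any extra handling.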

And if you want to randomly replace one occurrence of each word, just call this method in a loop:

string[] words = { "this", "be able", "it"};   
var paragraph = @"This is my text and this is why I want to match it! As this is just
a text, I would like to be able to solve it. This is the final phrase of this paragraph.";

var random = new Random();
foreach(var word in words)
{
    int count = Regex.Matches(paragraph, $@"\b{Regex.Escape(word)}\b", RegexOptions.IgnoreCase).Count;
    // Next(min, max) excludes max, so this picks a 1-based occurrence in [1, count].
    int occurrence = random.Next(1, count + 1);
    paragraph = MakeStrong(paragraph, word, occurrence);
}

Sample output (the highlighted occurrences vary from run to run):

This is my text and <strong>this</strong> is why I want to match <strong>it</strong>! As this is just a text, I would like to <strong>be able</strong> to solve it. This is the final phrase of this paragraph.

If you want to keep the regex side quite simple, you can use this approach:

List<string> words = new List<string>(new string[] { "this", "be able", "it" });
var paragraph = "This is my text and this is why I want to match it! As this is just a text, I would like to be able to solve it. This is the final phrase of this paragraph.";
foreach (string word in words)
{
    var foundItems = Regex.Matches(paragraph, @"\b" + Regex.Escape(word) + @"\b", RegexOptions.IgnoreCase);
    if (foundItems.Count != 0)
    {
        var count = 0;
        // 1-based occurrence to wrap; words with fewer matches are left unchanged.
        var toReplace = 3;
        foreach (Match foundItem in foundItems)
        {
            count++;
            if (count != toReplace)
                continue;

            // Skip exactly foundItem.Index characters, then wrap the match that starts there.
            var regex = $"(^.{{{foundItem.Index}}}){Regex.Escape(foundItem.Value)}(.*)";
            paragraph = Regex.Replace(paragraph, regex, $"$1<strong>{foundItem.Value}</strong>$2");
        }
        Console.WriteLine(paragraph);
    }
}
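
Since each Match already exposes its position, the second regex is not strictly needed. The sketch below is an alternative, not the answer's method: it splices the tags in with Substring, using Match.Index and Match.Length. The names IndexSpliceDemo and WrapNthByIndex are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class IndexSpliceDemo
{
    // Hypothetical helper: wraps the 1-based nth whole-word match by string splicing.
    public static string WrapNthByIndex(string text, string word, int occurrence)
    {
        MatchCollection matches = Regex.Matches(
            text, $@"\b{Regex.Escape(word)}\b", RegexOptions.IgnoreCase);

        // Out-of-range occurrences leave the text untouched.
        if (occurrence < 1 || occurrence > matches.Count)
            return text;

        Match m = matches[occurrence - 1];
        return text.Substring(0, m.Index)
             + "<strong>" + m.Value + "</strong>"
             + text.Substring(m.Index + m.Length);
    }

    public static void Main()
    {
        var paragraph = "This is my text and this is why I want to match it! " +
                        "As this is just a text, I would like to be able to solve it. " +
                        "This is the final phrase of this paragraph.";

        // Wraps the second "it" (the one in "solve it.").
        Console.WriteLine(WrapNthByIndex(paragraph, "it", 2));
    }
}
```

This version also works when the text spans several lines, because it does not rely on `.` matching across line breaks.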
