
Replace specific occurrence of text in a Regex Match

EDIT: This example uses HTML, but I need this kind of approach for other types of strings too. Please read this as a regex issue, not an HTML issue.

Let's say I have a string like this:

<h1>Hello</h1><h2>World</h2><h3>!</h3>

I may need to insert text before any one of those heading tags, but let's use this example, where I just want to modify <h2> so the string looks like this:

<h1>Hello</h1><div id="h2div"></div><h2>World</h2><h3>!</h3>

Since I may need to replace any of the headings, I search only for <h* using regex. Now I want my code to say "of all the <h* tags you found, only replace the second one".

I thought I found the answer here: How do I replace a specific occurrence of a string in a string?

Unfortunately, the results are not what I am looking for. Here is my sample code:

    private void button1_Click(object sender, EventArgs e)
    {
        //sample html file string:
        var htmlText = "<h1>Hello</h1><h2>World</h2><h3>!</h3>";

        //this text should replace <h2 with <div id="h2div"></div><h2
        var replacementString = "<div id=\"" + "h2div" + "\">" + "</div>" + "<h2";
        int replacementIndex = 1; //only replace the second occurrence found by regex.

        //find ALL occurrences of <h1 through <h6 in the file, but only replace <h2.
        htmlText = Regex.Replace(htmlText, "<h([1-6])", m => replacementString + replacementIndex++);

    }

It does not matter whether I specify replacementIndex or replacementIndex++, which makes sense, but I wanted to match the code as closely as possible to the answer I found.

The output looks like this:

<div id="h2div"></div><h21>Hello</h1><div id="h2div"></div><h22>World</h2><div id="h2div"></div><h23>!</h3>

There are lots of things that should not be happening here. First, only one <div> tag should have been created, rather than three. Second, only the <h part appears to be replaced rather than <h2, so now we end up with <h21, <h22, and <h23.

Over the past few months I have been getting better at understanding regex matching, but I am really unfamiliar with regex MatchEvaluators and groups, which I guess is what I probably need here.

Could you recommend how I can fix the code so I can replace a specific index of a regex match?

Sorry, I cannot answer in C#, but the answer should be very similar. For your particular case, the regexp argument for JavaScript's String.prototype.replace() is /(<h1.+?\/h1>)/ and the replacement argument is '$1<div id="h2div"></div>'. So:

var str = "<h1>Hello</h1><h2>World</h2><h3>!</h3>",
 repStr = str.replace(/(<h1.+?\/h1>)/,'$1<div id="h2div"></div>');

console.log(repStr) // "<h1>Hello</h1><div id="h2div"></div><h2>World</h2><h3>!</h3>"

Or, if you don't want to use a capture group, you can use $& (the whole match) instead:

var repStr = str.replace(/<h1.+?\/h1>/,'$&<div id="h2div"></div>');

which gives essentially the same result in this particular case.
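The snippets above hard-code h1 as the tag to insert after. To target the n-th match of a more general pattern such as <h[1-6], one option is to pass a function as the replacement and count matches inside it. This is only a minimal sketch: replaceNth is a hypothetical helper name (not part of any library), n is zero-based, and the regex is assumed to carry the g flag so that every match is visited.

```javascript
// Hypothetical helper: replace only the n-th (zero-based) match of `regex`.
// Assumes `regex` has the global flag so every match is visited.
function replaceNth(input, regex, n, makeReplacement) {
  let seen = 0; // number of matches visited so far
  return input.replace(regex, (match) => {
    const isTarget = seen === n;
    seen += 1;
    // Every other match is returned unchanged, so it stays as it was.
    return isTarget ? makeReplacement(match) : match;
  });
}

const html = "<h1>Hello</h1><h2>World</h2><h3>!</h3>";
const out = replaceNth(html, /<h[1-6]/g, 1, (m) => '<div id="h2div"></div>' + m);
console.log(out);
// <h1>Hello</h1><div id="h2div"></div><h2>World</h2><h3>!</h3>
```

Because the counter lives inside the function call, each call starts counting from zero and no shared state has to be reset between calls.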

How about using a MatchEvaluator?

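    // Requires: using System.Text.RegularExpressions;

    // Counts how many matches the evaluator has seen so far.
    private static int count = 0;

    // Called once per match; only the second match gets the <div> prepended.
    static string CapText(Match m)
    {
        count++;

        if (count == 2)
        {
            return "<div id=\"h2div\"></div>" + m.Value;
        }

        return m.Value;
    }

    private void button1_Click()
    {
        var htmlText = "<h1>Hello</h1><h2>World</h2><h3>!</h3>";
        Regex rx = new Regex(@"<h([1-6])");
        count = 0; // reset so repeated calls start counting from the first match
        var result = rx.Replace(htmlText, new MatchEvaluator(CapText));
    }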

I struggled with this for a full day. Naturally, asking the question sometimes gets the creative juices flowing, so this is the solution I came up with. It uses a MatchCollection and then a StringBuilder to insert the string. The StringBuilder might be overkill for this, but it works :-)

replacementIndex defines which of the matches the text is inserted before. In my case, the regex finds three matches and modifies the one at index 1. From there, I get the match's starting index in the string and use it to insert the text. This is just test code from a button to prove the functionality.

    private void button1_Click(object sender, EventArgs e)
    {
        //sample text.
        var htmlText = "<h1>Hello</h1><h2>World</h2><h3>!</h3>";

        //the string builder will handle replacing the text.
        var stringBuilder = new StringBuilder(htmlText);

        //build the replacement text.
        var replacementString = "<div id=\"" + "h2div" + "\">" + "</div>";
        int replacementIndex = 1; //only replace the second occurrence found by regex (zero-indexed).

        //find ALL occurrences of <h1 through <h6 in the file, but only replace <h2.
        var pattern = "<h([1-6])";
        MatchCollection matches = Regex.Matches(htmlText, pattern); //get all the matches.
        int startIndex = matches[replacementIndex].Index; //get the starting string index for the match.

        //insert the required text just before the found match.
        stringBuilder.Insert(startIndex, replacementString);

        //copy text to clipboard and display it on screen.
        htmlText = stringBuilder.ToString();
        System.Windows.Forms.Clipboard.SetText(htmlText);
        MessageBox.Show(htmlText);
    }
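For comparison with the earlier JavaScript answer, the same index-based idea can be sketched there as well. The names below (insertBeforeNthMatch, source, result) are hypothetical. The bounds check is an addition that the code above does not have: there, matches[replacementIndex] would throw if the input contained fewer matches than expected. The regex is assumed to have the g flag, which matchAll requires.

```javascript
// Hypothetical helper: insert `text` just before the n-th (zero-based) match.
// Assumes `regex` has the global flag, which String.prototype.matchAll requires.
function insertBeforeNthMatch(input, regex, n, text) {
  const matches = [...input.matchAll(regex)]; // collect all matches up front
  if (n < 0 || n >= matches.length) {
    return input; // fewer matches than expected: return the input unchanged
  }
  const at = matches[n].index; // start offset of the chosen match
  return input.slice(0, at) + text + input.slice(at);
}

const source = "<h1>Hello</h1><h2>World</h2><h3>!</h3>";
const result = insertBeforeNthMatch(source, /<h[1-6]/g, 1, '<div id="h2div"></div>');
console.log(result);
// <h1>Hello</h1><div id="h2div"></div><h2>World</h2><h3>!</h3>
```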
