Replace specific occurrence of text in a Regex Match
EDIT: This example uses HTML, but I need this kind of scenario for working with other types of strings. Please read this as a regex issue, not an HTML issue.
Let's say I have a string like this:
<h1>Hello</h1><h2>World</h2><h3>!</h3>
I may need to insert text at any one of those heading tags, but let's use this example, where I just want to modify <h2> to look like this:
<h1>Hello</h1><div id="h2div"></div><h2>World</h2><h3>!</h3>
Since I may need to replace any of the headings, I only search for <h* using regex. Now, I want my code to say "of all the <h* tags you found, only replace the second one".
I thought I found the answer here: How do I replace a specific occurrence of a string in a string?
Unfortunately, the results are not what I am looking for. Here is my sample code:
private void button1_Click(object sender, EventArgs e)
{
    //sample html file string:
    var htmlText = "<h1>Hello</h1><h2>World</h2><h3>!</h3>";
    //this text should replace <h2 with <div id="h2div"></div><h2
    var replacementString = "<div id=\"" + "h2div" + "\">" + "</div>" + "<h2";
    int replacementIndex = 1; //only replace the second occurrence found by regex.
    //find ALL occurrences of <h1 through <h6 in the file, but only replace <h2.
    htmlText = Regex.Replace(htmlText, "<h([1-6])", m => replacementString + replacementIndex++);
}
It does not matter whether I specify replacementIndex or replacementIndex++, which makes sense, but I just wanted to match the code as closely as possible to the answer I found.
The output looks like this:
<div id="h2div"></div><h21>Hello</h1><div id="h2div"></div><h22>World</h2><div id="h2div"></div><h23>!</h3>
There are lots of things that should not be happening here. First, only one <div> tag should have been created, rather than three. Second, only the <h appears to have been replaced rather than <h2, so now we end up with <h21, <h22, and <h23.
Over the past few months I have been getting better at understanding regex matching, but I am really unfamiliar with regex match evaluators and groups, which I guess is what I probably need here.
Could you recommend how I can fix the code so I can replace a specific index of a regex match?
Sorry, I cannot answer in C#, but the answer should be very similar. For your particular case, the regexp argument for JavaScript's String.prototype.replace() is /(<h1.+?\/h1>)/ and the replacement argument is '$1<div id="h2div"></div>'. So:
var str = "<h1>Hello</h1><h2>World</h2><h3>!</h3>",
repStr = str.replace(/(<h1.+?\/h1>)/,'$1<div id="h2div"></div>');
console.log(repStr) // "<h1>Hello</h1><div id="h2div"></div><h2>World</h2><h3>!</h3>"
Or, if you don't want to use a capture group, you can still do this:
var repStr = str.replace(/<h1.+?\/h1>/,'$&<div id="h2div"></div>');
which will essentially give the same result in this particular case.
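For readers working in C#, the same substitution idea should carry over, because .NET replacement patterns also understand $1 and $&. The following is only a minimal sketch under the same assumption as the JavaScript above, namely that the heading to anchor on is known in advance. The class name SubstitutionDemo and the method name InsertAfterH1 are hypothetical and do not come from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SubstitutionDemo
{
    // Inserts the div right after the first <h1>...</h1> element.
    public static string InsertAfterH1(string input)
    {
        var rx = new Regex(@"<h1.+?/h1>");
        // "$&" stands for the whole match. The third argument (1) limits the
        // call to a single replacement, like JavaScript's non-global replace().
        return rx.Replace(input, "$&<div id=\"h2div\"></div>", 1);
    }

    public static void Main()
    {
        Console.WriteLine(InsertAfterH1("<h1>Hello</h1><h2>World</h2><h3>!</h3>"));
    }
}
```

Note that this anchors on a specific tag name rather than on "the second match", so it only fits when the target heading is fixed.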
How about using a MatchEvaluator?
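private static int count = 0; //number of matches seen so far
//called once per match; only the second match gets the <div> prepended.
static string CapText(Match m)
{
    count++;
    if (count == 2)
    {
        return "<div id=\"h2div\"></div>" + m.Value;
    }
    return m.Value;
}
private void button1_Click()
{
    var htmlText = "<h1>Hello</h1><h2>World</h2><h3>!</h3>";
    count = 0; //reset, otherwise a second click would never reach count == 2
    Regex rx = new Regex(@"<h([1-6])");
    //CapText lives in the same class as this handler.
    var result = rx.Replace(htmlText, new MatchEvaluator(CapText));
}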
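A variation on this idea keeps the counter in a local variable captured by a lambda, so nothing is shared between calls. This is a minimal sketch, not the only way to do it. The names NthMatchDemo, PrependToNthMatch, targetIndex, and prefix are hypothetical and chosen for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NthMatchDemo
{
    // Prepends `prefix` to the match at zero-based position `targetIndex`;
    // every other match is returned unchanged.
    public static string PrependToNthMatch(string input, string pattern, int targetIndex, string prefix)
    {
        int seen = 0; // local counter, so each call starts counting from zero
        return Regex.Replace(input, pattern, m =>
            seen++ == targetIndex ? prefix + m.Value : m.Value);
    }

    public static void Main()
    {
        var html = "<h1>Hello</h1><h2>World</h2><h3>!</h3>";
        Console.WriteLine(PrependToNthMatch(html, "<h[1-6]", 1, "<div id=\"h2div\"></div>"));
    }
}
```

Returning m.Value for the non-target matches is what leaves them untouched. The code in the question returned the replacement string for every match, which is why three <div> tags appeared.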
I struggled with this for a full day. Naturally, asking the question sometimes gets the creative juices flowing, so this is the solution I came up with. It uses a MatchCollection and then uses a StringBuilder to insert the string. The StringBuilder might be overkill for this, but it works :-)
The replacementIndex defines which of the matches you want to insert the text before. In my case, the regex finds three matches and modifies the one at index 1. From there, I get that match's starting index in the string and insert the text at that position. This is just test code from a button to prove the functionality.
private void button1_Click(object sender, EventArgs e)
{
    //sample text.
    var htmlText = "<h1>Hello</h1><h2>World</h2><h3>!</h3>";
    //the string builder will handle inserting the text.
    var stringBuilder = new StringBuilder(htmlText);
    //build the text to insert.
    var replacementString = "<div id=\"" + "h2div" + "\">" + "</div>";
    int replacementIndex = 1; //only modify the second occurrence found by regex (zero-indexed).
    //find ALL occurrences of <h1 through <h6 in the file, but only modify <h2.
    var pattern = "<h([1-6])";
    MatchCollection matches = Regex.Matches(htmlText, pattern); //get all the matches.
    int startIndex = matches[replacementIndex].Index; //get the starting string index for the match.
    //insert the required text just before the found match.
    stringBuilder.Insert(startIndex, replacementString);
    //copy text to clipboard and display it on screen.
    htmlText = stringBuilder.ToString();
    System.Windows.Forms.Clipboard.SetText(htmlText);
    MessageBox.Show(htmlText);
}
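The same approach could be wrapped in a small helper that also guards against the case where the regex finds fewer matches than expected, since indexing a MatchCollection out of range throws. This is a rough sketch. The names InsertBeforeMatchDemo and InsertBeforeMatch are hypothetical, and for a single insertion it uses string.Insert in place of the StringBuilder.

```csharp
using System;
using System.Text.RegularExpressions;

public static class InsertBeforeMatchDemo
{
    // Inserts `text` just before the match at zero-based position `matchIndex`.
    // Returns the input unchanged when there is no such match.
    public static string InsertBeforeMatch(string input, string pattern, int matchIndex, string text)
    {
        MatchCollection matches = Regex.Matches(input, pattern);
        if (matchIndex < 0 || matchIndex >= matches.Count)
        {
            return input; // fewer matches than requested
        }
        return input.Insert(matches[matchIndex].Index, text);
    }

    public static void Main()
    {
        var html = "<h1>Hello</h1><h2>World</h2><h3>!</h3>";
        Console.WriteLine(InsertBeforeMatch(html, "<h[1-6]", 1, "<div id=\"h2div\"></div>"));
    }
}
```

A StringBuilder would start to pay off if several insertions were made into the same text. In that case, inserting from the last match backwards keeps the earlier match indexes valid.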