C# regex to find and replace reusing part of the matched text

I need to do a search and replace on long text strings. I want to find all instances of broken links that look like this:

    <a href="http://any.url.here/%7BlocalLink:1369%7D%7C%7CThank%20you%20for%20registering">broken link</a>

and fix each one so that it looks like this:

    <a href="/{localLink:1369}" title="Thank you for registering">link</a>

There may be a number of these broken links in the text field. My difficulty is working out how to reuse the matched ID (in this case 1369). In the content this ID changes from link to link, as do the URL and the link text.

Thanks,

David

EDIT: To clarify, I am writing C# code to run through hundreds of long text fields and fix the broken links in them. Each text field contains HTML that can have any number of broken links in it; the regex needs to find them all and replace each one with the correct version of the link.

I'm assuming that you already have the element and the attributes parsed. So to process the URL, use something like this:

    // Requires: using System; using System.Web; using System.Text.RegularExpressions;
    string url = "http://any.url.here/%7BlocalLink:1369%7D%7C%7CThank%20you%20for%20registering";
    Match match = Regex.Match(HttpUtility.UrlDecode(url), @"^http://[^/]+/\{(?<local>[^:]+):(?<id>\d+)\}\|\|(?<title>.*)$");
    if (match.Success) {
        Console.WriteLine(match.Groups["local"].Value);
        Console.WriteLine(match.Groups["id"].Value);
        Console.WriteLine(match.Groups["title"].Value);
    } else {
        Console.WriteLine("Not one of those URLs");
    }
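As a possible next step, the captured groups can be reassembled into the corrected attributes. This is only a sketch, written as C# top-level statements: the helper name BuildFixedAttributes is hypothetical, and WebUtility.UrlDecode (System.Net) is swapped in for HttpUtility.UrlDecode so that it runs without a System.Web reference.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the corrected href/title attributes,
// or null when the URL is not one of the broken localLink URLs.
static string BuildFixedAttributes(string url)
{
    Match match = Regex.Match(
        WebUtility.UrlDecode(url),
        @"^http://[^/]+/\{(?<local>[^:]+):(?<id>\d+)\}\|\|(?<title>.*)$");
    if (!match.Success)
    {
        return null;
    }

    // HTML-encode the decoded title so a stray quote cannot break the attribute.
    string title = WebUtility.HtmlEncode(match.Groups["title"].Value);
    return "href=\"/{" + match.Groups["local"].Value + ":" + match.Groups["id"].Value
        + "}\" title=\"" + title + "\"";
}

Console.WriteLine(BuildFixedAttributes(
    "http://any.url.here/%7BlocalLink:1369%7D%7C%7CThank%20you%20for%20registering"));
```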

To include the entire match in the replacement string, use $&.

A number of other substitution markers can be used in the replacement string, such as $1 for a numbered group and ${name} for a named group; see the .NET documentation on regular expression substitutions for the full list.
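To make the substitution markers concrete, here is a small sketch (C# top-level statements). The input string and the group name id are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "see localLink:1369 here";
string pattern = @"(localLink):(?<id>\d+)";

// $& inserts the entire match.
string whole = Regex.Replace(input, pattern, "<$&>");
// $1 inserts the first numbered (unnamed) group.
string numbered = Regex.Replace(input, pattern, "$1");
// ${id} inserts the group named "id"; the surrounding braces are literal text.
string named = Regex.Replace(input, pattern, "/{$1:${id}}");

Console.WriteLine(whole);    // see <localLink:1369> here
Console.WriteLine(numbered); // see localLink here
Console.WriteLine(named);    // see /{localLink:1369} here
```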

Take this with a grain of salt, since HTML and regex don't play well together:

    (<a\s+[^>]*href=")[^"%]*%7B(localLink:\d+)%7D%7C%7C([^"]*)("[^>]*>[^<]*</a>)

When applied to your input and replaced with

    $1/{$2}" title="$3$4

the following is produced:

    <a href="/{localLink:1369}" title="Thank%20you%20for%20registering">broken link</a>

This is as close as it gets with regex alone. To remove the URL encoding from the title, you'll need to pass a MatchEvaluator delegate to Regex.Replace instead of a replacement string.
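A minimal sketch of that MatchEvaluator approach, reusing the pattern above and written as C# top-level statements. The sample HTML and the names FixLink and fixedHtml are illustrative, and WebUtility.UrlDecode (System.Net) is an assumed stand-in for HttpUtility.UrlDecode.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

string html =
    "<p><a href=\"http://any.url.here/%7BlocalLink:1369%7D%7C%7CThank%20you%20for%20registering\">broken link</a></p>";

const string pattern =
    @"(<a\s+[^>]*href="")[^""%]*%7B(localLink:\d+)%7D%7C%7C([^""]*)(""[^>]*>[^<]*</a>)";

// The evaluator runs once per match and returns that match's replacement.
string FixLink(Match m)
{
    // Group 3 is the URL-encoded title: decode it, then HTML-encode it
    // so that it is safe inside the title attribute.
    string title = WebUtility.HtmlEncode(WebUtility.UrlDecode(m.Groups[3].Value));
    return m.Groups[1].Value + "/{" + m.Groups[2].Value + "}\" title=\"" + title + m.Groups[4].Value;
}

string fixedHtml = Regex.Replace(html, pattern, new MatchEvaluator(FixLink), RegexOptions.IgnoreCase);
Console.WriteLine(fixedHtml);
```

Because Regex.Replace calls the evaluator for every match, any number of broken links in one text field is handled in a single pass.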

Thanks to everyone for their help. Here is what I used in the end:

    const string pattern = @"(<a\s+[^>""]*href="")[^""]+(localLink:\d+)(?:%7[DC])*([^""]+)(""[^>]*>[^<]*</a>)";
    // Create a match evaluator to replace the matched links with the correct markup
    var myEvaluator = new MatchEvaluator(FixLink);

    var strNewText = Regex.Replace(strText, pattern, myEvaluator, RegexOptions.IgnoreCase);

    internal static string FixLink(Match m)
    {
        var strUrl = m.ToString();
        const string namedPattern = @"(<a\s+[^>""]*href="")[^""]+(localLink:\d+)(?:%7[DC])*([^""]+)(""[^>]*>[^<]*</a>)";
        // Use the same case-insensitive option as the outer Regex.Replace call
        var regex = new Regex(namedPattern, RegexOptions.IgnoreCase);

        // $1 = opening tag up to href=", $2 = localLink:ID,
        // $3 = title (still URL-encoded), $4 = closing quote through </a>
        const string strReplace = @"$1/{$2}"" title=""$3$4";

        var strFixed = regex.Replace(strUrl, strReplace);
        HttpContext.Current.Response.Write(String.Format("Replacing '{0}' with '{1}'", strUrl, strFixed));
        return strFixed;
    }
