
RegEx match - not working as expected

I have a string of text/html. I want to replace parts of the string, where they match my RegEx pattern. The pattern checks for href=".." values containing a 32-character GUID. If it finds one, I want to replace it.

My pattern works here: https://regex101.com/r/IWW7bW/1

But when I implement the same pattern in my C# project, it does not find a match in the same text coming from my DB.

public static string UpdateLinks(string bodyText) {
    string patternLinks = @"((\/~\/link\.aspx\?_id=([A-Z0-9]{32})))";
    bodyText = Regex.Replace(bodyText, patternLinks, "/$3/mylink.aspx");

    return bodyText;
}

If I take the raw text string, like @"<a href="/~/link.aspx?_id=994FE46E00D84DE9BF8050948E5496DA&amp;_z=z">", and hardcode it into bodyText, it DOES find a match. But the exact same value, as part of the string coming from the database, does not get matched. So what is going on? Is some sort of encoding happening in between?

Example string from the DB

<p><a href="/~/link.aspx?_id=994FE46E00D84DE9BF8050948E5496DA&amp;_z=z">Link 1</a> and <a href="/~/link.aspx?_id=E7BBDF47B8784AA084985A0623490295&amp;_z=z">Link 2</a></p>

Expected output, based on the above string

<p><a href="/994FE46E00D84DE9BF8050948E5496DA/mylink.aspx">Link 1</a> and <a href="/E7BBDF47B8784AA084985A0623490295/mylink.aspx">Link 2</a></p>

Use this pattern:

string patternLinks = @"((\/~\/link\.aspx\?_id=([A-Z0-9]{32})[^""]+))";

Result:

<p><a href="/994FE46E00D84DE9BF8050948E5496DA/mylink.aspx">Link 1</a> and <a href="/E7BBDF47B8784AA084985A0623490295/mylink.aspx">Link 2</a></p>
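To see why the extra [^""]+ matters, here is a minimal sketch that runs both the question's pattern and the pattern above over one link from the sample. The class and method names (LinkPatternComparison, Rewrite) are made up for illustration. It assumes the database string is byte-for-byte identical to the sample shown in the question, which may not hold in the asker's environment.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LinkPatternComparison
{
    // One link from the question's sample markup (quotes doubled for a verbatim string).
    public const string Sample =
        @"<p><a href=""/~/link.aspx?_id=994FE46E00D84DE9BF8050948E5496DA&amp;_z=z"">Link 1</a></p>";

    // Pattern from the question: the match ends right after the 32-character id.
    public const string OriginalPattern = @"((\/~\/link\.aspx\?_id=([A-Z0-9]{32})))";

    // Pattern from this answer: also consumes everything up to the closing quote.
    public const string ExtendedPattern = @"((\/~\/link\.aspx\?_id=([A-Z0-9]{32})[^""]+))";

    // Hypothetical helper: applies the question's replacement string with the given pattern.
    public static string Rewrite(string pattern) =>
        Regex.Replace(Sample, pattern, "/$3/mylink.aspx");

    public static void Main()
    {
        // The original pattern does match, but "&amp;_z=z" is left behind in the href.
        Console.WriteLine(Rewrite(OriginalPattern));
        // The extended pattern swallows the trailing query string as well.
        Console.WriteLine(Rewrite(ExtendedPattern));
    }
}
```

With the sample above, the original pattern still matches; the difference is only that its output keeps the &amp;_z=z tail inside the href, which can look like "no match" when compared against the expected output.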

I think you just forgot to add this part to your pattern: &amp;_z=z

 var patternLinks = @"((\/~\/link\.aspx\?_id=([A-Z0-9]{32})&amp;_z=z))";

You are testing your regular expression using a PHP parser. You should use something like http://regexstorm.net/tester. There you will see that it's a grouping issue. This expression worked for me there.

((\/~\/link\.aspx\?_id=)([A-Z0-9]{32}))
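As a rough way to check the grouping yourself in .NET, the sketch below prints what each capture group of the expression above holds for one href value from the question. GroupInspection and GroupValues are illustrative names, not part of any library. Groups are numbered by the position of their opening parenthesis, so the 32-character id ends up in group 3 here.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupInspection
{
    // The expression from this answer.
    public const string Pattern = @"((\/~\/link\.aspx\?_id=)([A-Z0-9]{32}))";

    // Hypothetical helper: returns the value of every group for the first match.
    public static string[] GroupValues(string input)
    {
        Match m = Regex.Match(input, Pattern);
        var values = new string[m.Groups.Count];
        for (int i = 0; i < m.Groups.Count; i++)
        {
            values[i] = m.Groups[i].Value;
        }
        return values;
    }

    public static void Main()
    {
        // href value taken from the question's sample string.
        string href = "/~/link.aspx?_id=994FE46E00D84DE9BF8050948E5496DA&amp;_z=z";
        string[] groups = GroupValues(href);
        for (int i = 0; i < groups.Length; i++)
        {
            Console.WriteLine($"Group {i}: {groups[i]}");
        }
    }
}
```

Group 0 and group 1 both hold the whole match, group 2 holds the /~/link.aspx?_id= prefix, and group 3 holds the id, so a replacement string of "/$3/mylink.aspx" would still refer to the id with this expression.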

Try the following regex.

(?<=href="\/).*?=(.*?)&.*?"

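// Sample string from the question (quotes doubled for a verbatim string)
var src = @"<p><a href=""/~/link.aspx?_id=994FE46E00D84DE9BF8050948E5496DA&amp;_z=z"">Link 1</a> and <a href=""/~/link.aspx?_id=E7BBDF47B8784AA084985A0623490295&amp;_z=z"">Link 2</a></p>";

try {
    // The lookbehind keeps href="/ in place; group 1 captures the id up to the first &
    var result = Regex.Replace(src,
        @"(?<=href=""\/).*?=(.*?)&.*?""", "$1/mylink.aspx\"",
        RegexOptions.Singleline);
    Console.WriteLine(result);
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
    Console.Error.WriteLine(ex.Message);
}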

This should print:

<p><a href="/994FE46E00D84DE9BF8050948E5496DA/mylink.aspx">Link 1</a> and <a href="/E7BBDF47B8784AA084985A0623490295/mylink.aspx">Link 2</a></p>

Please see https://regex101.com/r/gruKQP/1/ for a demonstration.

You have way too many brackets in your regex, which give you extra capture groups you don't need. Just leave them off. And if you want to trim away the stuff after the 32-character ID, you need to include it in your pattern but not inside the capture group. The simplest way to exclude anything following the 32-character ID is to match anything following it that is not the closing quote, so, [^"]*.

The regex should be this:

@"\/~\/link\.aspx\?_id=([A-Z0-9]{32})[^""]*"

And with the removal of these extra useless brackets around your match, the replacement will simply use the first group:

"/$1/mylink.aspx"
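Put together, the question's UpdateLinks method with this pattern and replacement might look like the sketch below. The wrapper class name LinkRewriter and the cached static Regex field are illustrative choices, not something the question prescribes, and the input is assumed to look like the sample string from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LinkRewriter
{
    // One capture group for the 32-character id; [^""]* consumes the rest
    // of the href value up to (but not including) the closing quote.
    private static readonly Regex LinkPattern =
        new Regex(@"\/~\/link\.aspx\?_id=([A-Z0-9]{32})[^""]*");

    public static string UpdateLinks(string bodyText)
    {
        // $1 is the id, because it is now the only capture group.
        return LinkPattern.Replace(bodyText, "/$1/mylink.aspx");
    }

    public static void Main()
    {
        // Sample string from the question (quotes doubled for a verbatim string).
        string body = @"<p><a href=""/~/link.aspx?_id=994FE46E00D84DE9BF8050948E5496DA&amp;_z=z"">Link 1</a> and <a href=""/~/link.aspx?_id=E7BBDF47B8784AA084985A0623490295&amp;_z=z"">Link 2</a></p>";
        Console.WriteLine(UpdateLinks(body));
    }
}
```

For the sample string this produces the expected output from the question, with both links rewritten to the /<id>/mylink.aspx form.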

