RegEx match - not working as expected

I have a string of text/HTML. I want to replace parts of the string when they match my RegEx pattern. The pattern checks for href=".." containing a 32-character GUID. If it finds one, I want to replace it.

My pattern works here: https://regex101.com/r/IWW7bW/1

But, when I implement the same pattern in my C# project, it does not find a match with the same text from my DB.

public static string UpdateLinks(string bodyText) {
    string patternLinks = @"((\/~\/link\.aspx\?_id=([A-Z0-9]{32})))";
    bodyText = Regex.Replace(bodyText, patternLinks, "/$3/mylink.aspx");

    return bodyText;
}

If I take the raw text string, like @"<a href="/~/link.aspx?_id=994FE46E00D84DE9BF8050948E5496DA&amp;_z=z">", and hardcode it into bodyText, it DOES find a match. But the exact same value is part of the string coming from the database, and it does not get matched. So, what is going on? Some sort of encoding in between, or?

Example string from the DB

<p><a href="/~/link.aspx?_id=994FE46E00D84DE9BF8050948E5496DA&amp;_z=z">Link 1</a> and <a href="/~/link.aspx?_id=E7BBDF47B8784AA084985A0623490295&amp;_z=z">Link 2</a></p>

Expected output, based on the above string

<p><a href="/994FE46E00D84DE9BF8050948E5496DA/mylink.aspx">Link 1</a> and <a href="/E7BBDF47B8784AA084985A0623490295/mylink.aspx">Link 2</a></p>

Use this pattern:

string patternLinks = @"((\/~\/link\.aspx\?_id=([A-Z0-9]{32})[^""]+))";

Result :

<p><a href="/994FE46E00D84DE9BF8050948E5496DA/mylink.aspx">Link 1</a> and <a href="/E7BBDF47B8784AA084985A0623490295/mylink.aspx">Link 2</a></p>

I think you just forgot to add this part to your pattern: &amp;_z=z

 var patternLinks = @"((\/~\/link\.aspx\?_id=([A-Z0-9]{32})&amp;_z=z))";

You are testing your regular expression with a PHP (PCRE) engine. You should use a .NET tester such as http://regexstorm.net/tester . There you will see that it's a grouping issue. This expression worked for me there.

((\/~\/link\.aspx\?_id=)([A-Z0-9]{32}))
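To see how .NET numbers the groups in this expression, here is a minimal sketch that prints each group for one of the href values from the question. The class name GroupDemo is just a placeholder for illustration. Groups are numbered by the position of their opening parenthesis, so the ID should end up in group 3, and the trailing &amp;_z=z is not part of the match at all.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of the original project.
public static class GroupDemo
{
    public static void Main()
    {
        // One href value taken from the sample string in the question.
        string href = "/~/link.aspx?_id=994FE46E00D84DE9BF8050948E5496DA&amp;_z=z";

        Match m = Regex.Match(href, @"((\/~\/link\.aspx\?_id=)([A-Z0-9]{32}))");

        // Group 0 is the whole match; groups 1..3 follow the opening parentheses.
        for (int i = 0; i < m.Groups.Count; i++)
        {
            Console.WriteLine($"Group {i}: {m.Groups[i].Value}");
        }
    }
}
```

Because the match stops right after the 32 characters, a replacement built on this expression alone leaves &amp;_z=z in the output.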

Try the following regex.

(?<=href="\/).*?=(.*?)&.*?"

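var src = @"<p><a href=""/~/link.aspx?_id=994FE46E00D84DE9BF8050948E5496DA&amp;_z=z"">Link 1</a> and <a href=""/~/link.aspx?_id=E7BBDF47B8784AA084985A0623490295&amp;_z=z"">Link 2</a></p>";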

try {
    var result = Regex.Replace(src, 
        @"(?<=href=""\/).*?=(.*?)&.*?""", "$1/mylink.aspx\"", 
        RegexOptions.Singleline);
    Console.WriteLine(result);
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}

This should print:

<p><a href="/994FE46E00D84DE9BF8050948E5496DA/mylink.aspx">Link 1</a> and <a href="/E7BBDF47B8784AA084985A0623490295/mylink.aspx">Link 2</a></p>

Please see https://regex101.com/r/gruKQP/1/ for a demonstration.

You have way too many brackets in your regex, which give you extra capture groups you don't need. Just leave them off. And if you want to trim away the stuff after the 32-character ID, you need to include it in your pattern but not inside the capture group. The simplest way to exclude anything following the 32-character ID is to simply match anything following it that is not the closing quote, so, [^"]* .

The regex should be this:

@"\/~\/link\.aspx\?_id=([A-Z0-9]{32})[^""]*"

And with the removal of these extra useless brackets around your match, the replacement will simply use the first group:

"/$1/mylink.aspx"

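Putting the simplified pattern and replacement together, a minimal sketch of the method from the question could look like the following. The class name LinkRewriter and the Main method are only there to make the example self-contained, and the input is the sample string from the question rather than real database content.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper class; only UpdateLinks comes from the question.
public static class LinkRewriter
{
    // A single capture group holds the 32-character ID.
    // [^""]* (a doubled quote inside a verbatim string) consumes the rest of
    // the href value, so &amp;_z=z is matched and dropped by the replacement.
    private const string PatternLinks = @"\/~\/link\.aspx\?_id=([A-Z0-9]{32})[^""]*";

    public static string UpdateLinks(string bodyText)
    {
        return Regex.Replace(bodyText, PatternLinks, "/$1/mylink.aspx");
    }

    public static void Main()
    {
        string input = @"<p><a href=""/~/link.aspx?_id=994FE46E00D84DE9BF8050948E5496DA&amp;_z=z"">Link 1</a> and <a href=""/~/link.aspx?_id=E7BBDF47B8784AA084985A0623490295&amp;_z=z"">Link 2</a></p>";

        Console.WriteLine(UpdateLinks(input));
    }
}
```

Note that [A-Z0-9] only matches uppercase IDs. If the database can also contain lowercase GUIDs, passing RegexOptions.IgnoreCase to Regex.Replace would be one way to cover them.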