C# Regex: match without including the first character before the matched string

How can I make this C# regex not include the first character before the URL in the match results?

((?!\").)https?:\/\/twitter\.com\/(?:#!\/)?(\w+)\/status(?:es)?\/(\d+)

This will match:

Xhttps://twitter.com/oppomobileindia/status/798397636780953600

Notice the leading X, which is included in the match.

I want it to match only URLs that are not preceded by a double quote. For those URLs, the match should also not include the character that comes before https.

An actual example that I use in my code:

 // Requires: using System.Text.RegularExpressions;
 // A verbatim string (@"...") may span several lines; each " inside it is written as "".
 var str = @"<div id=""content"">
             <p>https://twitter.com/oppomobileindia/status/798397636780953600</p>
             <p>""https://twitter.com/oppomobileindia/status/11111111111111111111</p></div>";

 var pattern = @"(?<!""')https?://twitter\.com/(?:#!/)?(\w+)/status(?:es)?/(\d+)";

 var rgx = new Regex(pattern);

 var results = rgx.Replace(str, "XXX");

In the above example, only the first URL should be replaced, because the second one has a double quote immediately before the URL. The replacement should also cover exactly the URL, without the character that precedes the matched string.

Use a (?<!") negative lookbehind:

var re = @"(?<!"")https?://twitter\.com/(?:#!/)?(\w+)/status(?:es)?/(\d+)";

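The (?<!") means that there cannot be a " immediately before the current location. A lookbehind is zero-width: it only checks the preceding character and does not consume it, so that character never becomes part of the match.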
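To make the difference concrete, here is a minimal sketch comparing the consuming prefix from the question with the lookbehind version. The class and method names (LookbehindVsConsume, MatchConsuming, MatchLookbehind) are made up for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookbehindVsConsume
{
    // Shared tail of both patterns, taken from the question.
    const string Url = @"https?://twitter\.com/(?:#!/)?(\w+)/status(?:es)?/(\d+)";

    // Question's approach: ((?!").) consumes one non-quote character,
    // so that character ends up inside the match.
    public static string MatchConsuming(string input) =>
        Regex.Match(input, @"((?!"").)" + Url).Value;

    // Lookbehind approach: (?<!") only asserts that no quote precedes the URL.
    public static string MatchLookbehind(string input) =>
        Regex.Match(input, @"(?<!"")" + Url).Value;

    public static void Main()
    {
        var input = "Xhttps://twitter.com/oppomobileindia/status/798397636780953600";
        Console.WriteLine(MatchConsuming(input));  // match includes the leading X
        Console.WriteLine(MatchLookbehind(input)); // match starts at https
    }
}
```

A side effect worth noting: the consuming pattern needs some character before the URL, so it cannot match a URL at the very start of the input, while the lookbehind version can.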

In C#, you do not need to escape / inside the pattern since regex delimiters are not used when defining the regex.

Note on the C# syntax: to put a " inside a verbatim string literal, double it (""). In a regular string literal, escape the " as \" and write each backslash as \\:

var re = "(?<!\")https?://twitter\\.com/(?:#!/)?(\\w+)/status(?:es)?/(\\d+)";
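Putting it together, the following is a small self-contained sketch that applies the lookbehind pattern to input shaped like the question's sample. The class name TwitterUrlDemo and the helper ReplaceUnquoted are hypothetical, and the sample HTML is written as a verbatim string so that it compiles.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TwitterUrlDemo
{
    // Pattern from the answer: (?<!") rejects a URL that is immediately
    // preceded by a double quote, without consuming any character.
    private static readonly Regex StatusUrl = new Regex(
        @"(?<!"")https?://twitter\.com/(?:#!/)?(\w+)/status(?:es)?/(\d+)");

    // Hypothetical helper: replaces every status URL not preceded by a quote.
    public static string ReplaceUnquoted(string input, string placeholder)
    {
        return StatusUrl.Replace(input, placeholder);
    }

    public static void Main()
    {
        var str = @"<div id=""content"">
<p>https://twitter.com/oppomobileindia/status/798397636780953600</p>
<p>""https://twitter.com/oppomobileindia/status/11111111111111111111</p></div>";

        // Only the first URL is replaced; the second one follows a double quote.
        Console.WriteLine(ReplaceUnquoted(str, "XXX"));
    }
}
```

Captures 1 and 2 (the account name and the status id) remain available on each Match if the replacement needs them, for example through a MatchEvaluator.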
