Regex exclude if in front of string?

Question

I haven't used regex much before, but I found something useful on the net that I'm using:

private string ConvertUrlsToLinks(string msg)
{
    string regex = @"((www\.|(http|https|ftp|news|file)+\:\/\/)[_.a-z0-9-]+\.[a-z0-9\/_:@=.+?,##%&~-]*[^.|\'|\\||\# |!|\(|?|\[|,| |>|<|;|\)])";
    Regex r = new Regex(regex, RegexOptions.IgnoreCase);
    return r.Replace(msg, "<a href=\"$1\" title=\"Click to open in a new window or tab\" target=\"_blank\">$1</a>").Replace("href=\"www", "href=\"http://www").Replace(@"\r\n", "<br />").Replace(@"\n", "<br />").Replace(@"\r", "<br />");
}

It does a good job, but now I want it to exclude URLs that already have an "a href=" in front of them. There's the closing "/a" to consider too.

Can that be done with regex, or do I have to use a totally different approach, like writing code?

Answer 1

Try this:

((?<!href=')(?<!href=")(www\.|(http|https|ftp|news|file)+\:\/\/)[_.a-z0-9-]+\.[a-z0-9\/_:@=.+?,##%&~-]*[^.|\'|\\||\# |!|\(|?|\[|,| |>|<|;|\)])

I tested it on regex101.com with the following sample set:

www.google.com
http://hi.com
http://www.fishy.com
href='www.ignore.com'
www.ouch.com
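
To make the lookbehind idea easier to try out, here is a minimal sketch. It assumes a much simplified URL pattern (the `Url` constant is hypothetical and is not the pattern from the question). It also shows one limitation worth knowing: a lookbehind like `(?<!href=')` only inspects the characters immediately before the match start, so in `href='http://www.ignore.com'` a match can still begin at `www`. Since .NET supports variable-length lookbehind, the `Strict` variant below checks the whole attribute value so far instead.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookbehindDemo
{
    // Simplified URL pattern for illustration only (an assumption, not the question's pattern).
    private const string Url = @"(?:www\.|(?:https?|ftp)://)[a-z0-9.-]+\.[a-z]{2,}";

    // Rejects a match only when href=' or href=" sits directly before it.
    public static readonly Regex Simple = new Regex(
        @"(?<!href=['""])" + Url, RegexOptions.IgnoreCase);

    // Rejects a match anywhere inside an href value that has not been closed yet.
    public static readonly Regex Strict = new Regex(
        @"(?<!href=['""][^'""]*)" + Url, RegexOptions.IgnoreCase);

    public static int Count(Regex r, string input)
    {
        return r.Matches(input).Count;
    }

    public static void Main()
    {
        Console.WriteLine(Count(Simple, "href='www.ignore.com'"));        // 0
        Console.WriteLine(Count(Simple, "href='http://www.ignore.com'")); // 1 (matches from "www")
        Console.WriteLine(Count(Strict, "href='http://www.ignore.com'")); // 0
    }
}
```

Neither variant looks at the link text between `<a ...>` and `</a>`, so a URL used as the visible text of an existing link would still be matched.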

Answer 2

Using your existing regex pattern, you could make a few simple changes to handle additional text being prepended or appended to your string:

`.+` <- pattern -> `(.+)?`

Which would give you:

.+((www\.|(http|https|ftp|news|file)+\:\/\/)[_.a-z0-9-]+\.[a-z0-9\/_:@=.+?,##%&~-]*[^.|\'|\\||\# |!|\(|?|\[|,| |>|<|;|\)])(.+)?

So passing either of these strings:

<a href='http://www.test.com'>http://www.test.com</a>

...or...

http://www.test.com

Would result in:

<a href="http://www.test.com" title="Click to open in a new window or tab" target="_blank">www.test.com</a>

Examples:

https://regex101.com/r/bO0cW6/1

http://ideone.com/suVw3I

Answer 3

I think it would be a little ToNy tHe pOny to do that in regex after all, so I wrote the code instead. In case anyone is interested, here it is:

private string handleatag(string msg, string tagbegin, string tagend)
{
    ArrayList tags = new ArrayList();
    int tagbeginpos = msg.IndexOf(tagbegin);
    int tagendpos;

    string hash = tagbegin.GetHashCode().ToString();

    while (tagbeginpos != -1)
    {
        tagendpos = msg.IndexOf(tagend, tagbeginpos);

        if (tagendpos != -1)
        {
            string atag = msg.Substring(tagbeginpos, tagendpos - tagbeginpos + tagend.Length);
            msg = msg.Replace(atag, hash + tags.Count.ToString());
            tags.Add(atag);
        }
        else
            msg = msg.Remove(tagbeginpos, tagbegin.Length);

        tagbeginpos = msg.IndexOf(tagbegin, tagbeginpos);
    }

    msg = ConvertUrlsToLinks(msg);

    for (int i = 0; i < tags.Count; i++)
        msg = msg.Replace(hash + i.ToString(), tags[i].ToString());

    return msg;
}

private string ConvertUrlsToLinks(string msg)
{
    if (msg.IndexOf("<a href=") != -1)
        return handleatag(msg, "<a href=", "</a>");

    string regex = @"((www\.|(http|https|ftp|news|file)+\:\/\/)[_.a-z0-9-]+\.[a-z0-9\/_:@=.+?,##%&~-]*[^.|\'|\\||\# |!|\(|?|\[|,| |>|<|;|\)])";
    Regex r = new Regex(regex, RegexOptions.IgnoreCase);
    return r.Replace(msg, "<a href=\"$1\" title=\"Click to open in a new window or tab\" target=\"_blank\">$1</a>").Replace("href=\"www", "href=\"http://www").Replace(@"\r\n", "<br />").Replace(@"\n", "<br />").Replace(@"\r", "<br />");
}
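
For comparison, the same "set existing links aside, then convert the rest" idea can probably also be done in one regex pass, by letting the pattern match a whole existing `<a ...>...</a>` element first and handing it back unchanged from a `MatchEvaluator`. This is a different technique from the placeholder code above, and only a rough sketch: the `Linkifier` class name and the simplified URL pattern are assumptions, the `title` attribute is left out for brevity, and nested or malformed markup is not handled.

```csharp
using System;
using System.Text.RegularExpressions;

public static class Linkifier
{
    // Alternative 1 consumes an entire existing anchor element.
    // Alternative 2 captures a bare URL in the named group "url".
    // The URL part is a simplified, hypothetical pattern, not the one from the question.
    private static readonly Regex AnchorOrUrl = new Regex(
        @"<a\b[^>]*>.*?</a>|(?<url>(?:www\.|(?:https?|ftp)://)[a-z0-9.-]+\.[a-z]{2,}(?:/[^\s<>""']*)?)",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static string ConvertUrlsToLinks(string msg)
    {
        return AnchorOrUrl.Replace(msg, m =>
        {
            // No "url" group means an existing <a>...</a> was matched: leave it untouched.
            if (!m.Groups["url"].Success)
                return m.Value;

            string url = m.Groups["url"].Value;
            // Bare "www." addresses need a scheme to work as an href.
            string href = url.StartsWith("www.", StringComparison.OrdinalIgnoreCase)
                ? "http://" + url
                : url;
            return "<a href=\"" + href + "\" target=\"_blank\">" + url + "</a>";
        });
    }

    public static void Main()
    {
        Console.WriteLine(ConvertUrlsToLinks(
            "<a href='http://www.test.com'>http://www.test.com</a> and www.test.com"));
    }
}
```

Because the regex engine scans left to right and the anchor alternative is tried first, any URL inside an existing link (in the `href` value or in the link text) is consumed as part of that link and never reaches the URL alternative.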
