Regex for detecting emails in text

I have a regex in C# that detects email addresses in text, and I then wrap each match in an anchor tag with a mailto: href to make it clickable. But if an email address is already inside an anchor tag, the regex still detects it, and the subsequent code wraps another anchor tag around it. Is there any way in regex to skip email addresses that are already inside an anchor tag?

The regex code in C# is:

string sRegex = @"([\w-]+(\.[\w-]+)*@([a-z0-9-]+(\.[a-z0-9-]+)*?\.[a-z]{2,6}|(\d{1,3}\.){3}\d{1,3})(:\d{4})?)";

Regex Regx = new Regex(sRegex, RegexOptions.IgnoreCase);

and the sample text is:

string sContent = "ttt <a href='mailto:someone@example.com'>someemail@mail.com</a> abc email@email.com";

and the desired output is:

"ttt <a href='mailto:someone@example.com'>someemail@mail.com</a> abc <a href='mailto:email@email.com'>email@email.com</a>";

So the whole point is that the regex should only detect valid email addresses that are not already clickable: neither the text inside an anchor tag nor the anchor tag's href value should match.

The regex given above detects every possible email address in the text, which is not what I want.

Could you use a negative lookbehind to test for mailto:?

(?<!mailto\:)([\w-]+(\.[\w-]+)*@([a-z0-9-]+(\.[a-z0-9-]+)*?\.[a-z]{2,6}|(\d{1,3}\.){3}\d{1,3})(:\d{4})?)

This should match any address that is not preceded by mailto:

I think what is happening is that the unescaped . in ([\w\-]+(.[\w-])+) matches any character, so it matches too much. Did you mean to use \. rather than . ?

With the . escaped, the following code produces

someemail@mail.com
email@email.com


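// Requires: using System.Diagnostics; using System.Text.RegularExpressions;
public void Test()
{
    // \b plus the negative lookbehind rejects an address that directly follows "mailto:".
    // Every literal dot is escaped, and each dotted segment may be longer than one character.
    Regex pattern = new Regex(@"\b(?<!mailto:)([\w\-]+(\.[\w\-]+)*@([a-z0-9-]+(\.[a-z0-9-]+)*\.[a-z]{2,6}|(\d{1,3}\.){3}\d{1,3})(:\d{4})?)", RegexOptions.IgnoreCase);
    MatchCollection matchCollection = pattern.Matches("ttt <a href='mailto:someone@example.com'>someemail@mail.com</a> abc email@email.com");
    foreach (Match match in matchCollection)
    {
        Debug.WriteLine(match);
    }
}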

A real-world implementation of what it seems you're trying to do might look more like this:

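// The lookbehind skips the href value; the lookahead skips an address directly followed by "</a".
// The subdomain group needs "+" so that each dotted label can be longer than one character.
Regex pattern = new Regex(@"(?<!mailto\:)\b[\w\-]+@[a-z0-9-]+(\.[a-z0-9\-]+)*\.[a-z]{2,8}\b(?!\<\/a)");
MatchCollection matchCollection = pattern.Matches("ttt <a href='mailto:so1meone@example.com'>someemail@mail.com</a> abc email@email.com");
foreach (Match match in matchCollection)
{
    Debug.WriteLine(match); // only email@email.com is written
}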
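To get from the matches to the output the question asks for, the same pattern can be passed to Regex.Replace. The following is only a sketch under a few assumptions: the class name EmailLinker and the method name Linkify are made up for illustration, the single-quoted href style is copied from the sample text, and the pattern is the one above with + inside the subdomain group.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original question or answers.
public static class EmailLinker
{
    // (?<!mailto:) skips an address used as an href value;
    // (?!</a) skips an address that is immediately followed by a closing anchor tag.
    private static readonly Regex Email = new Regex(
        @"(?<!mailto:)\b[\w\-]+@[a-z0-9-]+(\.[a-z0-9\-]+)*\.[a-z]{2,8}\b(?!</a)",
        RegexOptions.IgnoreCase);

    // Wraps every remaining match in a mailto anchor.
    public static string Linkify(string html)
    {
        return Email.Replace(html, m => "<a href='mailto:" + m.Value + "'>" + m.Value + "</a>");
    }

    public static void Main()
    {
        string sContent = "ttt <a href='mailto:someone@example.com'>someemail@mail.com</a> abc email@email.com";
        // Prints: ttt <a href='mailto:someone@example.com'>someemail@mail.com</a> abc <a href='mailto:email@email.com'>email@email.com</a>
        Console.WriteLine(Linkify(sContent));
    }
}
```

Note that the lookahead only recognises an address that sits directly before </a. If the anchor text contains anything after the address, the address would still be wrapped again, so for arbitrary HTML a real HTML parser is likely the safer route.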

Sorry, you are correct. I hadn't considered that the negative assertion wouldn't be greedy enough: on its own it still allows a match that starts one character into the address.

\b(?<!mailto\:)([\w-]+(\.[\w-]+)*@([a-z0-9-]+(\.[a-z0-9-]+)*?\.[a-z]{2,6}|(\d{1,3}\.){3}\d{1,3})(:\d{4})?)

should work.
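The effect of the added \b can be seen in a small comparison. This is a sketch only: the class LookbehindDemo, the method FirstMatch and the simplified Address pattern are hypothetical names introduced for the illustration, not code from the answers.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, introduced only to compare the two prefixes.
public static class LookbehindDemo
{
    // Simplified address pattern, assumed here to keep the comparison short.
    private const string Address = @"[\w-]+@[a-z0-9-]+(\.[a-z0-9-]+)*\.[a-z]{2,6}";

    // Returns the first match of prefix + Address, or an empty string if there is none.
    public static string FirstMatch(string prefix, string input)
    {
        Match m = Regex.Match(input, prefix + Address, RegexOptions.IgnoreCase);
        return m.Success ? m.Value : "";
    }

    public static void Main()
    {
        string href = "href='mailto:someone@example.com'";

        // Lookbehind alone: the match simply starts one character later.
        Console.WriteLine(FirstMatch(@"(?<!mailto:)", href));   // omeone@example.com

        // With \b the match must start at a word boundary, so nothing matches.
        Console.WriteLine(FirstMatch(@"\b(?<!mailto:)", href)); // (empty line)
    }
}
```

Without \b, the engine skips the position right after mailto: and then succeeds at the next character, which is why the word boundary is needed alongside the lookbehind.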
