Using Regex in PowerShell to grab email

I have written a script that grabs different fields from an HTML file and populates variables with the results. I'm having trouble with the regular expression for grabbing the email address. Here is some sample code:

$txt='<p class=FillText><a name="InternetMail_P3"></a>First.Last@company-name.com</p>'

$re='.*?'+'([\\w-+]+(?:\\.[\\w-+]+)*@(?:[\\w-]+\\.)+[a-zA-Z]{2,7})'

if ($txt -match $re)
{
    $email1=$matches[1]
    write-host "$email1"
}

I get the following error:

Bad argument to operator '-match': parsing ".*?([\\w-+]+(?:\\.[\\w-+]+)*@(?:[\\w-]+\\
.)+[a-zA-Z]{2,7})([\\w-+]+(?:\\.[\\w-+]+)*@(?:[\\w-]+\\.)+[a-zA-Z]{2,7})" - [x-y] range in reverse order..
At line:7 char:16
+ if ($txt -match <<<<  $re)
    + CategoryInfo          : InvalidOperation: (:) [], RuntimeException
    + FullyQualifiedErrorId : BadOperatorArgument

What am I missing here? Also, is there a better regex for email?

Thanks in advance.

Actually, any regex that works in .NET or C# will work in PowerShell, and you can find plenty of samples on Stack Overflow and elsewhere on the internet. For example: How to Find or Validate an Email Address: The Official Standard: RFC 2822

$txt = '<p class=FillText><a name="InternetMail_P3"></a>First.Last@company-name.com</p>'
# Single-quoted so PowerShell leaves the backtick and $ untouched; '' is an escaped apostrophe.
$re = '[a-z0-9!#$%&''*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&''*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?'
[regex]::Match($txt, $re, 'IgnoreCase')
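
One caveat when reusing C#-style patterns: PowerShell strings do not treat the backslash as an escape character, so the doubled backslashes in the question's pattern reach the regex engine as literal backslashes. `[\\w-+]` is then read as a literal backslash followed by the range `w` to `+`, which runs backwards, and that would explain the "range in reverse order" error. Below is a minimal sketch of the question's pattern with single backslashes and the hyphen moved to the end of each character class. The leading `.*?` is dropped because `-match` already searches the whole string; the variable names simply follow the question.

```powershell
$txt = '<p class=FillText><a name="InternetMail_P3"></a>First.Last@company-name.com</p>'

# Single backslashes: PowerShell passes them to the regex engine unchanged.
# The hyphen sits last in each character class so it cannot be read as a range.
$re = '([\w+-]+(?:\.[\w+-]+)*@(?:[\w-]+\.)+[a-zA-Z]{2,7})'

if ($txt -match $re) {
    $email1 = $Matches[1]   # first capture group: the whole address
    Write-Host $email1
}
```

This keeps the question's simple pattern rather than the longer RFC 2822 one, so it is only as strict as the original was meant to be.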

But there is another part to this answer. Regex is by nature not well suited to parsing XML/HTML. You can find more details here: Using regular expressions to parse HTML: why not?

For a real solution, I recommend the following:

  1. convert HTML → XHTML
  2. walk over the XML tree
  3. work with the individual nodes one by one, even using regex.
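
The last two steps can be sketched roughly as follows. This assumes the HTML has already been converted to well-formed XHTML (the conversion tool is not shown, and the sample fragment was converted by hand), that no default XML namespace is declared, and that the fields of interest are `<p class="FillText">` elements as in the question. The wrapping `<div>` and the simplified email pattern are illustrative assumptions only.

```powershell
# Assumed: already well-formed XHTML (note the quoted class attribute).
$xhtml = '<div><p class="FillText"><a name="InternetMail_P3"></a>First.Last@company-name.com</p></div>'
$doc = [xml]$xhtml

# Simplified, illustrative pattern; swap in whichever email regex you prefer.
$re = '[\w.+-]+@(?:[\w-]+\.)+[a-zA-Z]{2,7}'

# Walk the <p class="FillText"> nodes and test each node's text on its own.
$emails = foreach ($node in $doc.SelectNodes('//p[@class="FillText"]')) {
    if ($node.InnerText -match $re) { $Matches[0] }
}
$emails
```

Because the regex only ever sees the text of a single node, it no longer has to cope with tags or attributes. A real XHTML document that declares the XHTML namespace would need a namespace-aware XPath query instead.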

When it comes to email validation, I usually choose the short version of the RFC 2822 regex:

[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?

You can find more info about email validation here
