
Regular expression - Match, if not in X and Y and Z

I want to match mail addresses in a string. That's no problem. But for some reason, I fail at excluding certain HTML tags and attributes.

My mail regex:

[!#\$%&'\*\+\-\/0-9=\?a-z\^_`\{\}\|~]*(?:\\[\x00-\x7F][!#\$%&'\*\+\-\/0-9=\?a-z\^_`\{\}\|~]*)*(?:\.[!#\$%&'\*\+\-\/0-9=\?a-z\^_`\{\}\|~]*(?:\\[\x00-\x7F][!#\$%&'\*\+\-\/0-9=\?a-z\^_`\{\}\|~]*)*)*@[a-z0-9](?:[a-z0-9-]*[a-z0-9])?(?:\.[a-z0-9](?:[a-z0-9-]*[a-z0-9])?)*\.[a-z]{2,}

Now, I don't want to match if the mail address is within an input field:

<input type="xxx" value="foo@bar.tld">

I also don't want to match if it's in the title tag:

<title>foo@bar.tld

Nor if it's contained in <style> or <script>.

I tried this lookahead thing, but either I produce illegal regular expressions or it just doesn't work.

One regular expression is not going to be able to exclude and include simultaneously in the way you want.

If your target document is well-formed XML, then you could use one or more regular expressions to find tags and replace them with the empty string, then use your working regex to find mail addresses in whatever text is left.
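A minimal sketch of that replace-then-search idea in Python, under a few assumptions: `EMAIL_RE` is a simplified stand-in for the mail regex from the question, the names `BLOCK_RE`, `TAG_RE` and `find_visible_emails` are made up for illustration, and tags are replaced with a space rather than the empty string so that text on either side of a tag is not glued together.

```python
import re

# Simplified stand-in for the mail regex from the question (assumption).
EMAIL_RE = re.compile(
    r"[a-z0-9._%+-]+@[a-z0-9-]+(?:\.[a-z0-9-]+)*\.[a-z]{2,}", re.IGNORECASE
)

# Pass 1: drop <title>, <style> and <script> elements together with their content.
BLOCK_RE = re.compile(
    r"<(title|style|script)\b[^>]*>.*?</\1\s*>", re.IGNORECASE | re.DOTALL
)

# Pass 2: drop every remaining tag, which also removes attribute values
# such as value="..." on an <input>.
TAG_RE = re.compile(r"<[^>]*>")


def find_visible_emails(markup):
    """Strip the unwanted markup first, then search what is left."""
    text = BLOCK_RE.sub(" ", markup)
    text = TAG_RE.sub(" ", text)
    return EMAIL_RE.findall(text)


if __name__ == "__main__":
    sample = (
        '<html><head><title>t@x.org</title></head><body>'
        '<input type="text" value="i@x.org">'
        '<p>Write to foo@bar.tld</p>'
        '<script>var m = "j@x.org";</script></body></html>'
    )
    print(find_visible_emails(sample))
```

This does not handle comments, CDATA sections, or attribute values that contain a literal `>`; each of those would need its own pattern.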

However, I have to agree with Bohemian that an XML parser is the best way to go if your target is an XML file. XML is complex and flexible, and there's always the risk that you'll hit a file with features you forgot about when designing your replace-with-empty-string regex, such as CDATA and comment blocks. Best to stick with a parser that is designed and tested for running through XML and extracting the document part by part.

If your target document is unruly HTML which an XML parser can't read, then you may have to try the replace-then-search method.
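As a rough illustration of the parser route, here is a sketch using Python's standard `xml.etree.ElementTree`. It assumes the input is well-formed XML; `EMAIL_RE` is again a simplified stand-in for the question's regex, and `SKIPPED_TAGS` and `emails_in_text_nodes` are hypothetical names. Only text nodes are searched, so attribute values such as `value="..."` are never looked at.

```python
import re
import xml.etree.ElementTree as ET

# Simplified stand-in for the mail regex from the question (assumption).
EMAIL_RE = re.compile(
    r"[a-z0-9._%+-]+@[a-z0-9-]+(?:\.[a-z0-9-]+)*\.[a-z]{2,}", re.IGNORECASE
)

# Elements whose text content should be ignored.
SKIPPED_TAGS = {"title", "style", "script"}


def emails_in_text_nodes(xml_source):
    """Collect mail addresses from text nodes outside the skipped elements."""
    root = ET.fromstring(xml_source)
    found = []

    def walk(element, skipped):
        # Once inside a skipped element, everything below it is skipped too.
        skipped = skipped or element.tag.lower() in SKIPPED_TAGS
        if element.text and not skipped:
            found.extend(EMAIL_RE.findall(element.text))
        for child in element:
            walk(child, skipped)
            # child.tail is the text after the child's end tag,
            # so it belongs to the current element, not to the child.
            if child.tail and not skipped:
                found.extend(EMAIL_RE.findall(child.tail))

    walk(root, False)
    return found


if __name__ == "__main__":
    sample = (
        '<html><head><title>t@x.org</title></head><body>'
        '<input type="text" value="i@x.org"/>'
        '<p>Write to foo@bar.tld or <b>baz@bar.tld</b> today</p>'
        '<script>var m = "j@x.org";</script></body></html>'
    )
    print(emails_in_text_nodes(sample))
```

`ET.fromstring` raises `xml.etree.ElementTree.ParseError` on markup that is not well-formed, which is the case where this approach stops being usable.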
