JavaScript to wrap specific text using regex capture but exclude HTML tag attributes

I've got a regex targeting alphanumeric strings that are product numbers (all are combinations of uppercase letters and digits, of various lengths), and I use it to wrap these product numbers in bold tags across hundreds of generated HTML emails.

This worked great for bolding product numbers, but it also captures random parts of URLs and hex colors in the tag attributes of my HTML emails.

I've tried to exclude hex colors, and to include only text after ">" and before "<". Neither seems to leave out certain URLs and hex colors. For example, from this regex and replace call:

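// Note: [^<|&lt;|http|#] is a character class (it rejects the single characters < | & l t ; h p #),
// the lookahead applies only to the second alternative, and $1 is empty when that alternative matches.
var newHtml = html.replace(/([0-9][^ ]*[A-Z][^ ]*)|([A-Z][^ ]*[0-9][^ ]*)(?=[^<|&lt;|http|#]*(>|&gt;|$))/g, "<strong>$1</strong>");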
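A quick way to see two details of the attempt above, as a sketch using made-up inputs ("Nunc", "dolor" and "code A1" are only for illustration): the bracket expression [^<|&lt;|http|#] is a set of single characters rather than a list of words to avoid, and a match produced by the second alternative leaves $1 empty.

```javascript
// The attempt from the question, joined onto one line.
const attempt =
  /([0-9][^ ]*[A-Z][^ ]*)|([A-Z][^ ]*[0-9][^ ]*)(?=[^<|&lt;|http|#]*(>|&gt;|$))/g;

// A bracket expression is a set of single characters: this one rejects any of
// < | & l t ; h p #, not the strings "&lt;" or "http".
const notTheseChars = /^[^<|&lt;|http|#]+$/;
console.log(notTheseChars.test("Nunc"));  // true: no listed character
console.log(notTheseChars.test("dolor")); // false: "l" is in the set

// "A1" is matched by the second alternative, where group 1 does not take part,
// so "$1" expands to an empty string and the product number disappears.
console.log("code A1".replace(attempt, "<strong>$1</strong>"));
// code <strong></strong>
```

Using $& (the whole match) instead of $1 would avoid the empty tags, but the character-class issue remains.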

and this text, from which I only want to wrap the occurrences of 09D623 that appear outside of tags:

Lorem ipsum <a href="http://www.example.com/09D623" target="blank"  
style="color: #66BB12;">dolor sit</a> amet, 09D623 non pulvinar nunc
egestas. Nunc sit amet imperdiet 09D623 magnat.

I still capture 66BB12, a hex color inside a tag (along with extra characters following the color), as well as parts of URLs that contain uppercase letters or digits, as in this example. I've tried to exclude hex colors using this: ^(#[0-9a-f]{3}|[0-9a-f]{6})$
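The hex-color pattern can be checked the same way. The corrected hexColor below is only a guess at the intent (a # followed by 3 or 6 hex digits, in any case). Both versions are anchored with ^ and $, so they only test a string that is nothing but a color; on their own they cannot exclude a color embedded in a style attribute.

```javascript
// As written in the question: "#" belongs only to the 3-digit branch, the
// class is lowercase-only, and ^...$ means the whole string must be a color.
const asWritten = /^(#[0-9a-f]{3}|[0-9a-f]{6})$/;
// Hypothetical corrected version: "#" applies to both lengths, "i" ignores case.
const hexColor = /^#([0-9a-f]{3}|[0-9a-f]{6})$/i;

for (const s of ["#66BB12", "#66bb12", "66bb12", "#6B1"]) {
  console.log(s, asWritten.test(s), hexColor.test(s));
}
// #66BB12 false true
// #66bb12 false true
// 66bb12 true false
// #6B1 false true
```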

and, separately, to exclude tag contents using this expression: (?!([^<]+)?>)

but none of these seem to work as expected. I'm not even sure I'm placing the exclude expression correctly when it follows the expression I started with (the new RegExp... above).
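For reference, here is a minimal sketch of where a tag-exclusion lookahead could go. It assumes a product number is a whole word made of uppercase letters and digits with at least one of each; that definition and the productOutsideTags name are assumptions. (?![^<]*>) is equivalent to the (?!([^<]+)?>) attempt: it rejects a match if a ">" appears before the next "<". It has to follow the product-number part directly, and it only works if the email text never contains a raw ">" (which would normally be escaped as &gt;).

```javascript
// Sample from the question.
const html =
  'Lorem ipsum <a href="http://www.example.com/09D623" target="blank" ' +
  'style="color: #66BB12;">dolor sit</a> amet, 09D623 non pulvinar nunc ' +
  'egestas. Nunc sit amet imperdiet 09D623 magnat.';

// Assumed product-number shape: a whole word of uppercase letters and digits,
// with at least one digit and one letter (checked by the two lookaheads).
// (?![^<]*>) then drops any match that sits inside a tag.
const productOutsideTags =
  /\b(?=[0-9A-Z]*[0-9])(?=[0-9A-Z]*[A-Z])[0-9A-Z]+\b(?![^<]*>)/g;

// $& inserts the whole match, so no capture group is needed.
const wrapped = html.replace(productOutsideTags, "<strong>$&</strong>");
console.log(wrapped);
```

The 09D623 in the href and the 66BB12 in the style attribute are both followed by a ">" before any "<", so only the two occurrences in the text are wrapped.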

Thanks for any insights you can share...

The test is at https://regex101.com/r/rW6iL6/13 (screenshot: regex101 test results, with the matches highlighted in blue).

I don't know enough about the strings to generalize this better, but this matches what you're looking for in the example:

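var email = 'Lorem ipsum <a href="http://www.example.com/09D623" target="blank" style="color: #66BB12;">dolor sit</a> amet, 09D623 non pulvinar nunc egestas. Nunc sit amet imperdiet 09D623 magnat.';
// Requiring whitespace on both sides keeps "/09D623" (href) and "#66BB12" (style) from matching;
// the spaces are inside the capture group, so they end up inside <strong>.
var modded = email.replace(/(\s\d+[A-Z]+\d+\s)/g, "<strong>$1</strong>");
console.log(modded); // in a browser page, document.write(modded) also works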
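If the spaces should stay outside the <strong> tags, one option, assuming an engine with lookbehind support (ES2018 or later), is to check the whitespace with lookarounds instead of consuming it. A side effect is that two product numbers separated by a single space both match, as the made-up string " 1A2 3B4 " shows.

```javascript
const email =
  'Lorem ipsum <a href="http://www.example.com/09D623" target="blank" ' +
  'style="color: #66BB12;">dolor sit</a> amet, 09D623 non pulvinar nunc ' +
  'egestas. Nunc sit amet imperdiet 09D623 magnat.';

// Same token shape as the answer (\d+[A-Z]+\d+), but the whitespace is only
// asserted by (?<=\s) and (?=\s), so it is not part of the match.
const lookaround = /(?<=\s)\d+[A-Z]+\d+(?=\s)/g;
console.log(email.replace(lookaround, "<strong>$&</strong>"));

// The consuming version uses up the shared space, so the second token is skipped.
const pair = " 1A2 3B4 ";
console.log(pair.replace(/(\s\d+[A-Z]+\d+\s)/g, "<strong>$1</strong>"));
// <strong> 1A2 </strong>3B4
console.log(pair.replace(lookaround, "<strong>$&</strong>"));
//  <strong>1A2</strong> <strong>3B4</strong>
```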

Your regex seems a lot more complicated than it needs to be:

\s([0-9A-Z]{2,})\s matches exactly what you want in the example:

It finds any run of 2 or more uppercase letters or digits surrounded by whitespace, and captures only that run, not the whitespace.
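A quick way to list what this pattern captures is String.prototype.matchAll (ES2020). The extra inputs are made up to show two limits: any all-caps word of 2 or more letters also matches, and because the pattern consumes the whitespace on both sides, a product number that shares a single space with the previous match is skipped.

```javascript
const email =
  'Lorem ipsum <a href="http://www.example.com/09D623" target="blank" ' +
  'style="color: #66BB12;">dolor sit</a> amet, 09D623 non pulvinar nunc ' +
  'egestas. Nunc sit amet imperdiet 09D623 magnat.';

// Group 1 holds the token; the surrounding whitespace is matched but not captured.
const pattern = /\s([0-9A-Z]{2,})\s/g;
const capture = (text) => [...text.matchAll(pattern)].map((m) => m[1]);

const fromEmail = capture(email);          // ["09D623", "09D623"]
const allCaps = capture("Reply OK today"); // ["OK"]: all-caps words match too
const adjacent = capture(" AB12 CD34 ");   // ["AB12"]: the shared space was consumed
console.log(fromEmail, allCaps, adjacent);
```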

You could also allow punctuation at the edges; as long as you leave out # and ;, it won't match the hex color:

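[.,"' -]([0-9A-Z]{2,})[.,"' -] will match most other characters that could sit next to a product number. (Keep the hyphen last in the class: in [.,-"' ] it sits between "," and the double quote and is read as an out-of-order range, which JavaScript rejects with a SyntaxError.)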
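The sketch below checks the punctuation-edge idea on a made-up sentence. It also shows why the hyphen placement matters: written as [.,-"' ], the ,-" part is read as a range from "," (U+002C) down to the double quote (U+0022), and JavaScript rejects that pattern.

```javascript
// Hyphen between "," and '"': parsed as an out-of-order range.
let hyphenInMiddleThrows = false;
try {
  new RegExp(`[.,-"' ]([0-9A-Z]{2,})[.,-"' ]`);
} catch (e) {
  hyphenInMiddleThrows = e instanceof SyntaxError;
}

// Hyphen moved to the end of the class, where it is a literal "-".
const edged = /[.,"' -]([0-9A-Z]{2,})[.,"' -]/g;
const text = 'Call "09D623", then check #66BB12;';
const found = [...text.matchAll(edged)].map((m) => m[1]); // ["09D623"]
console.log(hyphenInMiddleThrows, found);
```

The hex color is not matched because neither # nor ; is in the allowed set.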

If you want to do it based on position relative to > and <:

>[^<]*?([0-9A-Z]{2,})(?:[^<]*?([0-9A-Z]{2,}))*

This lets it scan any non-tag text for any number of product numbers, but it returns at most 2 results per >...< segment: the first one, and the last one matched by the repeated group. You can chain more groups if you need more, but that's how regex capture groups work: a repeated group keeps only its last match.
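A small demonstration of the two-result limit, followed by a hypothetical alternative (wrapOutsideTags and PRODUCT are illustrative names, not from the answer) that splits the HTML into tags and text with String.prototype.split and only rewrites the text pieces, so every product number outside tags is wrapped. It assumes attribute values never contain ">" and the text contains no raw "<", and it reuses the assumed product-number shape of uppercase letters and digits with at least one of each.

```javascript
// The location-based pattern from the answer.
const located = />[^<]*?([0-9A-Z]{2,})(?:[^<]*?([0-9A-Z]{2,}))*/;
const m = located.exec(">AB1 CD2 EF3<");
console.log(m[1], m[2]); // AB1 EF3 (CD2 is matched but not kept)

// Hypothetical alternative: split into tags and text, rewrite only the text.
// With a capturing group, split() keeps each tag in the result at an odd
// index, so the even indices are always the text between tags.
const PRODUCT = /\b(?=[0-9A-Z]*[0-9])(?=[0-9A-Z]*[A-Z])[0-9A-Z]+\b/g;

function wrapOutsideTags(html) {
  return html
    .split(/(<[^>]*>)/)
    .map((piece, i) =>
      i % 2 === 0 ? piece.replace(PRODUCT, "<strong>$&</strong>") : piece
    )
    .join("");
}

const email =
  'Lorem ipsum <a href="http://www.example.com/09D623" target="blank" ' +
  'style="color: #66BB12;">dolor sit</a> amet, 09D623 non pulvinar nunc ' +
  'egestas. Nunc sit amet imperdiet 09D623 magnat.';
console.log(wrapOutsideTags(email));
```

Splitting avoids both the fixed number of capture groups and the need for a lookahead, at the cost of a simple tag scanner that would be confused by ">" inside attribute values.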
