JavaScript: use a regex capture to wrap specific text, but exclude HTML tag attributes

Question

I have a regex that targets alphanumeric product numbers (all are combinations of capital letters and digits, of various lengths) and wraps them in bold tags across hundreds of generated HTML emails.

This works well for bolding product numbers, but it also captures random parts of URLs and hex colors inside the tag attributes of my HTML emails.

I've tried to exclude hex colors and to include only text after ">" and before "<", but these attempts still fail to skip certain URLs and hex colors. For example, with this regex and replace call:

var newHtml = html.replace(new RegExp(/([0-9][^ ]*[A-Z][^ ]*)|([A-Z][^ ]*[0-9][^ ]*)(?=[^<|&lt;|http|#]*(>|&gt;|$))/g),"<strong>$1</strong>");

and this text, from which I only want to wrap 09D623 that appears outside of tags:

Lorem ipsum <a href="http://www.example.com/09D623" target="blank"  
style="color: #66BB12;">dolor sit</a> amet, 09D623 non pulvinar nunc
egestas. Nunc sit amet imperdiet 09D623 magnat.

I still capture 66BB12, a hex color inside a tag, along with extra characters following the color, and random URLs if they contain caps/numbers, as in this example. I've tried to exclude hex colors using this: ^(#[0-9a-f]{3}|[0-9a-f]{6})$

and, separately, tag contents using this expression: (?!([^<]+)?>)

but none of these seem to work as expected. I'm not even sure I have the exclude expression correct when it follows the expression I started with, after new RegExp... above.

Thanks for any insights you can share...

A test is at https://regex101.com/r/rW6iL6/13

Answer 1

I don't know enough about the strings to generalize this better, but it matches what you're looking for in the example:

var email = 'Lorem ipsum <a href="http://www.example.com/09D623" target="blank" style="color: #66BB12;">dolor sit</a> amet, 09D623 non pulvinar nunc egestas. Nunc sit amet imperdiet 09D623 magnat.';
var modded = email.replace(/(\s\d+[A-Z]+\d+\s)/g, "<strong>$1</strong>");
document.write(modded);
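One side effect of the pattern above is that the surrounding whitespace sits inside the capture group, so the spaces end up inside the <strong> tags. A minimal sketch of a variation, assuming the same sample text: the leading whitespace goes into its own group and the trailing whitespace is checked with a lookahead instead of being consumed. The variable names are illustrative only.

```javascript
// Sample text from the question, on a single line.
const email = 'Lorem ipsum <a href="http://www.example.com/09D623" target="blank" style="color: #66BB12;">dolor sit</a> amet, 09D623 non pulvinar nunc egestas. Nunc sit amet imperdiet 09D623 magnat.';

// $1 is the leading whitespace, $2 is the product number.
// (?=\s) requires trailing whitespace without making it part of the match.
const modded = email.replace(/(\s)(\d+[A-Z]+\d+)(?=\s)/g, '$1<strong>$2</strong>');

console.log(modded);
```

Like the original, this only handles numbers of the digits-capitals-digits shape that are surrounded by whitespace.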

Answer 2

So your regex seems a lot more complicated than it needs to be:

\\s([0-9A-Z]{2,})\\s does a perfect job of matching what you want in the example:

It finds any run of 2 or more digits or capital letters surrounded by whitespace and captures only the product number, not the surrounding whitespace.
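As a rough illustration of the pattern above: the doubled backslashes are only needed when the pattern is written as a string for new RegExp; in a regex literal a single backslash is used. The sample text below is made up for the sketch and is not from the question.

```javascript
// Same pattern written two ways: as a string (escaped backslashes) and as a literal.
const fromString = new RegExp('\\s([0-9A-Z]{2,})\\s', 'g');
const fromLiteral = /\s([0-9A-Z]{2,})\s/g;

// Hypothetical sample: one product number in text, one hex color, one URL tail.
const text = 'amet, 09D623 non. style="color: #66BB12;" href="/09D623" end';

// Group 1 holds the matched token without the whitespace around it.
const numbers = Array.from(text.matchAll(fromLiteral), (m) => m[1]);

console.log(fromString.source === fromLiteral.source, numbers);
```

Note that the pattern does not require a mix of letters and digits, so an all-caps word such as USA surrounded by spaces would match as well.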

You could also add allowed punctuation to the edges; as long as you leave out # and ;, it won't match the hex color:

[.,"' -]([0-9A-Z]{2,})[.,"' -] will match most other characters that could appear next to the product number
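A small sketch of the punctuation-delimited variant, assuming the hyphen is placed last in the character class so that it is read as a literal hyphen rather than as a range operator. The sample string and variable names are invented for illustration.

```javascript
// Characters allowed on either side of a product number; '-' is last so it is literal.
const edge = `[.,"' -]`;
const pattern = new RegExp(`${edge}([0-9A-Z]{2,})${edge}`, 'g');

// Hypothetical sample with a quoted number, a number before a period, and a hex color.
const sample = 'Order "09D623", then 7GH2. Color: #66BB12; done';

// Group 1 is the token between the two edge characters.
const found = Array.from(sample.matchAll(pattern), (m) => m[1]);

console.log(found);
```

The hex color is skipped because # is not in the set of allowed edge characters.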

If you want to do it based on position relative to > and <:

>[^<]*?([0-9A-Z]{2,})(?:[^<]*?([0-9A-Z]{2,}))*

This lets it look through any non-tag string for product numbers and return up to 2 results per >< pair. You can chain more groups if you need more, since that is how regex capture groups work.
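A different way to get the position-based behavior, sketched here as an alternative rather than as the regex above: walk the HTML as alternating tags and text runs with a replacer function, and only rewrite the text runs. The function name boldProductNumbers is hypothetical, and the inner pattern assumes a product number is a token of capitals and digits containing at least one of each. This simple tag matcher would be confused by a ">" inside an attribute value, and it treats the contents of <style> or <script> blocks as text, so it is a sketch for simple generated markup, not a full HTML parser.

```javascript
function boldProductNumbers(html) {
  // Each chunk is either a whole tag (<...>) or a run of text between tags.
  return html.replace(/<[^>]*>|[^<]+/g, (chunk) => {
    // Tags are returned unchanged, so attributes are never touched.
    if (chunk.startsWith('<')) return chunk;
    // Text run: wrap tokens of capitals/digits that contain a digit and a capital.
    return chunk.replace(
      /\b(?=[0-9A-Z]*[0-9])(?=[0-9A-Z]*[A-Z])[0-9A-Z]{2,}\b/g,
      '<strong>$&</strong>'
    );
  });
}

// Sample text from the question, on a single line.
const input = 'Lorem ipsum <a href="http://www.example.com/09D623" target="blank" style="color: #66BB12;">dolor sit</a> amet, 09D623 non pulvinar nunc egestas. Nunc sit amet imperdiet 09D623 magnat.';
const output = boldProductNumbers(input);

console.log(output);
```

With the sample from the question, only the two occurrences of 09D623 outside the tag are wrapped; the URL and the hex color inside the attributes are left as they were.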

2 answers

Answer 1
0 2016-06-28 04:36:07

Answer 2
0 Accepted 2016-06-28 04:45:34
