
Referencing nested groups in JavaScript using string replace using regex

Because of the way that jQuery deals with script tags, I've found it necessary to do some HTML manipulation using regular expressions (yes, I know... not the ideal tool for the job). Unfortunately, it seems like my understanding of how captured groups work in JavaScript is flawed, because when I try this:

var scriptTagFormat = /<script .*?(src="(.*?)")?.*?>(.*?)<\/script>/ig;

html = html.replace(
    scriptTagFormat, 
    '<span class="script-placeholder" style="display:none;" title="$2">$3</span>');

The script tags get replaced with the spans, but the resulting title attribute is blank. Shouldn't $2 match the content of the src attribute of a script tag?

Nesting of groups is irrelevant; their numbering is determined strictly by the positions of their opening parentheses within the regex. In your case, that means it's group #1 that captures the whole src="value" sequence, and group #2 that captures just the value part.
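To make the numbering rule concrete, here is a minimal sketch. The sample tag string and the variable names are made up for illustration:

```javascript
// Groups are numbered left to right by their opening parenthesis,
// regardless of how they are nested.
const nested = /(src="(.*?)")/;
const m = nested.exec('<script src="a.js">');

console.log(m[1]); // group 1, the whole attribute: src="a.js"
console.log(m[2]); // group 2, just the value: a.js
```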

The .*? matches too much because the group that follows it is optional, so your src attribute ends up matched by one of the surrounding .*? instead of by the group. If you remove the ? after your first group, it works.
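A small sketch of that failure mode, using the regex from the question without the g flag so that exec stays stateless. The two sample tags are hypothetical; the point is that the optional group only gets a chance to match when src is the very first attribute:

```javascript
const original = /<script .*?(src="(.*?)")?.*?>(.*?)<\/script>/i;

const srcFirst = '<script src="a.js" type="text/javascript">foo()</script>';
const srcLater = '<script type="text/javascript" src="a.js">foo()</script>';

const first = original.exec(srcFirst);
const later = original.exec(srcLater);

console.log(first[2]); // a.js
console.log(later[2]); // undefined: the optional group was skipped

// In a replacement string, an unmatched group expands to an empty string.
const replaced = srcLater.replace(original, 'title="$2"');
console.log(replaced); // title=""
```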

Update: As @morja pointed out, your solution is to move the first .*? into the optional src part.

Just for completeness: /<script (?:.*?(src="(.*?)"))?.*?>(.*?)<\/script>/ig

You can see it here on rubular (corrected my link also)

If you don't want to use the content of the first capturing group, then make it a non-capturing group using (?:)

/<script (?:.*?(?:src="(.*?)"))?.*?>(.*?)<\/script>/ig

Then the result you want is in $1 and $2.
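As a quick check of the renumbering, here is a sketch that runs the non-capturing version against two hypothetical tags, one with a src attribute and one without:

```javascript
const nonCapturing = /<script (?:.*?(?:src="(.*?)"))?.*?>(.*?)<\/script>/i;

const withSrc = nonCapturing.exec(
  '<script type="text/javascript" src="a.js">foo()</script>');
const withoutSrc = nonCapturing.exec(
  '<script type="text/javascript">foo()</script>');

// Only two capturing groups remain: the src value and the script body.
console.log(withSrc[1], withSrc[2]);       // a.js foo()
console.log(withoutSrc[1], withoutSrc[2]); // undefined foo()
```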

Try this:

/<script (?:(?!src).)*(?:src="(.*?)")?.*?>(.*?)<\/script>/ig

See here: rubular

As stema wrote, the .*? matches too much. With the negative lookahead (?:(?!src).)* you will match only up to a src attribute.
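To see what the lookahead construct consumes on its own, here is a minimal sketch. The attribute string and tag are made-up samples:

```javascript
// (?:(?!src).)* consumes one character at a time, but only while the
// text at the current position does not start with "src".
const untilSrc = /^(?:(?!src).)*/;
const prefix = untilSrc.exec('type="text/javascript" src="a.js"')[0];
console.log(JSON.stringify(prefix)); // "type=\"text/javascript\" "

// The full pattern from the answer, without the g flag for a one-off exec.
const lookahead = /<script (?:(?!src).)*(?:src="(.*?)")?.*?>(.*?)<\/script>/i;
const match = lookahead.exec(
  '<script type="text/javascript" src="a.js">foo()</script>');
console.log(match[1], match[2]); // a.js foo()
```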

But actually in this case you could also just move the .*? into the optional part:

/<script (?:.*?src="(.*?)")?.*?>(.*?)<\/script>/ig

See here: rubular
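Here is a sketch of that last pattern plugged into the replace call from the question, with the placeholder renumbered to $1 and $2. The sample HTML is hypothetical. The `stricter` pattern at the end is not from this thread; it is an assumed variant that relies on attribute values never containing ">":

```javascript
const scriptTagFormat = /<script (?:.*?src="(.*?)")?.*?>(.*?)<\/script>/ig;
const placeholder =
  '<span class="script-placeholder" style="display:none;" title="$1">$2</span>';

// One tag per line; src is not the first attribute.
const html = [
  '<script type="text/javascript" src="a.js"></script>',
  '<script type="text/javascript">foo()</script>',
].join('\n');
const result = html.replace(scriptTagFormat, placeholder);
console.log(result);

// Caveat: "." also matches ">", so two tags on ONE line can be
// swallowed by a single match while the regex hunts for src=".
const sameLine = '<script type="x">a</script><script src="b.js"></script>';
const merged = sameLine.match(scriptTagFormat);
console.log(merged.length); // 1

// Assumed variant: [^>] keeps the attribute search inside the opening tag.
const stricter = /<script\b(?:[^>]*?\ssrc="([^"]*)")?[^>]*>(.*?)<\/script>/ig;
const separate = sameLine.match(stricter);
console.log(separate.length); // 2
```

The one-tag-per-line input works because "." does not match line terminators, which is also why none of these patterns handle a script body that spans several lines.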

Could you post the html you are retrieving? Your code works fine in a simple example: jsfiddle (warning: alert box)

My first guess is that one of your script tags does not have a src, meaning you are left with a single capture group (the script contents).

I'm thinking that regular expressions by themselves can't do exactly what I'm looking for, so here's my modification to work around the problem:

var scriptTagFormat = /<script\s+((.*?)="(.*?)")*\s*>(.*?)<\/script>/ig;

html = html.replace(
    scriptTagFormat, 
    '<span class="script-placeholder" style="display:none;" $1>$4</span>');

Before, I wanted to avoid setting non-standard attributes on the replacement span. This code blindly copies all attributes instead. Luckily, the non-standard attributes aren't stripped out of the DOM when I insert the HTML, so it will work for my purposes.
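One thing worth checking with this workaround: a capturing group that sits under a * quantifier keeps only its last repetition, so $1 holds just the final attribute when a tag has several. The sketch below shows that on a hypothetical tag. The `allAttributes` pattern is an assumed variant, not part of the original post, that captures the whole attribute string in a single group:

```javascript
const workaround = /<script\s+((.*?)="(.*?)")*\s*>(.*?)<\/script>/i;
const tag = '<script type="text/javascript" src="a.js">foo()</script>';

const m = workaround.exec(tag);
console.log(JSON.stringify(m[1])); // " src=\"a.js\"" (the type attribute is gone)

// Assumed variant: one group for everything between "<script" and ">".
const allAttributes = /<script\b([^>]*)>(.*?)<\/script>/ig;
const result = tag.replace(
  allAttributes,
  '<span class="script-placeholder" style="display:none;"$1>$2</span>');
console.log(result);
```

Since `[^>]*` stops at the first ">", this variant assumes no attribute value contains that character.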
