
Performance issue using regex to replace/clear substring

I have a string containing things like this:

<a@{style}@{class}@{data} id="@{attr:id}">@{child:content} @{child:whatever}</a>

All I need to do is remove every @{xxx} placeholder, except the substrings starting with @{child: .

I used str.match() to collect all "@{*}" substrings in an array, then looped over it to remove everything except the @{child: substrings:

var matches = str.match(new RegExp("@\{(.*?)\}",'g'));
if (matches && matches.length){
    for(var i=0; i<matches.length; i++){
        if (matches[i].search("@{child:") == -1) str = str.replace(matches[i],'');  
    }
}

It works, but it is too slow when the string gets bigger (about 2 seconds for 1000+ nodes like the one above).

Is there an alternative way to do this, perhaps a rule (if one exists) that excludes @{child: directly in the regex, to improve performance?

If I understand your question correctly, you don't want to remove the @{child:...} substrings, but everything else of the format @{...} should go. In that case, you could change the regular expression to check that child: is not matched when you perform the replace:

var str = '<a@{style}@{class}@{data} id="@{attr:id}">@{child:content} @{child:whatever}</a>';
str = str.replace(/@\{((?!child:)[\s\S])+?\}/g, '');
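To make the effect of the negative lookahead concrete, here is a minimal sketch that wraps the same regex in a function and runs it on the sample string from the question. The function name `clearPlaceholders` is a hypothetical name chosen for illustration; it is not part of the original answer.

```javascript
// Hypothetical helper; the regex is the one from the answer above.
function clearPlaceholders(template) {
  // (?!child:) is tested before every consumed character, so a matched
  // placeholder cannot contain the text "child:" anywhere inside it.
  // The lazy +? stops at the first closing brace.
  return template.replace(/@\{((?!child:)[\s\S])+?\}/g, '');
}

const sample = '<a@{style}@{class}@{data} id="@{attr:id}">@{child:content} @{child:whatever}</a>';
console.log(clearPlaceholders(sample));
// <a id="">@{child:content} @{child:whatever}</a>
```

Because the whole job is done by one global replace, the string is scanned once instead of once per placeholder.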

This seems pretty fast.
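One rough way to check the speed claim is to time both approaches on a large input. The sketch below assumes an input of 1000 copies of the sample node (the size mentioned in the question); the function names and the `timeIt` helper are hypothetical, and actual timings will vary by engine and machine. The loop version is likely slower because each str.replace(m, '') call searches the string again from the start and builds a new string, so the total work grows roughly quadratically with the number of placeholders.

```javascript
// Build a large input: 1000 copies of the sample node (size assumed from the question).
const node = '<a@{style}@{class}@{data} id="@{attr:id}">@{child:content} @{child:whatever}</a>';
const big = node.repeat(1000);

// Approach from the question: collect all matches, then one replace() call per match.
function clearWithLoop(str) {
  const matches = str.match(/@\{(.*?)\}/g);
  if (matches) {
    for (const m of matches) {
      // A string pattern only replaces the first occurrence of m.
      if (m.indexOf('@{child:') === -1) str = str.replace(m, '');
    }
  }
  return str;
}

// Single-pass approach from the answer.
function clearWithLookahead(str) {
  return str.replace(/@\{((?!child:)[\s\S])+?\}/g, '');
}

// Hypothetical timing helper: runs fn on the big input and logs elapsed milliseconds.
function timeIt(label, fn) {
  const start = Date.now();
  const result = fn(big);
  console.log(label + ': ' + (Date.now() - start) + ' ms');
  return result;
}

const loopResult = timeIt('loop', clearWithLoop);
const lookaheadResult = timeIt('lookahead', clearWithLookahead);
console.log(loopResult === lookaheadResult); // checks that both approaches agree
```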
