How can I improve the performance of a JavaScript text formatter?

Question

I am allowing my users to wrap words with "*", "/", "_", and "-" as a shorthand way to indicate they'd like to bold, italicize, underline, or strike through their text. Unfortunately, when the page is filled with text using this markup, I'm seeing a noticeable (borderline acceptable) slowdown.

Here's the JavaScript I wrote to handle this task. Can you please provide feedback on how I could speed things up?

function handleContentFormatting(content) {
    content = handleLineBreaks(content);

    var bold_object = {'regex': /\*(.|\n)+?\*/i, 'open': '<b>', 'close': '</b>'};
    var italic_object = {'regex': /\/(?!\D>|>)(.|\n)+?\//i, 'open': '<i>', 'close': '</i>'};
    var underline_object = {'regex': /\_(.|\n)+?\_/i, 'open': '<u>', 'close': '</u>'};
    var strikethrough_object = {'regex': /\-(.|\n)+?\-/i, 'open': '<del>', 'close': '</del>'};

    var format_objects = [bold_object, italic_object, underline_object, strikethrough_object];

    for( obj in format_objects ) {
        content = handleTextFormatIndicators(content, format_objects[obj]);
    }

    return content;
}

//@param obj --- an object with 3 properties:
//      1.) the regex to search with
//      2.) the opening HTML tag that will replace the opening format indicator
//      3.) the closing HTML tag that will replace the closing format indicator
function handleTextFormatIndicators(content, obj) {
    while(content.search(obj.regex) > -1) {
        var matches = content.match(obj.regex);
        if( matches && matches.length > 0) {
            var new_segment = obj.open + matches[0].slice(1,matches[0].length-1) + obj.close;
            content = content.replace(matches[0],new_segment);
        }
    }
    return content;
}

Answer 1

Your code is forcing the browser to do a whole lot of repeated, wasted work. The approach you should be taking is this:

Concoct a regex that combines all of your "target" regexes with another that matches a leading string of characters that are not your special meta-characters.
Change the loop so that it does the following:
1. Grab the next match from the source string. That match, due to the way you changed your regex, will be a string of non-meta characters followed by your matched portion.
2. Append the non-meta characters and the replacement for the target portion onto a separate array of strings.
At the end of that process, the separate accumulator array can be joined and used to replace the content.

As to how to combine the regular expressions, well, it's not very pretty in JavaScript but it looks like this. First, you need a regex for a string of zero or more "uninteresting" characters. That should be the first capturing group in the regex. Next should be the alternates for the target strings you're looking for. Thus the general form is:

var tokenizer = /(uninteresting pattern)?(?:(target 1)|(target 2)|(target 3)| ... )?/;

When you match that against the source string, you'll get back a result array that will contain the following:

result[0] - entire chunk of string (not used)
result[1] - run of uninteresting characters
result[2] - either an instance of target type 1, or undefined
result[3] - either an instance of target type 2, or undefined
...

Thus you'll know which kind of replacement target you saw by checking which of the target groups is non-empty. (Note that in your case the targets can conceivably overlap; if you intend for that to work, then I suspect you'll have to approach this as a full-blown parsing problem.)
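The approach described above can be sketched roughly as follows. This is only an illustration under some assumptions: the function name formatWithTokenizer, the tags lookup array, and the extra sixth group (a lone marker character that does not open a valid pair) are not from the original code, the input is assumed to be plain text without HTML tags, and nested markers are deliberately not handled.

```javascript
// Single-pass sketch: one combined regex, one loop, one accumulator array.
// Assumes plain-text input (no HTML) and flat, non-nested markers.
function formatWithTokenizer(content) {
    // group 1: run of "uninteresting" characters
    // groups 2-5: bold, italic, underline, strikethrough targets
    // group 6: a lone marker character that did not open a valid pair
    var tokenizer = /([^*\/_-]*)(?:\*([^*]+)\*|\/([^\/]+)\/|_([^_]+)_|-([^-]+)-|([*\/_-]))?/g;
    var tags = [null, null, 'b', 'i', 'u', 'del']; // indexed by group number
    var out = [];
    var match;

    // Every match that starts before the end of the string consumes at
    // least one character, so lastIndex always advances and the loop ends.
    while (tokenizer.lastIndex < content.length &&
           (match = tokenizer.exec(content)) !== null) {
        if (match[1]) {
            out.push(match[1]);
        }
        for (var g = 2; g <= 5; g++) {
            if (match[g] !== undefined) {
                out.push('<' + tags[g] + '>' + match[g] + '</' + tags[g] + '>');
            }
        }
        if (match[6] !== undefined) {
            out.push(match[6]); // keep an unpaired marker as literal text
        }
    }
    return out.join('');
}

console.log(formatWithTokenizer('plain *bold* and /it/ _u_ -s- end'));
```

Because the source string is scanned once and never rebuilt inside the loop, the work grows with the length of the text rather than with the number of markers times the length of the text.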

Answer 2

Add the g flag to your regexes (/ig) and remove the while loop.
Replace your for(obj in format_objects) loop with a normal for loop, because format_objects is an array.

Update

Okay, I took the time to write an even faster and simplified solution, based on your code:

function handleContentFormatting(content) {
    content = handleLineBreaks(content);

    var bold_object = {'regex': /\*([^*]+)\*/ig, 'replace': '<b>$1</b>'},
        italic_object = {'regex': /\/(?!\D>|>)([^\/]+)\//ig, 'replace': '<i>$1</i>'},
        underline_object = {'regex': /\_([^_]+)\_/ig, 'replace': '<u>$1</u>'},
        strikethrough_object = {'regex': /\-([^-]+)\-/ig, 'replace': '<del>$1</del>'};

    var format_objects = [bold_object, italic_object, underline_object, strikethrough_object],
        i = 0, foObjSize = format_objects.length;

    for( i; i < foObjSize; i++ ) {
        content = handleTextFormatIndicators(content, format_objects[i]);
    }

    return content;
}

//@param obj --- an object with 2 properties:
//      1.) the regex to search with
//      2.) the replace string
function handleTextFormatIndicators(content, obj) {
    return content.replace(obj.regex, obj.replace);
}

Here is a demo.

This will work with nested and/or not nested formatting boundaries. You can omit the function handleTextFormatIndicators altogether if you want to, and do the replacements inline inside handleContentFormatting.
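For illustration, the inlined version might look something like the sketch below. The name handleContentFormattingInline is made up here, the call to handleLineBreaks is left out because that function is not shown in the question, and the i flag is dropped on the assumption that it has no effect on patterns without letters.

```javascript
// Sketch: the four replacements chained inline, one global replace each.
// handleLineBreaks (not shown in the question) is intentionally omitted.
function handleContentFormattingInline(content) {
    return content
        .replace(/\*([^*]+)\*/g, '<b>$1</b>')
        // the lookahead skips the "/" inside closing tags such as </b>
        .replace(/\/(?!\D>|>)([^\/]+)\//g, '<i>$1</i>')
        .replace(/_([^_]+)_/g, '<u>$1</u>')
        .replace(/-([^-]+)-/g, '<del>$1</del>');
}

console.log(handleContentFormattingInline('*a /b/ c*'));
// prints "<b>a <i>b</i> c</b>"
```

Each replace call makes one pass over the string, so the whole function does four passes regardless of how many markers the text contains.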

Answer 3

You can do things like:

function formatText(text){
    return text.replace(
        /\*([^*]*)\*|\/([^\/]*)\/|_([^_]*)_|-([^-]*)-/gi,
        function(m, tb, ti, tu, ts){
            if(typeof(tb) != 'undefined')
                return '<b>' + formatText(tb) + '</b>';
            if(typeof(ti) != 'undefined')
                return '<i>' + formatText(ti) + '</i>';
            if(typeof(tu) != 'undefined')
                return '<u>' + formatText(tu) + '</u>';
            if(typeof(ts) != 'undefined')
                return '<del>' + formatText(ts) + '</del>';
            return 'ERR('+m+')';
        }
    );
}

This will work fine on nested tags, but not on overlapping tags, which are invalid anyway.

Example at http://jsfiddle.net/m5Rju/
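To make the nested versus overlapping distinction concrete, here is a small self-contained sketch. It uses a table-driven variant of the same recursive idea; the names formatTextCompact and TAGS are made up for this illustration and are not part of the answer's code.

```javascript
// Table-driven variant of the recursive single-regex approach (illustrative).
var TAGS = ['b', 'i', 'u', 'del']; // same order as the capture groups

function formatTextCompact(text) {
    return text.replace(
        /\*([^*]*)\*|\/([^\/]*)\/|_([^_]*)_|-([^-]*)-/g,
        function (m, tb, ti, tu, ts) {
            var groups = [tb, ti, tu, ts];
            for (var k = 0; k < groups.length; k++) {
                if (groups[k] !== undefined) {
                    // recurse so markers inside the captured text are handled
                    return '<' + TAGS[k] + '>' + formatTextCompact(groups[k]) +
                           '</' + TAGS[k] + '>';
                }
            }
            return m; // not expected to be reached; leave the text untouched
        }
    );
}

// Nested markers: the inner pair is formatted by the recursive call.
console.log(formatTextCompact('*bold /italic/ text*'));
// prints "<b>bold <i>italic</i> text</b>"

// Overlapping markers: the italic pair is split by the bold pair and stays literal.
console.log(formatTextCompact('*a /b* c/'));
// prints "<b>a /b</b> c/"
```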

3 answers

Answer 1
1 2011-08-01 18:28:16

Answer 2
1 (accepted) 2011-08-01 19:25:54

Answer 3
1 2011-08-01 19:33:27
