
Match a “,” after a specific pattern while excluding this pattern. (lookbehind in JavaScript)

I need to split a string like the following on some of its "," characters, but not all of them.

Input:

"stri,ng1 ext,string2 ext,string3, string4 ,string5"

Output:

["stri,ng1 ext", "string2 ext", "string3", "string4", "string5"]

The "," to match has the following rules:

  1. Either it is preceded or followed by at least one whitespace character (i.e. \\s+,\\s*|\\s*,\\s+)
  2. Or it is the first "," after one or more whitespace characters followed by some other characters (i.e. \\s+(.*?),\\s*)
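
As a quick check of why rule 1 alone is not enough, here is a minimal sketch that splits on the first pattern only. It assumes the doubled backslashes above are the string-escaped form of \s; the variable names are illustrative.

```javascript
// Rule 1 only: a comma with whitespace on at least one side.
const ruleOneOnly = /\s+,\s*|\s*,\s+/;
const sample = 'stri,ng1 ext,string2 ext,string3, string4 ,string5';

const ruleOneParts = sample.split(ruleOneOnly);
console.log(ruleOneParts);
// ["stri,ng1 ext,string2 ext,string3", "string4", "string5"]
```

The commas directly after "ext" are left unsplit, which is the case rule 2 is meant to cover.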

The problem with the second pattern is that its match also includes the "ext" part. It would be nice to have an efficient pattern that matches only the ",".

If that isn't possible, at least a short algorithm would do. Can anybody help?

My current, not very elegant pattern looks like this:

\\s+(.*?)[^\\s+],+\\s*|\\s*,+\\s+|\\s+,+\\s*

While this matches the right commas, and only those, the match also includes the "ext" part. How can I exclude it?
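
To make the problem concrete, here is a small sketch that lists everything the pattern consumes. It assumes the pattern is written as a JavaScript regex literal (so \\s becomes \s); the variable names are illustrative.

```javascript
// The pattern from the question as a regex literal, with the g flag to find all matches.
const askerPattern = /\s+(.*?)[^\s+],+\s*|\s*,+\s+|\s+,+\s*/g;
const questionInput = 'stri,ng1 ext,string2 ext,string3, string4 ,string5';

// With the g flag, match() returns every full match,
// i.e. exactly the text that split() would throw away.
const consumed = questionInput.match(askerPattern);
console.log(consumed);
// [" ext,", " ext,", ", ", " ,"]
```

The first two matches swallow " ext" along with the comma, so a plain split on this pattern would drop that text from the result.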

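You can use this code in JavaScript:

var str = 'stri,ng1 ext,string2 ext,string3, string4 ,string5';
// Replace every separator comma with a "##" marker, then split on that marker.
var m = str.replace(/ +,|, +| +([^,]*),/g, function ($0, $1) {
    // $1 is only defined for the third alternative (space, text, comma);
    // put that text back so only the comma is dropped.
    var p = ($1 !== undefined) ? ' ' + $1 : '';
    return p + '##';
}).split('##');
//=> ["stri,ng1 ext", "string2 ext", "string3", "string4", "string5"]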
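
Since the title asks about lookbehind: engines that implement ES2018 lookbehind assertions can express rule 2 without consuming the text in front of the comma. The following is a minimal sketch under that assumption, not part of the code above. The function name splitOnSeparatorCommas is hypothetical, and its behaviour may differ from the replace-based version on inputs other than the sample (for instance, it treats any whitespace character, not only a space, as relevant).

```javascript
// Sketch: split on separator commas using a lookbehind (needs ES2018 support).
function splitOnSeparatorCommas(input) {
    // Alternatives 1 and 2: a comma with whitespace on at least one side;
    // the surrounding whitespace is consumed together with the comma.
    // Alternative 3: a comma preceded by whitespace plus non-comma characters;
    // the lookbehind checks that text without making it part of the match.
    const separator = /\s+,\s*|\s*,\s+|(?<=\s[^,]*),/;
    return input.split(separator);
}

console.log(splitOnSeparatorCommas('stri,ng1 ext,string2 ext,string3, string4 ,string5'));
// ["stri,ng1 ext", "string2 ext", "string3", "string4", "string5"]
```

Because the lookbehind is zero-width, no marker string or callback is needed to restore the "ext" part.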


Just wanted to give a regex-less approach as well. It's quite impressive to see how much code is required for such a simple task.

tokenize("stri,ng1 ext,string2 ext,string3, string4 ,string5");
//["stri,ng1 ext", "string2 ext", "string3", "string4", "string5"]

function tokenize(str) {

    var tokens = [],
        i = 0,
        tokenStartIndex = 0,
        spaceSeenSinceLastToken = false,
        nonSpaceSeenSinceLastToken = false,
        spacesCountSinceLastNonSpace = 0,
        SPACE = ' ',
        len = str.length,
        nextIndex, char, prevCharIsSpace, nextCharIsSpace, lastToken;

    for (; i < len; i++) {
        if (SPACE == (char = str[i])) {
            spaceSeenSinceLastToken = true;

            // Leading spaces are skipped; later spaces are counted so they can be trimmed.
            if (!nonSpaceSeenSinceLastToken) ++tokenStartIndex;
            else ++spacesCountSinceLastNonSpace;

            continue;
        }

        // Any character other than a space or comma resets the trailing-space count.
        if (char != ',') {
            spacesCountSinceLastNonSpace = 0;
            nonSpaceSeenSinceLastToken = true;
            continue;
        }

        // From here on, char is a comma: decide whether it is a separator.
        nextIndex = i + 1;
        prevCharIsSpace = str[i - 1] == SPACE;
        nextCharIsSpace = str[nextIndex] == SPACE;

        if (isDirectlyFollowedOrPrecededBySpace() || isFirstCommaPrecededBySpaceAndFollowedByNonSpace()) {
            pushToken();
            tokenStartIndex = nextIndex;
            spaceSeenSinceLastToken = nonSpaceSeenSinceLastToken = false;
            spacesCountSinceLastNonSpace = 0;
        }
    }

    pushToken();

    return tokens;

    function isDirectlyFollowedOrPrecededBySpace() {
        return prevCharIsSpace || nextCharIsSpace;
    }

    function isFirstCommaPrecededBySpaceAndFollowedByNonSpace() {
        return spaceSeenSinceLastToken && !nextCharIsSpace;
    }

    function pushToken() {
        var token = str.slice(tokenStartIndex, i - spacesCountSinceLastNonSpace);
        token && tokens.push(token);
    }
}
