
Javascript Regexp and “string literal”

I'm making a JS "command line" emulator.

I have this Regexp: /([^\s"]+)|"([^\s"]+)"/g . I want to match single words, like echo , wyświetl , jd923h90asd8 . I also want to match "string literals", something like "this is a string" or "f82h3 23fhn aj293 dgja3 xcn32" .

I'm using the match method on the input string to get an array of all matches. The problem is that when the Regexp matches a "string literal" and puts the string in the array, the string INCLUDES the double quotes. I don't want the double quotes, but the question is: why does the Regexp include them? In the Regexp, the quotes "" are outside the () group. Why does the match include everything?

EDIT:

var re = /([^\s"]+)|"([^\s"]+)"/g;

var process = function (text) {
    return execute(text.match(re));
}

var execute = function (arr) {
    console.log(arr);
    try {
        //... apply a function with arguments...
    } catch (e) {
        error(arr[0]+": wrong function");
        return "";
    }
}

For the input echo abc "abc def" "ghi" the Regexp returns the array ["echo", "abc", "abc", "def", "\"ghi\""] . I want a Regexp that returns ["echo", "abc", "abc def", "ghi"] for that input.

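The quoted part of your regular expression ( "([^\s"]+)" ) does not allow spaces inside the quotes. Try removing the \s from it. You may also want to use * instead of + if you need to match empty strings ( "" ):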

/([^\s"]+)|"([^"]*)"/g 
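As a quick check, here is a minimal sketch that runs the revised pattern against the sample input from the question. The variable names are only for illustration. Note that match with the g flag returns whole matches, so the space problem is fixed but the quotes are still part of each quoted result.

```javascript
// Revised pattern: a bare word, or anything (including spaces) between double quotes.
const revised = /([^\s"]+)|"([^"]*)"/g;
const input = 'echo abc "abc def" "ghi"';

// With the g flag, match() returns whole matches only, so the quotes are still present.
const whole = input.match(revised);
console.log(whole.join(" | ")); // echo | abc | "abc def" | "ghi"
```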

This is the only possible explanation, even without looking at any code.

Use group(1) or group(2) , not group() or group(0) . The latter two (which are fully equivalent) always return the whole matched string, which in your case includes the quotes. In JavaScript, these correspond to indices 1, 2 and 0 of the array returned by exec. I hope this explains what's going on.

PS: As your RegEx is an "or" RegEx, group(1) and group(2) will never both have content at the same time. One, the other, or both will be null or empty (in JavaScript, a group that did not take part in the match is undefined). Both are empty only when there is no match.
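The difference between the whole match and the two groups can be seen with RegExp.prototype.exec. This is a minimal sketch; it assumes the variant of the pattern that allows spaces inside the quotes, and the variable names are illustrative.

```javascript
// No g flag here: each exec() call matches once from the start of its input.
const pattern = /([^\s"]+)|"([^"]*)"/;

const bare = pattern.exec('echo');
const quoted = pattern.exec('"abc def"');

// Index 0 is the whole match; indices 1 and 2 are the capture groups.
console.log(bare[0], bare[1], bare[2]);       // echo echo undefined
console.log(quoted[0], quoted[1], quoted[2]); // "abc def" undefined abc def
```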

I just realized you are using the match method to retrieve all matches as an array. In this case, let me say that this method always returns the whole matched string for each match (the equivalent of group(0) above). There is no way of telling it to retrieve other groups (like 1 or 2). As a consequence, you have 3 alternatives:

  1. Remove the "s from the strings that have them in the resulting array through some post-processing.
  2. Do not use JavaScript's match method, but create your own equivalent (and use group(1) or group(2) in it, depending on the case).
  3. Change your regular expression to match the quotes as zero-width positive lookaheads and lookbehinds. JavaScript supports lookbehind only from ES2018 onward, but it should be /([^\s"]+)|(?<=")([^\s"]+)(?=")/g
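The second alternative can be sketched roughly as follows. The helper name tokenize is hypothetical, and the pattern assumed here is the variant that allows spaces inside the quotes; treat it as an illustration rather than a definitive implementation.

```javascript
// Hypothetical helper: collects group 1 (bare word) or group 2 (quoted text) for each match.
function tokenize(text) {
  // A fresh regex per call, so lastIndex always starts at 0.
  const re = /([^\s"]+)|"([^"]*)"/g;
  const tokens = [];
  let m;
  while ((m = re.exec(text)) !== null) {
    // Exactly one of the two groups takes part in each match; the other is undefined.
    tokens.push(m[1] !== undefined ? m[1] : m[2]);
  }
  return tokens;
}

console.log(tokenize('echo abc "abc def" "ghi"').join(" | "));
// echo | abc | abc def | ghi
```

Because the quotes sit outside both capture groups, the collected tokens never contain them, and an empty literal "" comes back as an empty string.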

To match JavaScript string literals, here's what you're looking for:

/(\w+|("|')(.*?)\2)/g

To explain this: you're looking for either unquoted word characters OR a pair of matching quotes with anything in between. The backreference \2 makes sure the closing quote is the same character as the opening one, so a string such as "it's his dog" is matched correctly.
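A minimal sketch of how this pattern could be used to collect tokens without their quotes. The sample text and variable names are made up for illustration.

```javascript
const literal = /(\w+|("|')(.*?)\2)/g;
const sample = `say "it's his dog" 'x y'`;

const parts = [];
let hit;
while ((hit = literal.exec(sample)) !== null) {
  // Group 3 is the text between the quotes. For a bare word it is undefined,
  // so fall back to group 1, which holds the whole match.
  parts.push(hit[3] !== undefined ? hit[3] : hit[1]);
}
console.log(parts.join(" | ")); // say | it's his dog | x y
```

One caveat: \w covers only ASCII letters, digits and underscore, so a word like wyświetl from the question would be split at the ś.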

This is simplified. Be aware that it does not handle a string containing escaped quotes, like:

"my \"complex\" string"

It didn't look like you were worried about that last scenario.

http://regexr.com/3bdbi
