
$0 and $1 in Regular Expression

I am trying to use replace with a regular expression and need some help understanding what $1 means in this code. The code swaps the case of each letter.

function swapCase(str) {
  return str.replace(/([a-z])|([A-Z])/g,
    function($0, $1)
    { return ($1) ? $0.toUpperCase() : $0.toLowerCase(); });
}

I understand that in the first argument of the replace method I check whether we have a lowercase or an uppercase letter, but I do not understand how the second argument works or what it does.

I understand the syntax: if ($1) is true we execute $0.toUpperCase(), and if not we execute $0.toLowerCase(). But what decides whether ($1) is true? What condition does ($1) have? I think I understand that $0 is the entire matched string, but I am confused by ($1). Thanks!

What condition does ($1) have?

None -- regular expression syntax doesn't specify how match groups are to be evaluated for truthiness, so this depends on the behavior of the containing language.

I'm guessing that, in this case, it's evaluating empty strings as false, and non-empty strings as true -- but as you haven't told us what language this is, one can't tell for certain.
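Assuming the language is JavaScript (the `str.replace(regex, function)` call suggests so), the guess can be checked directly against JavaScript's truthiness rules. This is a minimal sketch; `describeTruthiness` is a hypothetical helper, and the three values are the ones a capture-group parameter can plausibly hold.

```javascript
// Hypothetical helper: reports how JavaScript's truthiness rules treat
// each value, formatted as "<value> -> <true|false>".
function describeTruthiness(values) {
  return values.map((v) => `${JSON.stringify(v)} -> ${Boolean(v)}`);
}

// undefined: an unmatched group in current engines
// "":        what some older engines passed for an unmatched group
// "a":       a group that matched a single letter
const report = describeTruthiness([undefined, "", "a"]);
console.log(report.join("\n"));
// undefined -> false
// "" -> false
// "a" -> true
```

So both an empty string and `undefined` make `if ($1)` take the lowercase branch, while any matched letter takes the uppercase branch.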

$0 is indeed the entire matched string. $1 is the first subpattern (i.e. the lowercase letter). If the first subpattern matched, we uppercase the match; otherwise we lowercase it. Note that the function is also passed further arguments (the second subpattern, the offset of the match and the whole input string), but they are not used here.

Note the parameters passed to the replacement function. Keeping the given pattern, I've rewritten it as:

str.replace(/([a-z])|([A-Z])/g,
  function(match, p1, p2) {
    return p1 ? p1.toUpperCase() : p2.toLowerCase();
  })

If ([a-z]) matches, the bound p1 ($1) variable holds a truthy string (any non-empty string; in particular, the single letter accepted by the regular expression). Otherwise p1 is undefined in current JavaScript engines (some older engines passed the empty string "" instead); both values are falsy. This is why the check on p1 ($1) is correct. Note that a bound capture group is therefore not always a string: for a group that did not take part in the match, expect undefined.
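Wrapped in a function, the rewritten version can be run directly; the second call inspects what `p1` holds when only the uppercase alternative matches. This is a minimal sketch, and the name `swapCase` is my own choice, not part of the original code.

```javascript
// The rewritten swap-case logic, wrapped so it can run on its own.
function swapCase(str) {
  return str.replace(/([a-z])|([A-Z])/g, function (match, p1, p2) {
    // p1 is a one-letter string when [a-z] matched; otherwise it is
    // undefined (or "" in some older engines), and both are falsy.
    return p1 ? p1.toUpperCase() : p2.toLowerCase();
  });
}

// Inspect what p1 actually holds for an uppercase letter.
let p1TypeForUppercase;
"Z".replace(/([a-z])|([A-Z])/g, function (match, p1) {
  p1TypeForUppercase = typeof p1;
  return match;
});

console.log(swapCase("Hello World")); // hELLO wORLD
console.log(p1TypeForUppercase);      // undefined (in current engines)
```

Relying on truthiness (`p1 ? ... : ...`) works whether an engine passes `undefined` or `""`, whereas a strict `p1 !== undefined` check would misbehave on engines that pass an empty string.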

Note that there is no point in checking match ($0), as with this regular expression it is always truthy (it is the text matched by either the first or the second alternative).

$0 and $1 in this example are simply variables (named using Hungarian Notation). You could replace them with a and b and the code would work just the same. The function is called with several parameters. The first parameter is the matched substring. The second parameter is the first capture (whatever matched the part of the regex in the first parentheses, in this case a single lowercase letter a-z). The third parameter, which is not declared here, would be the second capture (an uppercase letter A-Z); it is simply ignored.
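To confirm that the dollar signs carry no special meaning, here is a minimal sketch that runs the same function body twice, once with the original parameter names and once with `a` and `b`:

```javascript
const pattern = /([a-z])|([A-Z])/g;

// Original naming: $0 and $1 are ordinary parameter names.
const withDollars = "MiXeD".replace(pattern, function ($0, $1) {
  return $1 ? $0.toUpperCase() : $0.toLowerCase();
});

// Same function body with plain names a and b.
const withLetters = "MiXeD".replace(pattern, function (a, b) {
  return b ? a.toUpperCase() : a.toLowerCase();
});

console.log(withDollars, withLetters); // mIxEd mIxEd
```

`replace` only cares about the position of each parameter, not its name.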

Because the regex uses the global "g" flag, the function is called once per match, potentially many times. The regex matches any lowercase or uppercase letter. The first function parameter receives each letter of str in turn (non-letters are never matched and are left unchanged), and the second parameter is set only if the first capture group matched, i.e. only if the letter is lowercase. If the second parameter is set (a lowercase letter), toUpperCase is called on the match. If it is unset (an uppercase letter), toLowerCase is called on the match. In this latter case, the unused third parameter would have contained the contents of the second capture group.
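The effect of the "g" flag can be made visible by counting calls. A minimal sketch; `callCount` is only there for illustration:

```javascript
let callCount = 0;

const swapped = "Hello, World!".replace(/([a-z])|([A-Z])/g, function (match, lower) {
  callCount += 1; // one call per matched letter, thanks to the g flag
  return lower ? match.toUpperCase() : match.toLowerCase();
});

console.log(swapped);   // hELLO, wORLD!
console.log(callCount); // 10: the comma, space and "!" never match

// Without the g flag, only the first match is replaced.
const once = "Hello".replace(/([a-z])|([A-Z])/, function (match, lower) {
  return lower ? match.toUpperCase() : match.toLowerCase();
});
console.log(once); // hello
```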

The entire solution has the effect of swapping cases on a character-by-character basis.

But the use of $0 and $1 in this code suggests the author was toying with something else: references and back-references. $0 (or \0 in some languages) is a reference to the whole match (i.e. precisely the first argument to the function), $1 (or \1) is a reference to the first capture group, and so on. The author of this code named the variables this way to convey that the first argument is equivalent to $0, the second argument to $1, etc. This is, in my opinion, enlightened and entirely confusing. These variable names suggest that something magic is happening, and the use of Hungarian Notation also implies something else. There is nothing magic here, so the variables should be named more simply: match and is_lowercase would have been fine.
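For comparison, this is where such references really do have special meaning in JavaScript: in a replacement string and inside the pattern itself. A minimal sketch; the example strings and the parameter name `lowercaseLetter` are my own.

```javascript
// In a replacement *string*, $1, $2, ... refer to capture groups and
// $& to the whole match (JavaScript spells the whole match $&).
const reordered = "John Smith".replace(/(\w+) (\w+)/, "$2, $1");
const bracketed = "abc".replace(/b/, "[$&]");

// Inside the pattern itself, \1 is a back-reference to group 1.
const hasDoubledLetter = /([a-z])\1/.test("balloon");

// In a replacement *function*, the same values simply arrive as
// arguments, so they can be given descriptive names.
const swapped = "aBc".replace(/([a-z])|([A-Z])/g, function (match, lowercaseLetter) {
  return lowercaseLetter ? match.toUpperCase() : match.toLowerCase();
});

console.log(reordered);        // Smith, John
console.log(bracketed);        // a[b]c
console.log(hasDoubledLetter); // true ("ll")
console.log(swapped);          // AbC
```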

Furthermore, since the third function parameter is never used, there is no need to capture the uppercase character class.
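Following that suggestion, the uppercase class can stay in the pattern without a capture group. A minimal sketch; `swapCase` and `lowercaseLetter` are illustrative names, not from the original code.

```javascript
// Only the lowercase alternative is captured; [A-Z] is matched but not captured.
function swapCase(str) {
  return str.replace(/([a-z])|[A-Z]/g, function (match, lowercaseLetter) {
    return lowercaseLetter ? match.toUpperCase() : match.toLowerCase();
  });
}

console.log(swapCase("Stack Overflow")); // sTACK oVERFLOW
```

The function now receives one capture argument instead of two, and the behavior is unchanged.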

The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
© 2020-2024 STACKOOM.COM