
JavaScript RegEx non-capturing prefix

I am trying to do some string replacement with RegEx in JavaScript. The scenario is a single-line string containing a long comma-delimited list of numbers, in which duplicates are possible.

An example string is: 272,2725,2726,272,2727,297,272 (the string may or may not end with a comma)

In this example, I am trying to match each occurrence of the whole number 272 (3 matches expected). The regex I'm trying to use is: (?:^|,)272(?=$|,)

The problem I am having is that the second and third matches include the leading comma, which I do not want. I am confused because I thought (?:^|,) would match but not capture. Can someone shed light on this for me? Interestingly, the trailing comma is excluded from the result, which is what I want.

For what it is worth, if I were using C#, there is lookbehind syntax for prefix matching that does what I want: (?<=^|,). However, it appears to be unsupported in JavaScript.

Lastly, I know I could work around it using string splitting, array manipulation and rejoining, but I want to learn.

Use word boundaries instead:

\b272\b

ensures that only 272 matches, but not 2725.
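As a quick check of the word-boundary approach, here is a minimal sketch run against the sample string from the question. The replacement text 'X' is only a placeholder picked for illustration.

```javascript
// Sample list from the question.
const list = '272,2725,2726,272,2727,297,272';

// \b matches between a word character (here a digit) and a non-word
// character (the comma) or the edge of the string, so only the
// standalone 272 entries match.
const matches = list.match(/\b272\b/g);
console.log(matches.length); // 3

// The commas are never part of a match, so replacing leaves them in place.
const replaced = list.replace(/\b272\b/g, 'X');
console.log(replaced); // X,2725,2726,X,2727,297,X
```

Keep in mind that \b treats letters and underscores as word characters too, and treats characters such as '.' or '-' as boundaries, so this fits a list of plain integers better than, say, decimals.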

(?:...) matches and doesn't capture - but whatever it matches will be part of the overall match.
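To make that concrete, the sketch below runs the regex from the question and then shows one common workaround: capture the prefix and write it back with $1. The replacement text 'X' is a placeholder for illustration.

```javascript
const list = '272,2725,2726,272,2727,297,272';

// The group does not capture, but the comma it consumed is still part
// of each overall match.
const nonCapturing = list.match(/(?:^|,)272(?=$|,)/g);
console.log(nonCapturing); // [ '272', ',272', ',272' ]

// Workaround: capture the prefix instead and put it back with $1.
const replaced = list.replace(/(^|,)272(?=$|,)/g, '$1X');
console.log(replaced); // X,2725,2726,X,2727,297,X
```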

A lookaround assertion like (?=...) is different: It only checks if it is possible (or impossible) to match the enclosed regex at the current point, but it doesn't add to the overall match.
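A small sketch of the difference, followed by the lookbehind form from the question. Lookbehind was added to the language in ES2018, so whether it works depends on the runtime. The code assumes it may be missing and builds the pattern from a string, so that an older engine throws at run time instead of failing to parse the script.

```javascript
const list = '272,2725,2726,272,2727,297,272';

// Lookahead: the comma is required but is not part of the match.
const checked = '272,297'.match(/272(?=,)/)[0];
console.log(checked); // 272

// Consuming group: the comma becomes part of the match.
const consumed = '272,297'.match(/272(?:,)/)[0];
console.log(consumed); // 272,

// Lookbehind, where the engine supports it (assumption about the runtime).
let viaLookbehind = null;
try {
  const re = new RegExp('(?<=^|,)272(?=$|,)', 'g');
  viaLookbehind = list.match(re);
} catch (e) {
  // No lookbehind support: fall back to \b or a captured prefix.
}
console.log(viaLookbehind); // three '272' entries when lookbehind is supported
```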

Here is a way to emulate a lookbehind in JavaScript that has worked in all the cases where I needed it.

This is only one example; the same idea can be extended to more complex and flexible patterns.

The main point here is that in some cases, it is possible to create a RegExp non-capturing prefix (look behind) construct in JavaScript.

This example is designed to extract all fields that are surrounded by braces '{...}'. The braces are not returned with the field.

This is just an example to show the idea at work, not necessarily a prelude to an application.

    function testGetSingleRepeatedCharacterInBraces()
      {
        var leadingHtmlSpaces = '&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;' ;
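        // '(?:\b|\B(?={))' is meant to act as a non-capturing prefix, in the
        // general form (?:\b|\B(?=WhateverYouLike)).
        // Note: here only the \b branch can ever succeed. After \B(?={) the next
        // character is '{', which the character class below cannot match, so the
        // opening brace itself is never actually checked.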
        var regex  = /(?:\b|\B(?={))(([0-9a-zA-Z_])\2{4})(?=})/g ;
        var string = '' ;

        string = 'Message has no fields' ;
        document.write( 'String => "' + string 
                                      + '"<br>'  + leadingHtmlSpaces + 'fields => '
                                      + getMatchingFields( string, regex )
                                      + '<br>' ) ;

        string = '{LLLLL}Message {11111}{22222} {ffffff}abc def{EEEEE} {_____} {4444} {666666} {55555}' ;
        document.write( 'String => "' + string
                                      + '"<br>'  + leadingHtmlSpaces + 'fields => '
                                      + getMatchingFields( string, regex )
                                      + '<br>' ) ;
      } ;

    function getMatchingFields( stringToSearch, regex )
      {
         var matches = stringToSearch.match( regex ) ;
         return matches ? matches : [] ;
      } ;

    Output:
    String => "Message has no fields"
         fields =>
    String => "{LLLLL}Message {11111}{22222} {ffffff}abc def{EEEEE} {_____} {4444} {666666} {55555}"
         fields => LLLLL,11111,22222,EEEEE,_____,55555
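The pattern above finds the left edge of a field through \b, so it would also accept a field such as 'LLLLL}' with no opening brace. If the brace must really be present, one alternative is to consume it and read the field from a capture group. This is a sketch only; getCapturedFields and braceFields are hypothetical names, not part of the original answer.

```javascript
// Hypothetical helper: collect capture group 1 of every match, so a
// consumed prefix such as '{' is left out of the result.
function getCapturedFields(stringToSearch, regex) {
  const fields = [];
  let match;
  regex.lastIndex = 0; // a /g regex keeps its position between exec() calls
  while ((match = regex.exec(stringToSearch)) !== null) {
    fields.push(match[1]);
  }
  return fields;
}

// '\{' is consumed as a real prefix; group 1 holds five identical
// characters; '(?=\})' checks for the closing brace without consuming it.
const braceFields = /\{(([0-9a-zA-Z_])\2{4})(?=\})/g;

const sample = '{LLLLL}Message {11111}{22222} {ffffff}abc def{EEEEE} {_____} {4444} {666666} {55555}';
const fields = getCapturedFields(sample, braceFields);
console.log(fields.join(',')); // LLLLL,11111,22222,EEEEE,_____,55555

// A field without an opening brace is skipped.
const strict = getCapturedFields('LLLLL} {AAAAA}', braceFields);
console.log(strict.join(',')); // AAAAA
```

Because the closing brace is only looked at and not consumed, adjacent fields such as {11111}{22222} are both found.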
