
Regex to replace all tokens not in quotes?

I am trying to process some input data in JavaScript, where I need to replace every string token of the form "ID1", "ID2", "ID3", ... with a string that wraps the original token. For example, "ID1" becomes "table['ID1']". However, if the original token is wrapped in quotes (single or double), it must be left unchanged.

For example the input string:

var input = "ID10 \"ID0\" FOO 'ID0' #ID0# ID10 BAR ID1 ID0.";

should become:

"table['ID10'] \"ID0\" FOO 'ID0' #table['ID0']# table['ID10'] BAR table['ID1'] table['ID0']."

I can currently get some of the way using the following code:

var input = "ID10 \"ID0\" FOO 'ID0' #ID0# ID10 BAR ID1 ID0.";

var expected = "table['ID10'] \"ID0\" FOO 'ID0' #table['ID0']# table['ID10'] BAR table['ID1'] table['ID0'].";

// assume 15 is the max number of ids. we search backwards.
for( var i=15 ; i>=0 ; i-- )
{
    var id = "ID" + i;

    var regex = new RegExp( "[^\"\']" + id + "", 'g' );

    input = input.replace( regex, "table['" + id + "']" );
}

if( input == expected )
    alert( 'success :)' );

This produces the output:

ID10 "ID0" FOO 'ID0' table['ID0']#table['ID10'] BARtable['ID1']table['ID0'].

It seems close to working; however, the first ID (ID10) is ignored, and the character immediately before each match is lost.

Can anybody please advise how to process this correctly? Thanks.

I think you're going to need a negative lookahead.

The whole regex is

(ID\d+(?!\\))

The negative lookahead is the (?!...) part. It asserts that the character immediately after the digits is not a backslash.

So the code would be something along the lines of

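var re = /(ID\d+(?!\\))/g;
// Inside a JavaScript string literal, \" and \' are just quote characters,
// so this string contains no backslashes for the lookahead to reject.
var str = 'ID10 \"ID0\" FOO \'ID0\' #ID0# ID10 BAR ID1 ID0.';
var subst = 'table[\'$1\']';
var result = str.replace(re, subst);
// table['ID10'] "table['ID0']" FOO 'table['ID0']' #table['ID0']# table['ID10'] BAR table['ID1'] table['ID0'].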

You can use this regex based on alternation in String#replace with a callback function:

var input = "ID10 \"ID0\" FOO 'ID0' #ID0# ID10 BAR ID1 ID0.";
var r = input.replace(/"[^"]*"|'[^']*'|(ID\d+)/g, function($0, $1) {
    return ($1) ? "table['" + $1 + "']" : $0;
});
//=> table['ID10'] "ID0" FOO 'ID0' #table['ID0']# table['ID10'] BAR table['ID1'] table['ID0'].
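The alternation trick above can be wrapped in a small reusable function. This is only a sketch: the names `wrapIds` and `tableName` are made up for illustration, and it assumes quoted runs never contain escaped quotes or stray apostrophes.

```javascript
// Hypothetical helper; `wrapIds` and `tableName` are illustrative names.
function wrapIds(text, tableName) {
  // At each position the alternatives are tried left to right:
  // a double-quoted run, a single-quoted run, or a bare ID token (group 1).
  // Quoted runs are consumed whole, so IDs inside them never reach group 1.
  var pattern = /"[^"]*"|'[^']*'|(ID\d+)/g;
  return text.replace(pattern, function (match, id) {
    // `id` is undefined when a quoted run matched; return it unchanged.
    return id ? tableName + "['" + id + "']" : match;
  });
}

var sample = "ID10 \"ID0\" FOO 'ID0' #ID0# ID10 BAR ID1 ID0.";
console.log(wrapIds(sample, "table"));
```

Because the quoted alternatives swallow everything up to the closing quote, this also leaves tokens alone when they are not directly next to the quote character, e.g. "ID10 and ID2" inside one pair of quotes.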

Edit: it seems that zero-width negative look-behind is not supported in JavaScript (it was only added in ES2018), so you need a zero-width negative look-ahead to check that the character after "ID" plus digits is not a backslash, a single quote, or a double quote.

You could try:

/(ID\d+(?![\\\'\"]))/g
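A quick way to see where this look-ahead helps and where it does not is to run it on a few small strings. The helper name `wrapWithLookahead` is made up for this sketch.

```javascript
// Illustrative helper around the look-ahead regex from this answer.
function wrapWithLookahead(text) {
  return text.replace(/(ID\d+(?![\\'"]))/g, "table['$1']");
}

// Token directly followed by a closing quote: left alone.
console.log(wrapWithLookahead('"ID0"'));
// Multi-digit token: \d+ backtracks to "ID1", which is followed by "0", so it still matches.
console.log(wrapWithLookahead('"ID10"'));
// Token inside quotes but not adjacent to the closing quote: still replaced.
console.log(wrapWithLookahead('"ID0 x"'));
```

So the look-ahead only inspects one character after the match; it does not know whether the token sits inside a quoted run.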

EDIT: Forget all this!

You need a zero-width negative look-behind.

You could try:

 /(?<!["'])ID\d+/g
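In engines that implement ES2018 look-behind assertions (current Node.js and evergreen browsers), a look-behind version can be tried as below. This is a sketch; `wrapWithLookbehind` is a made-up name.

```javascript
// Illustrative helper; requires an engine with ES2018 look-behind support.
function wrapWithLookbehind(text) {
  // (?<!["']) asserts that the character just before "ID" is not a quote.
  return text.replace(/(?<!["'])ID\d+/g, function (id) {
    return "table['" + id + "']";
  });
}

var lookbehindInput = "ID10 \"ID0\" FOO 'ID0' #ID0# ID10 BAR ID1 ID0.";
console.log(wrapWithLookbehind(lookbehindInput));
```

Like the look-ahead, this only checks a single neighbouring character, so a token such as the one in "x ID0" (quoted, but preceded by a space) would still be replaced.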

Alternatively, you might try to capture your match in a group:

 /[^"'](ID\d+)/g
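On its own, the character class in front of the group consumes one character, which is why the question's attempt lost the character before each match and skipped the token at the very start of the string. One possible fix, sketched here with the made-up name `wrapKeepingPrefix`, is to capture that character too (or the start of the string) and put it back.

```javascript
// Illustrative helper: group 1 is the preceding character (or empty at the
// start of the string), group 2 is the ID token.
function wrapKeepingPrefix(text) {
  return text.replace(/(^|[^"'])(ID\d+)/g, function (match, before, id) {
    // Re-emit the consumed character so nothing is lost.
    return before + "table['" + id + "']";
  });
}

var prefixInput = "ID10 \"ID0\" FOO 'ID0' #ID0# ID10 BAR ID1 ID0.";
console.log(wrapKeepingPrefix(prefixInput));
```

This still looks at only the one character before the token, so it shares the limitation of the look-behind variant for tokens that are inside quotes but not directly after the opening quote.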
