Perl Regex substitution issue

I'm trying to remove tokens from a semicolon-separated string using a regex. Example strings look like this:

Field1=Blah;Field2=Bluh;Field3=Dingdong;Uid=John;Pwd=secret;Field4=lalali
Field1=Blah;Field2=Bluh;Field3=Dingdong;Uid=John;Pwd=secret;Field4=lalali;

I want to remove the "Uid" and "Pwd" tokens in separate commands, so as not to remove any trailing tokens (e.g. Field4 should remain at the end).

My current attempt is to do:

$mystring =~ s/Uid=.+;//i;

which yields

Field1=Blah;Field2=Bluh;Field3=Dingdong;Field4=lalali

That works for the first line (although it takes the Pwd token with it), but not for the second line, which ends with a semicolon. There it yields

Field1=Blah;Field2=Bluh;Field3=Dingdong;

and removes Field4 incorrectly. I tried a number of variations like

$mystring =~ s/Uid=.+;?//i;
$mystring =~ s/Uid=.+;+?//i;

without success. I realize that I need to tell the regex to only match up to the first semicolon, but I can't figure out how.
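Neither variation helps, because the `?` in each one only applies to the semicolon, not to the greedy `.+` in front of it. The sketch below runs both variations on the two sample lines; only the patterns come from the question, and the variable names are illustrative.

```perl
use strict;
use warnings;

# The two sample lines from the question.
my @samples = (
    'Field1=Blah;Field2=Bluh;Field3=Dingdong;Uid=John;Pwd=secret;Field4=lalali',
    'Field1=Blah;Field2=Bluh;Field3=Dingdong;Uid=John;Pwd=secret;Field4=lalali;',
);

my @results;
for my $s (@samples) {
    # .+ runs to the end of the string and ;? is satisfied by matching nothing,
    # so everything from "Uid=" onwards is deleted.
    (my $optional = $s) =~ s/Uid=.+;?//i;

    # .+ backtracks only as far as the LAST ';' and ;+? then matches that one
    # semicolon, so this behaves exactly like the original /Uid=.+;/.
    (my $lazy_semi = $s) =~ s/Uid=.+;+?//i;

    push @results, [ $optional, $lazy_semi ];
    print "Uid=.+;?  => $optional\n";
    print "Uid=.+;+? => $lazy_semi\n";
}
```

On the first line `;+?` gives the same result as the original attempt, and on the second line both variations still swallow Field4.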

Now, just so that I don't look completely stupid, I was able to get it to work by doing this:

$mystring =~ s/Uid=[^;]+;//i;

but I'm still wondering why I can't tell the expression to only match up to the first semicolon ...

Quantifiers like + and * are greedy: they gobble up as many characters as possible and only give them back when forced to by backtracking. The pattern .*; will therefore match everything up to the last semicolon.
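A quick way to see this is to capture what the question's greedy pattern `/Uid=.+;/` actually consumes on each sample line. This is a minimal sketch; the array names are illustrative.

```perl
use strict;
use warnings;

# The two sample lines from the question.
my @samples = (
    'Field1=Blah;Field2=Bluh;Field3=Dingdong;Uid=John;Pwd=secret;Field4=lalali',
    'Field1=Blah;Field2=Bluh;Field3=Dingdong;Uid=John;Pwd=secret;Field4=lalali;',
);

# Capture the text the greedy pattern matches on each line.
my @greedy = map { /(Uid=.+;)/i ? $1 : undef } @samples;

print "$_\n" for @greedy;
# Line 1: the match ends at the last ';' (the one after "secret").
# Line 2: the match ends at the trailing ';', so Field4 is swallowed too.
```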

Maybe the greedy quantifiers should go on a diet. We can force them to by using the lazy versions +? and *?, which stop as early as possible. So the pattern would be:

/Uid=.+?;/  # repeat for Pwd

which matches until the first semicolon
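Applied to both sample lines, once for Uid and once for Pwd as separate commands, the lazy pattern leaves Field4 in place. This is a minimal sketch that assumes every Uid/Pwd value is non-empty and followed by a semicolon; it does not handle a Uid or Pwd token that is the last field with no trailing `;`.

```perl
use strict;
use warnings;

# The two sample lines from the question.
my @samples = (
    'Field1=Blah;Field2=Bluh;Field3=Dingdong;Uid=John;Pwd=secret;Field4=lalali',
    'Field1=Blah;Field2=Bluh;Field3=Dingdong;Uid=John;Pwd=secret;Field4=lalali;',
);

my @cleaned;
for my $s (@samples) {
    my $copy = $s;
    $copy =~ s/Uid=.+?;//i;   # lazy: stops at the first ';' after "Uid="
    $copy =~ s/Pwd=.+?;//i;   # same pattern, repeated for Pwd
    push @cleaned, $copy;
    print "$copy\n";
}
```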

This works, but it is generally considered better style to use a negated character class rather than a non-greedy quantifier on the . class:

/Uid=[^;]+;/

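because there are fewer ways this can go wrong (like deleting the rest of the line): [^;] can never step over a semicolon, whereas .+? will do so when that is the only way to find a match, for example when the value is empty. It is also more explicit about what a token value may contain.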
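The difference shows up with an empty value. The input below is a hypothetical example, not one of the question's strings, and the `[^;]*` line is an illustrative variant (not part of the original answer) for when empty values should also be removed.

```perl
use strict;
use warnings;

# Hypothetical input: the Uid token has an empty value.
my $input = 'Field1=Blah;Uid=;Pwd=secret;Field4=lalali;';

# .+? must match at least one character, so it consumes the ';' right after
# "Uid=" and keeps going until the next ';', deleting the Pwd token as well.
(my $lazy = $input) =~ s/Uid=.+?;//i;

# [^;]+ cannot match a ';', so the pattern simply fails and nothing is removed.
(my $negated = $input) =~ s/Uid=[^;]+;//i;

# Illustrative variant: [^;]* also accepts an empty value.
(my $star = $input) =~ s/Uid=[^;]*;//i;

print "lazy:    $lazy\n";
print "negated: $negated\n";
print "star:    $star\n";
```

Failing to match is the safer outcome here: the string is left untouched instead of losing an unrelated token.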

If you don't want to use the negated character class (which works with practically every regex engine), you can use a non-greedy quantifier to match the data following the keyword. Non-greedy quantifiers work in Perl and Perl-compatible engines, but not in POSIX basic or extended regular expressions. See Quantifiers under Regular expressions for more information.

$mystring =~ s/Uid=.+?;//i;

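The extra question mark makes the + non-greedy: it takes the shortest string that allows a match instead of the longest, so it stops at the first semicolon that lets the rest of the pattern succeed (normally the one ending the Uid value) instead of running on to the last one.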

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.
