Regex for replacing commas only within square brackets

I have a text file that contains comma-separated strings, but some of the comma-separated items are themselves lists of the form [*,*,*,...]. For example:

"Hello", "Goodbye", ["Yes", "No", "Maybe"], "Indeed", ["Why", "What"]

I want to be able to parse the file to replace only commas within square brackets with a semicolon. There can be any number of brackets and any number of commas within the brackets.

I tried using this code in R, but it's not working as planned: some commas outside my brackets are being replaced:

repeat {
  tmp <- gsub("(\\[.*\\K),(?=.*\\])", ";", tmp, perl = TRUE)  # replace the last comma found within brackets with a semicolon
  if (sum(grepl("(\\[.*\\K),(?=.*\\])", tmp, perl = TRUE)) == 0) {  # repeat until no more commas are found
    break
  }
}

Can anyone help with regex that can solve this problem? Thanks!
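A quick reproduction makes the failure visible. This is a sketch that assumes tmp initially holds the sample line from above (the original loop does not show how tmp is filled). Because .* is greedy, the [ matched by \\[ and the ] seen by the lookahead can belong to different bracket groups, so every comma between the first [ and the last ] on the line is eventually replaced, including the two around "Indeed":

```r
# Sample line from the question (assumed starting value of tmp)
tmp <- '"Hello", "Goodbye", ["Yes", "No", "Maybe"], "Indeed", ["Why", "What"]'
pattern <- "(\\[.*\\K),(?=.*\\])"  # the pattern from the question

repeat {
  # Each pass replaces only the last comma that has a [ somewhere before it
  # and a ] somewhere after it, even if they belong to different groups
  tmp <- gsub(pattern, ";", tmp, perl = TRUE)
  if (!any(grepl(pattern, tmp, perl = TRUE))) break
}

writeLines(tmp)
# "Hello", "Goodbye", ["Yes"; "No"; "Maybe"]; "Indeed"; ["Why"; "What"]
```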

To replace all commas inside square brackets with semicolons, you may use

gsub("(?:\\G(?!^)|\\[)[^][,]*\\K,", ";", x, perl=TRUE)
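When x is a character vector, for instance the lines returned by readLines(), gsub() processes each element independently, so \\G never carries over from one line to the next. A sketch with the sample line plus a second, made-up line:

```r
x <- c('"Hello", "Goodbye", ["Yes", "No", "Maybe"], "Indeed", ["Why", "What"]',
       '"a", ["b", "c", "d"], "e"')  # second line is a made-up example

# Commas are replaced only while the match chain that started at a [ continues
res <- gsub("(?:\\G(?!^)|\\[)[^][,]*\\K,", ";", x, perl = TRUE)
writeLines(res)
# "Hello", "Goodbye", ["Yes"; "No"; "Maybe"], "Indeed", ["Why"; "What"]
# "a", ["b"; "c"; "d"], "e"
```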

See the regex demo. Note that this regex does not check for the closing ]. If that check is required, use

gsub("(?:\\G(?!^)|\\[)[^][,]*\\K,(?=[^][]*])", ";", x, perl=TRUE)
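The two patterns only behave differently when a [ is never closed. In this sketch the input is a made-up line whose bracket group lacks its ]: the first pattern still replaces the commas after the [, while the lookahead in the second pattern makes it leave the line unchanged.

```r
x <- '"a", ["b", "c", "d"'  # made-up line: the bracket group is never closed

open_only  <- "(?:\\G(?!^)|\\[)[^][,]*\\K,"               # does not look for ]
must_close <- "(?:\\G(?!^)|\\[)[^][,]*\\K,(?=[^][]*])"    # requires a ] ahead

r1 <- gsub(open_only,  ";", x, perl = TRUE)
r2 <- gsub(must_close, ";", x, perl = TRUE)
writeLines(c(r1, r2))
# "a", ["b"; "c"; "d"
# "a", ["b", "c", "d"
```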

See another regex demo.

Details

  • (?:\\G(?!^)|\\[) - the end of the previous match (\\G(?!^), where (?!^) prevents \\G from matching at the start of the string) or (|) a [ (\\[)
  • [^][,]* - 0+ chars other than [, ] and ,
  • \\K - match reset operator that discards all the text matched so far
  • , - a comma
  • (?=[^][]*]) - a positive lookahead that requires 0+ chars other than [ and ], followed by a ], immediately to the right of the current location.
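Putting the pieces together, the sketch below assembles the closing-bracket-checking pattern from the parts listed above (so each part can carry its own comment) and applies it to a file line by line. The helper name replace_bracket_commas and the temporary files are hypothetical, and the sketch assumes every bracket group opens and closes on the same line, since readLines() splits the file into separate strings.

```r
# Same pattern as above, built from commented parts
pattern <- paste0(
  "(?:\\G(?!^)|\\[)",  # resume right after the previous match, or start at a [
  "[^][,]*",           # skip any chars other than [, ] and ,
  "\\K",               # drop everything matched so far from the match
  ",",                 # the comma to replace
  "(?=[^][]*])"        # require a ] ahead before any other [ or ]
)

# Hypothetical helper: rewrite a file line by line
# (assumes no bracket group spans more than one line)
replace_bracket_commas <- function(infile, outfile) {
  lines <- readLines(infile, warn = FALSE)
  writeLines(gsub(pattern, ";", lines, perl = TRUE), outfile)
}

# Usage with throwaway temporary files
src <- tempfile(fileext = ".txt")
dst <- tempfile(fileext = ".txt")
writeLines('"Hello", "Goodbye", ["Yes", "No", "Maybe"], "Indeed", ["Why", "What"]', src)
replace_bracket_commas(src, dst)
writeLines(readLines(dst))
# "Hello", "Goodbye", ["Yes"; "No"; "Maybe"], "Indeed", ["Why"; "What"]
```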
