
How can I remove inner parentheses from an R string?

I am processing strings in R which are supposed to contain zero or one pair of parentheses. If there are nested parentheses I need to delete the inner pair. Here is an example where I need to delete the parentheses around big bent nachos but not the other/outer parentheses.

test <- c(
  "Record ID", 
  "What is the best food? (choice=Nachos)", 
  "What is the best food? (choice=Tacos (big bent nachos))", 
  "What is the best food? (choice=Chips with stuff)", 
  "Complete?"
) 

I know I can remove all the parentheses with the stringr package using str_remove_all():

test |>
  stringr::str_remove_all(stringr::fixed(")")) |> 
  stringr::str_remove_all(stringr::fixed("("))

but I don't have the regex skills to pick out only the inner parentheses. I found an SO post that comes close, but it removes the outer parentheses and I can't untangle it to remove the inner ones instead.

Here you go.

test |>
  stringr::str_replace_all("(\\().*\\(", "\\1") |> # remove inner open brackets
  stringr::str_remove_all("\\)(?=.*\\))") # remove inner closed brackets
[1] "Record ID"                                       
[2] "What is the best food? (choice=Nachos)"          
[3] "What is the best food? (big bent nachos)"        
[4] "What is the best food? (choice=Chips with stuff)"
[5] "Complete?"

EDIT

Fixed my solution, so as to not lose text:

test |>
  stringr::str_replace("\\((.*)\\(", "(\\1") |> # remove inner open brackets
  stringr::str_remove_all("\\)(?=.*\\))") # remove inner closing brackets
[1] "Record ID"                                            
[2] "What is the best food? (choice=Nachos)"               
[3] "What is the best food? (choice=Tacos big bent nachos)"
[4] "What is the best food? (choice=Chips with stuff)"     
[5] "Complete?" 

Curious how this would be solved with multiple ( ... ) groups inside the outer parentheses, I came up with the following lookahead-based idea. Note that it only checks for an outer closing parenthesis, not for an outer opening one.

test <- gsub("\\(([^)(]*)\\)(?=[^)(]*(?:\\([^)(]*\\)[^)(]*)*\\))", "\\1", test, perl = TRUE)

See this R demo at tio.run or a pattern demo at regex101 (replace with \1, the capture of the first group).

At each ( ... ), the lookahead verifies that it is followed only by further ( ... ) groups or non-parenthesis characters up to a closing ).
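
To make the lookahead's behaviour concrete, here is a small sketch that runs the same pattern on two made-up strings (these inputs are assumptions for illustration, not part of the question's data): one with two inner pairs inside an outer pair, and one with two top-level pairs and no outer pair.

```r
# Hypothetical inputs, not from the original post
x <- c(
  "Q (a (b) c (d) e)",  # two inner pairs inside one outer pair
  "Q (a) and (b)"       # two top-level pairs, no outer pair
)

# Same pattern as above: an innermost ( ... ) that is followed, possibly
# after further ( ... ) groups, by a closing ")"
pattern <- "\\(([^)(]*)\\)(?=[^)(]*(?:\\([^)(]*\\)[^)(]*)*\\))"

flattened <- gsub(pattern, "\\1", x, perl = TRUE)
flattened
```

The first string becomes "Q (a b c d e)", while the second is left unchanged because neither of its pairs is followed by an outer closing parenthesis.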


If there is arbitrary nesting, flattening the first level can be done with a recursive regex.

test <- gsub("(?:\\G(?!^)|\\()[^)(]*+\\K(\\(((?>[^)(]+|(?1))*)\\))", "\\2", test, perl = TRUE)

One more R demo at tio.run or a regex101 demo (replace with \2, the second group's capture).

Regex parts explained:
(?:\G(?!^)|\() matches an opening parenthesis, or uses \G to continue where the previous match ended, which chains the matches together.
[^)(]*+\K consumes any number of non-parenthesis characters; \K then resets the start of the reported match.
(\(((?>[^)(]+|(?1))*)\)) matches the nested parentheses (explanation at php.net).
It contains two capture groups:
• the first is the one recursed into at (?1)
• the second captures the text inside the parentheses.

Here the matches are chained to the opening parenthesis. There is no check for an outer closing ). This \G-based idea can also be used without recursion for just one level, but it is slightly less efficient.
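
As a cross-check for the regex approaches, here is a rough regex-free sketch that walks each string character by character and tracks the nesting depth. This is a different technique from the recursive pattern above: it removes every parenthesis below the outermost level rather than only the first nested level. The function name flatten_inner_parens is made up for this example, and the sketch assumes the input contains no NA values.

```r
# Hypothetical helper, not from the original answers: keep only the
# outermost parentheses and drop every deeper "(" and ")".
flatten_inner_parens <- function(x) {
  vapply(x, function(s) {
    chars <- strsplit(s, "", fixed = TRUE)[[1]]
    depth <- 0L
    keep <- logical(length(chars))
    for (i in seq_along(chars)) {
      ch <- chars[i]
      if (ch == "(") {
        depth <- depth + 1L
        keep[i] <- depth == 1L          # keep only the outermost "("
      } else if (ch == ")") {
        keep[i] <- depth <= 1L          # keep outermost or unmatched ")"
        depth <- max(depth - 1L, 0L)
      } else {
        keep[i] <- TRUE                 # ordinary characters are always kept
      }
    }
    paste(chars[keep], collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

result <- flatten_inner_parens(c(
  "Record ID",
  "What is the best food? (choice=Tacos (big bent nachos))",
  "a (b (c (d)) e)"  # made-up input with deeper nesting
))
result
```

For the question's nested string this gives "What is the best food? (choice=Tacos big bent nachos)", and the deeper made-up example collapses to "a (b c d e)".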

Assuming there is at most one nested pair of parentheses, we could use a gsub() approach. Note that this also drops the text inside the inner parentheses:

output <- gsub("\\(\\s*(.*?)\\s*\\(.*?\\)(.*?)\\s*\\)", "(\\1\\2)", test)
output

[1] "Record ID"                                       
[2] "What is the best food? (choice=Nachos)"          
[3] "What is the best food? (choice=Tacos)"           
[4] "What is the best food? (choice=Chips with stuff)"
[5] "Complete?"

Data:

test <- c(
  "Record ID", 
  "What is the best food? (choice=Nachos)", 
  "What is the best food? (choice=Tacos (big bent nachos))", 
  "What is the best food? (choice=Chips with stuff)", 
  "Complete?"
)
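
If the inner text should be kept rather than dropped, one possible variant is to capture it as well and put it back in the replacement. This is only a sketch under the same assumption of at most one nested pair; the extra capture group, the literal space in the replacement, and the use of perl = TRUE are changes made for this example, not part of the answer above.

```r
test <- c(
  "Record ID",
  "What is the best food? (choice=Nachos)",
  "What is the best food? (choice=Tacos (big bent nachos))",
  "What is the best food? (choice=Chips with stuff)",
  "Complete?"
)

# \\1 = text before the inner pair, \\2 = text inside it, \\3 = text after it
kept <- gsub(
  "\\(\\s*(.*?)\\s*\\((.*?)\\)(.*?)\\s*\\)",
  "(\\1 \\2\\3)",
  test,
  perl = TRUE
)
kept
```

Because the whitespace before the inner parenthesis is consumed by the pattern, the replacement always puts a single space between the outer and inner text, even when the input had none there.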

Here is a solution using gsub from base R. It is broken down into 2 steps for readability and debugging.

test <- c(
   "Record ID", 
   "What is the best food? (choice=Nachos)", 
   "What is the best food? (choice=Tacos (big bent nachos))", 
   "What is the best food? (choice=Chips with stuff)", 
   "Complete?"
) 

test <- gsub("(\\(.*)\\(", "\\1", test)
# (\\(.*) - first group: the first '(' followed by zero or more characters (greedy)
# \\(     - then another '(' (the inner opening parenthesis)

# "\\1" replaces the whole match with the contents of the first group

test <- gsub("\\)(.*\\))", "\\1", test)
# similar to the first step: drop a ')' that is followed by another ')'
test

[1] "Record ID"                                            
[2] "What is the best food? (choice=Nachos)"               
[3] "What is the best food? (choice=Tacos big bent nachos)"
[4] "What is the best food? (choice=Chips with stuff)"     
[5] "Complete?"  

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address.

 