How can I remove inner parentheses from an R string?

Question

I am processing strings in R that are supposed to contain zero or one pair of parentheses. If there are nested parentheses, I need to delete the inner pair. In the example below, I need to delete the parentheses around "big bent nachos" but keep the outer parentheses.

test <- c(
  "Record ID", 
  "What is the best food? (choice=Nachos)", 
  "What is the best food? (choice=Tacos (big bent nachos))", 
  "What is the best food? (choice=Chips with stuff)", 
  "Complete?"
)

I know I can remove all the parentheses with the stringr package using str_remove_all():

test |>
  stringr::str_remove_all(stringr::fixed(")")) |> 
  stringr::str_remove_all(stringr::fixed("("))

but I don't have the regex skills to target only the inner parentheses. I found an SO post that is close, but it removes the outer parentheses and I can't untangle it to remove the inner ones instead.

Answer 1

Here you go.

test |>
  stringr::str_replace_all("(\\().*\\(", "\\1") |> # remove inner open brackets
  stringr::str_remove_all("\\)(?=.*\\))") # remove inner closed brackets

[1] "Record ID"                                       
[2] "What is the best food? (choice=Nachos)"          
[3] "What is the best food? (big bent nachos)"        
[4] "What is the best food? (choice=Chips with stuff)"
[5] "Complete?"

EDIT

I fixed my solution so that it no longer loses text:

test |>
  stringr::str_replace("\\((.*)\\(", "(\\1") |> # remove inner open brackets, keeping the text before them
  stringr::str_remove_all("\\)(?=.*\\))") # remove inner closed brackets (any ")" followed by another ")")

[1] "Record ID"                                            
[2] "What is the best food? (choice=Nachos)"               
[3] "What is the best food? (choice=Tacos big bent nachos)"
[4] "What is the best food? (choice=Chips with stuff)"     
[5] "Complete?"

Answer 2

Interested in how this would be solved with multiple ( ... ) inside the outer parentheses, I came up with the following lookahead-based idea. It only checks for an outer closing parenthesis, though.

test <- gsub("\\(([^)(]*)\\)(?=[^)(]*(?:\\([^)(]*\\)[^)(]*)*\\))", "\\1", test, perl = TRUE)

See this R demo at tio.run or a pattern demo at regex101 (replace with \1, the capture of the first group).

At each ( ... ), the lookahead verifies that it is followed only by further ( ... ) groups or non-parenthesis characters up to a closing ).
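
To illustrate the lookahead with more than one inner group, here is a minimal sketch that applies the same pattern to a made-up input. The vector `multi` and the name `flattened` are hypothetical and only serve this example.

```r
# Hypothetical input: one string with two inner groups, one with two
# separate top-level groups, and one with a single pair.
multi <- c(
  "(a (b) c (d))",
  "x (a) y (b)",
  "What is the best food? (choice=Nachos)"
)

# Same pattern as above: an innermost ( ... ) is unwrapped only when the
# lookahead finds a closing ")" after it, skipping over further ( ... ) groups.
flattened <- gsub(
  "\\(([^)(]*)\\)(?=[^)(]*(?:\\([^)(]*\\)[^)(]*)*\\))",
  "\\1",
  multi,
  perl = TRUE
)
flattened
```

The first string becomes "(a b c d)", while the other two are left unchanged because no outer closing ) follows their groups.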

If there is arbitrary nesting, flattening the first level can be done with a recursive regex.

test <- gsub("(?:\\G(?!^)|\\()[^)(]*+\\K(\\(((?>[^)(]+|(?1))*)\\))", "\\2", test, perl = TRUE)

One more R demo at tio.run or a regex101 demo (replace with \2, the second group's capture).

| Regex part | Explanation |
|---|---|
| `(?:\G(?!^)\|\()` | Matches an opening parenthesis, or uses `\G` to continue where the previous match ended, which chains later matches to that opening parenthesis |
| `[^)(]*+\K` | Consumes any number of non-parenthesis characters; `\K` then resets the start of the reported match |
| `(\(((?>[^)(]+\|(?1))*)\))` | Matches the nested parentheses (explanation at php.net). It contains two capture groups: the first is the one recursed into at `(?1)`; the second captures what is between `(` and `)` |

Here the matches are chained to the opening parenthesis. There is no check for an outer closing ). This \G-based idea can also be used without recursion for just one level, but it is slightly less efficient.
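
For comparison, arbitrary nesting can also be handled without a regex by counting depth character by character. The following is only a sketch of that swapped-in technique, not what the patterns above do: the function name `flatten_parens` is hypothetical, it assumes inputs without NA values, and it removes inner parentheses at every depth, whereas the recursive regex above is described as flattening the first level.

```r
# Hypothetical helper: keep only the outermost parentheses of each string.
flatten_parens <- function(x) {
  vapply(x, function(s) {
    chars <- strsplit(s, "", fixed = TRUE)[[1]]
    depth <- 0L
    keep <- logical(length(chars))
    for (i in seq_along(chars)) {
      ch <- chars[i]
      if (ch == "(") {
        depth <- depth + 1L
        keep[i] <- depth == 1L          # keep only the outermost "("
      } else if (ch == ")") {
        keep[i] <- depth <= 1L          # keep the outermost ")" (and stray ones)
        depth <- max(depth - 1L, 0L)
      } else {
        keep[i] <- TRUE                 # ordinary characters are always kept
      }
    }
    paste(chars[keep], collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

nested <- c(
  "What is the best food? (choice=Tacos (big bent nachos))",
  "(a (b (c) d) e)",
  "Complete?"
)
flat <- flatten_parens(nested)
flat
```

The second element shows the difference in behavior: every inner level is removed, giving "(a b c d e)".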

Answer 3

Assuming there is at most one nested pair of parentheses, we could use a gsub() approach. Note that this removes the inner parentheses together with their contents:

output <- gsub("\\(\\s*(.*?)\\s*\\(.*?\\)(.*?)\\s*\\)", "(\\1\\2)", test)
output

[1] "Record ID"                                       
[2] "What is the best food? (choice=Nachos)"          
[3] "What is the best food? (choice=Tacos)"           
[4] "What is the best food? (choice=Chips with stuff)"
[5] "Complete?"

Data:

test <- c(
  "Record ID", 
  "What is the best food? (choice=Nachos)", 
  "What is the best food? (choice=Tacos (big bent nachos))", 
  "What is the best food? (choice=Chips with stuff)", 
  "Complete?"
)
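
If the inner text should be kept, as the question asks, one possible variation is to capture the inner content as well. This is a sketch under assumptions rather than part of the original answer: the name `output_keep` and the extra capture group are illustrative, `perl = TRUE` is added so the lazy quantifiers follow PCRE semantics, and it still assumes at most one nested pair.

```r
test <- c(
  "Record ID",
  "What is the best food? (choice=Nachos)",
  "What is the best food? (choice=Tacos (big bent nachos))",
  "What is the best food? (choice=Chips with stuff)",
  "Complete?"
)

# Group 1: text before the inner "(", group 2: inner text, group 3: text after the inner ")".
output_keep <- gsub(
  "\\(\\s*(.*?)\\s*\\((.*?)\\)(.*?)\\s*\\)",
  "(\\1 \\2\\3)",
  test,
  perl = TRUE
)
output_keep[3]
# "What is the best food? (choice=Tacos big bent nachos)"
```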

Answer 4

Here is a solution using gsub() from base R. It is broken into two steps for readability and debugging.

test <- c(
   "Record ID", 
   "What is the best food? (choice=Nachos)", 
   "What is the best food? (choice=Tacos (big bent nachos))", 
   "What is the best food? (choice=Chips with stuff)", 
   "Complete?"
) 

test <- gsub("(\\(.*)\\(", "\\1", test)
# (\\(.*) - first group: the first '(' and, greedily, everything up to the last '('
# \\(     - the last '(' in the string, i.e. the inner one
# "\\1"   - replace the whole match with the first group, which drops the inner '('

test <- gsub("\\)(.*\\))", "\\1", test)
# Similar to the first step: drop the first ')' and keep everything up to the last ')'
test

[1] "Record ID"                                            
[2] "What is the best food? (choice=Nachos)"               
[3] "What is the best food? (choice=Tacos big bent nachos)"
[4] "What is the best food? (choice=Chips with stuff)"     
[5] "Complete?"

4 answers

solution1
3 ACCEPTED 2022-11-21 23:23:02

solution2
2 2022-11-22 02:02:50

solution3
1 2022-11-21 23:20:13

solution4
1 2022-11-21 23:29:15
