Removing text between parentheses with unmatched pairs

I am trying to remove the characters/numbers between parentheses: first the numbered parentheses at the start (i.e. "(3)"), and then everything in the second pair of parentheses. Sometimes this second pair contains an unmatched bracket, which complicates things. An example:

library(qdapRegex)
n <- c("(1) Apple (Pe(ar)", "(2) Apple (Or(ang)e)", "(3) Banana (Hot(dog)")
c <- rm_between(n, "(", ")", extract = TRUE)

Ideally I would get:

> c
[1] "Apple"  "Apple"  "Banana"

It seems that you always need the second word. If that is the case, here are a couple of straightforward ways to do it:

#Base R
sapply(strsplit(n, ' '), `[`, 2)
[1] "Apple"  "Apple"  "Banana"

#The always fun word() from the stringr package
stringr::word(n, 2)
[1] "Apple"  "Apple"  "Banana"
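Taking the second word assumes the text you want is always a single word. A hedged alternative that also keeps multi-word names and never tries to pair brackets is to strip the numbered prefix and then cut everything from the first remaining "(". This is a minimal base-R sketch, assuming every string starts with a prefix like "(1) " and the wanted text itself contains no "("; the helper name strip_parens() is hypothetical and not part of qdapRegex or stringr:

```r
# Hypothetical helper, not from qdapRegex, stringr, or the original answer
strip_parens <- function(x) {
  # Drop the numbered prefix such as "(1) " at the start of each string
  x <- sub("^\\(\\d+\\)\\s*", "", x, perl = TRUE)
  # Drop everything from the first remaining "(" to the end of the string;
  # no bracket pairing is attempted, so an unmatched "(" does not matter
  sub("\\s*\\(.*$", "", x, perl = TRUE)
}

n <- c("(1) Apple (Pe(ar)", "(2) Apple (Or(ang)e)", "(3) Banana (Hot(dog)")
strip_parens(n)
# [1] "Apple"  "Apple"  "Banana"

# Also keeps a (hypothetical) multi-word name, where word(n, 2) would give "Green"
strip_parens("(4) Green Apple (Pe(ar)")
# [1] "Green Apple"
```

Because the second pattern is anchored at the end of the string and swallows everything after the first "(", it does not matter whether the brackets in the second pair are balanced.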

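If you want to use a regex, you could replace every match of the pattern below with an empty string. It removes only the digits and parentheses, not the letters inside the second pair, so "(1) Apple (Pe(ar)" becomes " Apple Pear" and you would still need to keep just the first word. The pattern is: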

[^A-Za-z ]

Or, with the case-insensitive flag:

(?i)[^a-z ]

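For reference, here is a minimal sketch of the pattern applied with base R's gsub(). perl = TRUE is used so that PCRE handles the inline (?i) flag, and the final sub() that keeps only the first word is an addition, not part of the original answer:

```r
n <- c("(1) Apple (Pe(ar)", "(2) Apple (Or(ang)e)", "(3) Banana (Hot(dog)")

# Remove every character that is not a letter or a space
stripped <- gsub("(?i)[^a-z ]", "", n, perl = TRUE)
# stripped is now " Apple Pear", " Apple Orange", " Banana Hotdog":
# the letters from the second pair of parentheses are still there

# Trim the leading space and keep only the first word
result <- sub(" .*$", "", trimws(stripped))
result
# [1] "Apple"  "Apple"  "Banana"
```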