Extract text between parentheses with suffix

Question

Here is the example:

t <- 'Hui Wan (Shanghai Maritime University); Mingqiang Xu (Shanghai Chart Center, Donghai Navigation Safety Administration of MOT)*; Yingjie Xiao ( Shanghai Maritime University)'

I want the output to be the text inside '()*'. In this example, that is: Shanghai Chart Center, Donghai Navigation Safety Administration of MOT

Answer 1

To match only the contents of (…)* , the tricky part is to avoid matching across two unrelated parenthetical groups (i.e. something like (…) … (…)* ). The easiest way to accomplish this is to disallow closing parentheses inside the match:

stringr::str_match_all(t, r'{\(([^)]*)\)\*}')

Do note that this will fail for nested parentheses ( ( … ( … ) …)* ). Regular expressions are fundamentally unsuited to parse nested content so if you require handling such a case, regular expressions are not the appropriate tool; you'll need to use a context-free parser (which is a lot more complicated).
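As a rough sketch of how the result of the call above might be used (assuming R 4.0.0 or later for the raw string syntax, and that t is the string from the question): str_match_all() returns a list holding one character matrix per input string, with the full match in column 1 and the capture group in column 2. The nested string below is a made-up input, included only to illustrate the caveat about nesting.

```r
library(stringr)

t <- 'Hui Wan (Shanghai Maritime University); Mingqiang Xu (Shanghai Chart Center, Donghai Navigation Safety Administration of MOT)*; Yingjie Xiao ( Shanghai Maritime University)'

pattern <- r'{\(([^)]*)\)\*}'

# One matrix per input string: column 1 = full match, column 2 = capture group.
m <- str_match_all(t, pattern)[[1]]
starred <- unname(m[, 2])  # text inside "(...)*", without the parentheses and star

# Hypothetical nested input: no "(" is followed by a ")"-free body and then ")*",
# so the pattern finds nothing here rather than the outer group.
nested <- "x (a (b) c)*"
nested_hits <- str_match_all(nested, pattern)[[1]][, 2]
```

Here starred holds the single starred affiliation from the question, while nested_hits is empty.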

Answer 2

The key here is to use the non-greedy wildcard .*? , otherwise everything between the first ( and the last ) would be caught:

library(stringr)
t <- 'Hui Wan (Shanghai Maritime University); Mingqiang Xu (Shanghai Chart Center, Donghai Navigation Safety Administration of MOT)*; Yingjie Xiao ( Shanghai Maritime University)'
str_extract_all(t, "(\\(.*?\\)\\*?)")[[1]] %>% str_subset("\\*$")
#> [1] "(Shanghai Chart Center, Donghai Navigation Safety Administration of MOT)*"

Created on 2021-03-03 by the reprex package (v1.0.0)

You can use the rev() function if you want to reverse the order and get it right to left.

This is far less elegant than I would like, but unexpectedly "(\\(.*?\\)\\*)" does not return the shortest match: the match still starts at the first ( and only its end is found lazily, so I had to extract every group and keep those that end in a star. You can add %>% str_remove_all("\\*$") if you want to drop the trailing star from the result.
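A small illustration of that behaviour, using a shortened made-up string x rather than the data from the question: a lazy quantifier only minimises where the match ends, not where it begins, so the match starts at the first opening parenthesis that can eventually reach )* .

```r
library(stringr)

# Hypothetical short input with the same shape as the question's string.
x <- "a (one); b (two)*; c (three)"

# Lazy .*? starts at the first "(" and grows until ")*" can match,
# so it swallows the unstarred group in front of the starred one.
lazy <- str_extract(x, "\\(.*?\\)\\*")
lazy
#> [1] "(one); b (two)*"

# Excluding ")" from the body turns an unstarred group into a dead end instead.
strict <- str_extract(x, "\\([^)]*\\)\\*")
strict
#> [1] "(two)*"
```

This is why the other answers use a negated character class rather than a non-greedy wildcard.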

Answer 3

Define a pattern that starts with ( , is followed by any characters except ( or ) (expressed as a negative character class [^)(]+ ) and closed by )* :

library(stringr)
str_extract_all(t, "\\([^)(]+\\)\\*")
[[1]]
[1] "(Shanghai Chart Center, Donghai Navigation Safety Administration of MOT)*"

You can get rid of the list structure with unlist().
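If only the text inside the parentheses is wanted, one possible follow-up (a sketch, assuming t is the string from the question; hits and inner are names chosen here for illustration) is to flatten the result and trim the wrapping characters by position:

```r
library(stringr)

t <- 'Hui Wan (Shanghai Maritime University); Mingqiang Xu (Shanghai Chart Center, Donghai Navigation Safety Administration of MOT)*; Yingjie Xiao ( Shanghai Maritime University)'

# Flatten the list returned by str_extract_all() into a character vector.
hits <- unlist(str_extract_all(t, "\\([^)(]+\\)\\*"))

# Every hit has the form "(...)*": drop the first character and the last two.
inner <- str_sub(hits, 2, -3)
```

Trimming by position works here because the pattern guarantees that each match starts with ( and ends with )* .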

3 answers

solution1
2 2021-03-03 11:49:18

solution2
1 ACCEPTED 2021-03-03 11:10:42

solution3
0 2021-03-03 11:49:35
