Extract text between parentheses followed by a suffix

Question

Here is the example:

t <- 'Hui Wan (Shanghai Maritime University); Mingqiang Xu (Shanghai Chart Center, Donghai Navigation Safety Administration of MOT)*; Yingjie Xiao ( Shanghai Maritime University)'

I want the output to be the information inside '()*'. In this example, that is:

Shanghai Chart Center, Donghai Navigation Safety Administration of MOT

Answer 1

To match only the contents of (…)*, the tricky part is avoiding a match that spans two unrelated parenthetical groups (i.e. something like (…) … (…)*). The easiest way to accomplish this is to disallow closing parentheses inside the match:

stringr::str_match_all(t, r'{\(([^)]*)\)\*}')
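str_match_all() returns a list with one matrix per input string: column 1 holds the full match and column 2 holds the capture group. A minimal sketch of pulling out just the text between the parentheses, assuming stringr is installed and R ≥ 4.0.0 (needed for the r'{…}' raw-string syntax); the variable names m and affiliation are illustrative:

```r
library(stringr)

t <- 'Hui Wan (Shanghai Maritime University); Mingqiang Xu (Shanghai Chart Center, Donghai Navigation Safety Administration of MOT)*; Yingjie Xiao ( Shanghai Maritime University)'

# [^)]* cannot cross a closing parenthesis, so a match never spans two (...) groups
m <- str_match_all(t, r'{\(([^)]*)\)\*}')[[1]]

# Column 1 is the whole "(...)*" match; column 2 is the captured text inside the parentheses
affiliation <- m[, 2]
print(affiliation)
```

Because str_match_all() is vectorised, passing a character vector of author lines gives one matrix per line, each with its own capture column.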

Do note that this will fail for nested parentheses ( ( … ( … ) …)* ). Regular expressions are fundamentally unsuited to parsing nested content, so if you need to handle such a case, regular expressions are not the appropriate tool; you'll need a context-free parser (which is a lot more complicated).
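The failure on nested parentheses can be seen with a small made-up input (the string nested below is hypothetical, not from the question). The inner ) ends [^)]* too early, and the character after it is a comma rather than *, so the pattern finds nothing at all:

```r
library(stringr)

# Hypothetical affiliation containing a nested group inside the starred parentheses
nested <- 'A (Dept. of Physics (Main Campus), Some University)*'

# Same pattern as above, written as an ordinary escaped string
m <- str_match_all(nested, "\\(([^)]*)\\)\\*")[[1]]

# Zero rows: every candidate "(" is followed by a ")" that is not directly followed by "*"
print(nrow(m))
```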

Answer 2

The key here is to use the non-greedy (lazy) wildcard .*?; otherwise everything between the first ( and the last ) would be captured:

library(stringr)
t <- 'Hui Wan (Shanghai Maritime University); Mingqiang Xu (Shanghai Chart Center, Donghai Navigation Safety Administration of MOT)*; Yingjie Xiao ( Shanghai Maritime University)'
str_extract_all(t, "(\\(.*?\\)\\*?)")[[1]] %>% str_subset("\\*$")
#> [1] "(Shanghai Chart Center, Donghai Navigation Safety Administration of MOT)*"

Created on 2021-03-03 by the reprex package (v1.0.0)

Use rev() if you want the matches in reverse (right-to-left) order.

This is far less elegant than I would like, but unexpectedly "(\\(.*?\\)\\*)" still matches too much. The lazy .*? only keeps the match as short as possible from where it starts, and the engine starts at the leftmost (, so the match runs from "(Shanghai Maritime University)" all the way to the first )*. That is why I match every parenthetical group and keep only the one ending in *. You can add %>% str_remove_all("\\*$") if you want to discard the trailing star.
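The overmatch described above can be reproduced directly, next to the workaround with the star removed. This is a sketch assuming stringr is installed (it re-exports the %>% pipe); the names too_long and starred are illustrative:

```r
library(stringr)

t <- 'Hui Wan (Shanghai Maritime University); Mingqiang Xu (Shanghai Chart Center, Donghai Navigation Safety Administration of MOT)*; Yingjie Xiao ( Shanghai Maritime University)'

# Lazy .*? does not move the start of the match: it begins at the leftmost "("
# and extends until it first sees ")*", swallowing the first author's group
too_long <- str_extract(t, "\\(.*?\\)\\*")

# Workaround: make the star optional so each (...) is matched on its own,
# keep only the match that ends in "*", then drop that trailing star
starred <- str_extract_all(t, "\\(.*?\\)\\*?")[[1]] %>%
  str_subset("\\*$") %>%
  str_remove_all("\\*$")

print(too_long)
print(starred)
```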

Answer 3

Define a pattern that starts with (, is followed by any characters except ( or ) (expressed as the negated character class [^)(]+), and is closed by )*:

library(stringr)
str_extract_all(t, "\\([^)(]+\\)\\*")
[[1]]
[1] "(Shanghai Chart Center, Donghai Navigation Safety Administration of MOT)*"

You can get rid of the list structure with unlist().
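If you want the text without the surrounding ( and )*, one option (a swapped-in technique, not part of the answer above) is to move the delimiters into lookarounds so they must be present but are not returned. This sketch uses base R's regmatches() and gregexpr() with perl = TRUE, so it needs no extra package; the name inside is illustrative:

```r
t <- 'Hui Wan (Shanghai Maritime University); Mingqiang Xu (Shanghai Chart Center, Donghai Navigation Safety Administration of MOT)*; Yingjie Xiao ( Shanghai Maritime University)'

# (?<=\()    : a "(" must come just before the match (not included in the result)
# [^()]+     : the same negated character class as above
# (?=\)\*)   : ")*" must come just after the match (not included in the result)
inside <- regmatches(t, gregexpr("(?<=\\()[^()]+(?=\\)\\*)", t, perl = TRUE))[[1]]
print(inside)
```

The same pattern string should also work with str_extract_all(), since stringr's ICU engine supports fixed-length lookbehind.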

3 answers

Answer 1: 2 votes, 2021-03-03 11:49:18

Answer 2: 1 vote, accepted, 2021-03-03 11:10:42

Answer 3: 0 votes, 2021-03-03 11:49:35
