Regex: extracting a number from multiple matching parentheses

Question

How do I match the year in a way that is general enough to cover the following examples?

a <- '"You Are There" (1953) {The Death of Socrates (399 B.C.) (#1.14)}'
b <- 'Þegar það gerist (1998/I) (TV)'

I have tried the following, but without much success.

gsub('.+\\(([0-9]+.+\\)).?$', '\\1', a)

What I thought it did was to go until it finds a (, then capture a group of digits, then any characters until it meets a ). If there are several matches, I want to extract the first group.

Any suggestions as to where I go wrong? I have been doing this in R.

Answer 1

You could use

library(stringr)

strings <- c('"You Are There" (1953) {The Death of Socrates (399 B.C.) (#1.14)}', 'Þegar það gerist (1998/I) (TV)')

years <- str_match(strings, "\\((\\d+(?: B\\.C\\.)?)")[,2]
years
# [1] "1953" "1998"

The expression here is

\(               # (
(\d+             # capture 1+ digits
    (?: B\.C\.)? # optionally followed by " B.C."
)

Note that backslashes need to be escaped in R.
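To illustrate the escaping point, here is a minimal sketch that writes the same pattern twice: once with doubled backslashes and once as a raw string. Raw strings are an assumption about your setup, since they need R 4.0.0 or later. The sketch also uses base R (regexec with perl = TRUE) instead of stringr, which is my substitution and not part of the answer above.

```r
strings <- c('"You Are There" (1953) {The Death of Socrates (399 B.C.) (#1.14)}',
             'Þegar það gerist (1998/I) (TV)')

# Ordinary string literal: every regex backslash is doubled.
escaped <- "\\((\\d+(?: B\\.C\\.)?)"

# Raw string literal (R >= 4.0.0): backslashes are written once.
raw_pat <- r"[\((\d+(?: B\.C\.)?)]"

# Both literals denote the same pattern.
same <- identical(escaped, raw_pat)

# regexec() returns the full match plus the capture groups for the first match.
m <- regmatches(strings, regexec(raw_pat, strings, perl = TRUE))
years <- vapply(m, function(x) x[2], character(1))

print(same)
print(years)
```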

Answer 2

Your pattern contains .+ parts, which match one or more characters as many times as possible (greedily), so at best your pattern could grab the last digit chunk from each input string rather than the first.
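The difference between greedy and lazy matching can be seen in a small sketch. The two patterns here are simplified for illustration and are not the asker's exact pattern; perl = TRUE is my choice to keep the matching semantics unambiguous.

```r
a <- '"You Are There" (1953) {The Death of Socrates (399 B.C.) (#1.14)}'

# Greedy: the leading .+ consumes as much as it can, so the capture
# starts at the LAST "(" that is directly followed by a digit.
greedy <- sub(".+\\(([0-9]+).*", "\\1", a, perl = TRUE)

# Lazy: .*? consumes as little as it can, so the capture starts at
# the FIRST "(" that is directly followed by a digit.
lazy <- sub(".*?\\(([0-9]+).*", "\\1", a, perl = TRUE)

print(greedy)  # "399"
print(lazy)    # "1953"
```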

You may use

^.*?\((\d{4})(?:/[^)]*)?\).*

Replace with \\1 to keep only the 4-digit number. See the regex demo.

Details

^ - start of string
.*? - any 0+ chars, as few as possible
\\( - a (
(\\d{4}) - Group 1: four digits
(?: - start of an optional non-capturing group
- / - a /
- [^)]* - any 0+ chars other than )
)? - end of the group
\\) - a ) (optional, may be omitted)
.* - the rest of the string.

See the R demo:

a <- c('"You Are There" (1953) {The Death of Socrates (399 B.C.) (#1.14)}', 'Þegar það gerist (1998/I) (TV)', 'Johannes Passion, BWV. 245 (1725 Version) (1996) (V)')
sub("^.*?\\((\\d{4})(?:/[^)]*)?\\).*", "\\1", a) 
# => [1] "1953" "1998" "1996"
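One thing to keep in mind with this approach: sub returns its input unchanged when the pattern does not match. The sketch below wraps the same pattern in a helper that returns NA in that case. The name extract_year and the second test string are hypothetical, and perl = TRUE is my addition.

```r
# Hypothetical helper: returns the first 4-digit year in parentheses,
# or NA when a string contains no such year.
extract_year <- function(x) {
  pattern <- "^.*?\\((\\d{4})(?:/[^)]*)?\\).*"
  hit <- grepl(pattern, x, perl = TRUE)      # which elements match at all
  out <- sub(pattern, "\\1", x, perl = TRUE) # keep only Group 1
  out[!hit] <- NA_character_                 # non-matches would otherwise pass through
  out
}

res <- extract_year(c('Johannes Passion, BWV. 245 (1725 Version) (1996) (V)',
                      'No year here (TV)'))
print(res)
```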

Another base R solution is to match the 4 digits after (:

regmatches(a, regexpr("\\(\\K\\d{4}(?=(?:/[^)]*)?\\))", a, perl=TRUE))
# => [1] "1953" "1998" "1996"

The \\(\\K\\d{4} pattern matches ( and then drops it from the match thanks to the \\K match reset operator; the (?=(?:/[^)]*)?\\)) lookahead then ensures that the digits are followed by an optional / plus 0+ chars other than ), and then a ). Note that regexpr extracts the first match only.
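If every qualifying year is wanted instead of only the first, gregexpr can stand in for regexpr. This is a sketch using a made-up input string with two years; it is not part of the original answer.

```r
# Hypothetical input with two parenthesized years.
x <- 'Some Show (2001) {Episode (2003/II)}'
pattern <- "\\(\\K\\d{4}(?=(?:/[^)]*)?\\))"

# regexpr: first match only.
first_year <- regmatches(x, regexpr(pattern, x, perl = TRUE))

# gregexpr: all matches; regmatches returns a list with one element per input.
all_years <- regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]

print(first_year)  # "2001"
print(all_years)   # "2001" "2003"
```

Also note that regmatches combined with regexpr silently drops inputs that have no match, so the result can be shorter than the input vector.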
