Regex for up to 3 consecutive words, if any

Question

I'm looking for a regex that extracts up to 3 consecutive words, if there are any. For example, given these 2 strings:

"1. Stack is great and awesome"
"2. Stack"

The result should be:

"Stack is great"
"Stack"

This answer did not work for me: Regex: match 3 consecutive words

My attempt:

(?:[A-ZŠČĆŽa-zščćž]+ )(?:[A-ZŠČĆŽa-zščćž]+ )(?:[A-ZŠČĆŽa-zščćž]+ )

Answer 1

You can use

> x <- c("1. Stack is great and awesome", "2. Stack")
> regmatches(x, regexpr("[A-Za-z]+(?:\\s+[A-Za-z]+){0,2}", x))
[1] "Stack is great" "Stack"
## Or to support all Unicode letters
> y <- c("1. Stąck is great and awesome", "2. Stack")
> regmatches(y, regexpr("\\p{L}+(?:\\s+\\p{L}+){0,2}", y, perl=TRUE))
[1] "Stąck is great" "Stack"
## In some R environments, a POSIX character class pattern with the default TRE engine works better:
> regmatches(y, regexpr("[[:alpha:]]+(?:[[:space:]]+[[:alpha:]]+){0,2}", y))
[1] "Stąck is great" "Stack"

See the regex demo, the online R demo, and the alternative regex demo.

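Note that the regex extracts the first chunk of 1, 2, or 3 consecutive letter-only words from each string. If you need at least 2 words, replace the {0,2} limiting quantifier with {1,2}.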
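As a minimal sketch of the {1,2} variant, assuming the same sample vector x as above and perl = TRUE (the variable names at_least_two, m, and res are only illustrative), note that regmatches drops elements that have no match, so the result can be shorter than the input:

```r
x <- c("1. Stack is great and awesome", "2. Stack")

# {1,2} requires one or two further words, i.e. at least two words in total
at_least_two <- "[A-Za-z]+(?:\\s+[A-Za-z]+){1,2}"

m <- regexpr(at_least_two, x, perl = TRUE)

# regexpr() reports -1 for elements without a match; regmatches() skips them
res <- regmatches(x, m)
print(res)
```

Here the second string contains a single word only, so it contributes nothing to the result.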

To extract multiple matches, use gregexpr instead of regexpr.
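A short sketch of the gregexpr variant, assuming the same sample vector x and the ASCII pattern (the names pattern and all_chunks are only illustrative). The result is a list with one character vector per input string:

```r
x <- c("1. Stack is great and awesome", "2. Stack")
pattern <- "[A-Za-z]+(?:\\s+[A-Za-z]+){0,2}"

# gregexpr() finds every non-overlapping match in each string;
# regmatches() then returns a list, one element per input string
all_chunks <- regmatches(x, gregexpr(pattern, x, perl = TRUE))
print(all_chunks)
```

For the first string this yields two chunks, "Stack is great" and "and awesome", because matching resumes right after the first chunk.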

Pattern details

\\p{L}+ / [A-Za-z]+ - 1 or more Unicode letters (ASCII letters if [A-Za-z] is used)
(?:\\s+\\p{L}+){0,2} / (?:\\s+[a-zA-Z]+){0,2} - 0, 1, or 2 consecutive occurrences of:
- \\s+ - 1 or more whitespace characters
- \\p{L}+ / [A-Za-z]+ - 1 or more Unicode letters (ASCII letters if [A-Za-z] is used)

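Remember to pass the perl=TRUE argument with any regex that uses the \\p{L} construct. If that does not work, try adding the (*UCP) PCRE verb at the very start of the pattern; it makes all generic, Unicode, and shorthand classes truly Unicode-aware.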
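A possible way to apply the (*UCP) verb is sketched below, assuming the same sample data as y above. The letter ą is written as the escape \u0105 so the snippet does not depend on the source file encoding, and the names ucp_pattern and res_ucp are only illustrative:

```r
# "\u0105" is the letter a-with-ogonek from the sample string
y <- c("1. St\u0105ck is great and awesome", "2. Stack")

# (*UCP) has to be the very first thing in the pattern
ucp_pattern <- "(*UCP)\\p{L}+(?:\\s+\\p{L}+){0,2}"

res_ucp <- regmatches(y, regexpr(ucp_pattern, y, perl = TRUE))
print(res_ucp)
```

Whether the verb is needed at all depends on the R build and locale, so it is worth trying the plain \\p{L} pattern first.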

Note that all these regexes also work with stringr::str_extract and stringr::str_extract_all:

> str_extract(x, "\\p{L}+(?:\\s+\\p{L}+){0,2}")
[1] "Stack is great" "Stack"         
> str_extract(x, "[a-zA-Z]+(?:\\s+[a-zA-Z]+){0,2}")
[1] "Stack is great" "Stack"         
> str_extract(x, "[[:alpha:]]+(?:\\s+[[:alpha:]]+){0,2}")
[1] "Stack is great" "Stack"

(*UCP) is not supported here, because stringr functions use the ICU regex engine, not PCRE. Unicode test:

> str_extract(y, "\\p{L}+(?:\\s+\\p{L}+){0,2}")
[1] "Stąck is great" "Stack"
> str_extract(y, "[[:alpha:]]+(?:\\s+[[:alpha:]]+){0,2}")
[1] "Stąck is great" "Stack"
