R: regex from first character to the end of the string
I have strings like these:
a <- "-en eller -et eller (uofficielt) -'en eller (uofficielt) -'et"
b <- "-ten, -ter, -terne"
I would like to use regular expressions in R to extract the text from each "-" up to the end of that word (the next space or comma), so that I get:
en et 'en 'et
ten ter terne
I have found a solution, but it does not feel very satisfying or elegant:
a <- unlist(strsplit(a, " |,"))
a <- a[grep("-", a)]
a <- gsub("-", "", a)
b <- unlist(strsplit(b, " |,"))
b <- b[grep("-", b)]
b <- gsub("-", "", b)
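The repetition above mostly comes from running the same three steps on each string. As a minimal sketch, those steps can be wrapped in one helper; the name get_endings is hypothetical and the logic is exactly the split / keep / strip sequence from the question.

```r
# Hypothetical helper wrapping the three steps from the question
get_endings <- function(x) {
  parts <- unlist(strsplit(x, " |,"))  # split on a single space or comma
  parts <- parts[grep("-", parts)]     # keep only the pieces containing a hyphen
  gsub("-", "", parts)                 # remove the hyphen itself
}

a <- "-en eller -et eller (uofficielt) -'en eller (uofficielt) -'et"
b <- "-ten, -ter, -terne"
get_endings(a)
get_endings(b)
```

Note that gsub() removes every hyphen in a kept piece, so a compound word such as "sort-hvid" would come out as "sorthvid".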
Do you have a suggestion for a more elegant one-liner that extracts all the endings I want?
I think you need to match a - that is not preceded by a word char (that is, do not match it when it is part of a compound word), followed by an optional ' and then 1+ word chars. Then, you can use
a <- "-en eller -et eller (uofficielt) -'en eller (uofficielt) -'et"
b <- "-ten, -ter, -terne"
pat <- "\\B-\\K'?\\w+"
res_a <- regmatches(a, gregexpr(pat, a, perl=TRUE))
unlist(res_a)
## [1] "en" "et" "'en" "'et"
res_b <- regmatches(b, gregexpr(pat, b, perl=TRUE))
unlist(res_b)
## [1] "ten" "ter" "terne"
See the online R demo.
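Because gregexpr() and regmatches() both accept character vectors, a minimal sketch (assuming you want one result per input string) is to pass a and b in a single call instead of handling them separately:

```r
a <- "-en eller -et eller (uofficielt) -'en eller (uofficielt) -'et"
b <- "-ten, -ter, -terne"
pat <- "\\B-\\K'?\\w+"

# One call over both strings; the result is a list with one
# character vector of endings per input string
x <- c(a, b)
res <- regmatches(x, gregexpr(pat, x, perl = TRUE))
res[[1]]
res[[2]]
```

Wrap the call in unlist() if a single flat vector of all endings is preferred.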
Pattern details:
\B - a non-word boundary
- - a hyphen
\K - match reset operator
'? - an optional '
\w+ - 1 or more letters, digits or _
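To see what the leading \B buys, here is a small sketch on a hypothetical input that contains a hyphenated compound word ("sort-hvid") before a real ending. It compares the pattern with and without \B:

```r
# Hypothetical input: a hyphenated compound word followed by an ending
s <- "sort-hvid -en"

# With \B, the hyphen inside "sort-hvid" is skipped because it follows
# a word char ("t"), so only the standalone ending is extracted
with_b <- regmatches(s, gregexpr("\\B-\\K'?\\w+", s, perl = TRUE))[[1]]

# Without \B, every hyphen counts, so "hvid" is extracted as well
without_b <- regmatches(s, gregexpr("-\\K'?\\w+", s, perl = TRUE))[[1]]

with_b
without_b
```

The \K in both patterns drops the hyphen from the reported match, which is why neither result contains "-".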
We can use str_extract_all from stringr:
library(stringr)
str_extract_all(a, '(?<=-)[^, ]+')[[1]]
#[1] "en" "et" "'en" "'et"
str_extract_all(b, '(?<=-)[^, ]+')[[1]]
#[1] "ten" "ter" "terne"
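The same lookbehind pattern also works without stringr, because base R's PCRE engine (perl = TRUE) supports (?<=...). A minimal base R sketch, assuming the same a and b:

```r
a <- "-en eller -et eller (uofficielt) -'en eller (uofficielt) -'et"
b <- "-ten, -ter, -terne"

# (?<=-) requires a hyphen right before the match without consuming it;
# [^, ]+ then takes everything up to the next comma or space
lookbehind_pat <- "(?<=-)[^, ]+"
ends_a <- regmatches(a, gregexpr(lookbehind_pat, a, perl = TRUE))[[1]]
ends_b <- regmatches(b, gregexpr(lookbehind_pat, b, perl = TRUE))[[1]]
ends_a
ends_b
```

Unlike the \B pattern, this one also matches after a hyphen inside a compound word, since it only looks at the character immediately before the match.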
If you want to keep it in base R, I do not think you will get it much more elegant than what you have (and you can always make that a one-liner). The value argument of grep might help you a bit, as below.
Maybe
substring(grep("-'?\\w", strsplit(a, " ")[[1]], value = TRUE), 2)
or
gsub("-", "", grep("-'?\\w", strsplit(a, " ")[[1]], value = TRUE))
can be considered slightly more elegant.
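Splitting on " " alone works for a, but for b it leaves trailing commas ("-ten,", "-ter,"). A small sketch that keeps the same grep(value = TRUE) idea but splits on any run of spaces and/or commas; anchoring the pattern with ^ is an addition here, so that substring(..., 2) always drops a leading hyphen:

```r
a <- "-en eller -et eller (uofficielt) -'en eller (uofficielt) -'et"
b <- "-ten, -ter, -terne"

# Split on runs of spaces and/or commas so no punctuation is left attached
words_a <- strsplit(a, "[ ,]+")[[1]]
words_b <- strsplit(b, "[ ,]+")[[1]]

# Keep only words that start with "-" (optionally followed by ')
# and strip that first character
ends_a <- substring(grep("^-'?\\w", words_a, value = TRUE), 2)
ends_b <- substring(grep("^-'?\\w", words_b, value = TRUE), 2)
ends_a
ends_b
```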