How do you extract values between two characters in R?
I am trying to extract the server name (server101) from this string in R using a regular expression. The value I need sits between the @ and the first period (.) that follows it:
t<-c("Current CPU load - jvm machine[example network-app_svc_group_mem4]@server101.example.com")
I've tried this:
gsub('.*\\@(\\d+),(\\d+).*', '\\1', t)
This does not seem to be working. Any ideas?
Since you only expect one match, you may use a simple sub here:
t <- "Current CPU load - jvm machine[example network-app_svc_group_mem4]@server101.example.com"
sub(".*@([^.]+)\\..*", "\\1", t)
## => [1] "server101"
See the R demo online.
Details
.* - any 0+ chars, as many as possible
@ - a @ char
([^.]+) - Group 1 (referenced as "\\1" in the replacement): 1+ chars other than a dot
\\. - a dot (other chars you need to escape are $, ^, *, (, ), +, [, \, ?)
.* - any 0+ chars, as many as possible
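One thing worth knowing about the sub approach: when the pattern does not match at all, sub returns the input string unchanged rather than NA. Below is a minimal sketch of a guard for that case. The helper name extract_host and the sample strings are made up for illustration and are not part of the original answer.

```r
# Hypothetical helper (an assumption, not from the original answer):
# returns NA when the "@name." pattern is absent instead of echoing
# the whole input back, which is what a bare sub() call would do.
extract_host <- function(x) {
  pattern <- ".*@([^.]+)\\..*"
  ifelse(grepl(pattern, x), sub(pattern, "\\1", x), NA_character_)
}

# Made-up sample inputs; the second one contains no "@"
logs <- c("load[app_svc]@server101.example.com", "no at-sign here")
extract_host(logs)
```

Because grepl and sub are both vectorized, the same helper should work on a whole column of strings.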
Here are some alternatives.
You may use the following base R code to extract 1+ characters other than . (the [^.]+ part) after the first @:
> t <- "Current CPU load - jvm machine[example network-app_svc_group_mem4]@server101.example.com"
> pattern="@([^.]+)"
> m <- regmatches(t,regexec(pattern,t))
> result = unlist(m)[2]
> result
[1] "server101"
With regexec, you can access submatches (capturing group contents).
See the online R demo.
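To make the shape of the regexec result more concrete, here is a small sketch on a vector of strings. The sample inputs are invented for illustration. Each list element returned by regmatches holds the full match followed by the capturing groups, or is empty when nothing matched.

```r
# Made-up sample inputs; the second has no "@" at all
x <- c("job[a]@server101.example.com", "no match", "job[b]@db7.internal")

m <- regmatches(x, regexec("@([^.]+)", x))

# Each list element is c(full match, group 1), or character(0) on no match,
# so pick the second item when it exists and fall back to NA otherwise.
hosts <- vapply(
  m,
  function(g) if (length(g) >= 2) g[2] else NA_character_,
  character(1)
)
hosts
```

Using vapply here keeps the result the same length as the input, which the unlist(m)[2] shortcut does not do for more than one string.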
Another way is to use regmatches / regexpr with a PCRE regex containing a (?<=@) lookbehind, which only checks that the character is present but does not put it into the match:
> result2 <- regmatches(t, regexpr("(?<=@)[^.]+", t, perl=TRUE))
> result2
[1] "server101"
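Note that regmatches with regexpr silently drops elements that did not match, so on a vector the result can be shorter than the input. A rough sketch of one way to keep the positions aligned, using invented sample strings:

```r
# Made-up sample inputs; the second has no "@" at all
x <- c("job[a]@server101.example.com", "no match", "job[b]@db7.internal")

# regexpr returns -1 for elements where nothing matched
pos <- regexpr("(?<=@)[^.]+", x, perl = TRUE)

# Start from all-NA and fill in only the positions that matched;
# regmatches(x, pos) returns just the matches, in input order.
hosts <- rep(NA_character_, length(x))
hosts[pos != -1] <- regmatches(x, pos)
hosts
```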
A clean stringr approach is to use the same regex with str_extract. stringr uses the ICU regex flavor rather than PCRE, but ICU also supports lookarounds, so the pattern works unchanged:
> library(stringr)
> t<-c("Current CPU load - jvm machine[example network-app_svc_group_mem4]@server101.example.com")
> str_extract(t, "(?<=@)[^.]+")
[1] "server101"
With stringr:
library(stringr)
# str_match returns a matrix: column 1 is the full match, column 2 is Group 1
str_match(t, ".*@([^\\.]*)\\..*")[, 2]
#[1] "server101"