
How do you extract values between two characters in R?

I am trying to extract the server name (server101) from this string in R using a regular expression:

The value I want is between @ and the first period (.) that follows it.

t<-c("Current CPU load - jvm machine[example network-app_svc_group_mem4]@server101.example.com")

I've tried this:

gsub('.*\\@(\\d+),(\\d+).*', '\\1', t)

This does not seem to be working. Any ideas?

Since you only expect one match, you may use a simple sub here:

t <- "Current CPU load - jvm machine[example network-app_svc_group_mem4]@server101.example.com"
sub(".*@([^.]+)\\..*", "\\1", t)
##  => [1] "server101"

See the R demo online.

Details

  • .* - any 0+ chars, as many as possible
  • @ - a @ char
  • ([^.]+) - Group 1 ("\\1"): one or more chars other than a dot
  • \\. - a literal dot (other chars you need to escape are $ , ^ , * , ( , ) , + , [ , \ , ?)
  • .* - any 0+ chars, as many as possible
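
One consequence of the pattern above is worth a small illustration: sub returns its input unchanged when the pattern does not match, and the greedy leading .* means the last @ in the string is the one used. The sketch below is only an illustration. The helper name extract_host and the sample strings are hypothetical and not part of the original answer.

```r
# Hypothetical helper (not from the original answer): returns NA when the
# pattern does not match, instead of the unchanged input that sub() would give.
extract_host <- function(x) {
  pattern <- ".*@([^.]+)\\..*"
  ifelse(grepl(pattern, x), sub(pattern, "\\1", x), NA_character_)
}

# Made-up sample inputs: one normal case, one without "@", one with two "@".
x <- c("job@server101.example.com",
       "no at-sign here",
       "a@b@server7.example.org")

hosts <- extract_host(x)
print(hosts)
```

Because .* is greedy, the third input yields the text after the last @, not the first one.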

Here are some alternatives.

You may use the following base R code to extract one or more characters other than a dot ( [^.]+ ) after the first @ :

> t <- "Current CPU load - jvm machine[example network-app_svc_group_mem4]@server101.example.com"
> pattern="@([^.]+)"
> m <- regmatches(t,regexec(pattern,t))
> result = unlist(m)[2]
> result
[1] "server101"

With regexec , you can access submatches (the contents of capturing groups).

See the online R demo.
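
To make the shape of the regexec result more concrete, here is a minimal sketch of how the whole match and the capturing group sit next to each other in the returned list. The variable names (x, whole_match, group_1) are chosen for illustration only, and x is used instead of t to avoid masking base R's t() function.

```r
x <- "Current CPU load - jvm machine[example network-app_svc_group_mem4]@server101.example.com"

# regmatches() on a regexec() result gives a list with one character vector
# per input string: element 1 is the whole match, element 2 is Group 1.
m <- regmatches(x, regexec("@([^.]+)", x))

whole_match <- m[[1]][1]
group_1 <- m[[1]][2]

print(whole_match)  # "@server101"
print(group_1)      # "server101"
```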

Another way is to use regmatches / regexpr with a PCRE regex containing a (?<=@) lookbehind, which only checks that the character is present and does not include it in the match:

> result2 <- regmatches(t, regexpr("(?<=@)[^.]+", t, perl=TRUE))
> result2
[1] "server101"
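
If the same lookbehind pattern is applied to a vector rather than a single string, note that regmatches drops the elements that did not match, so the result can be shorter than the input. The sketch below shows one possible way to keep the result aligned with the input. The sample strings and variable names are made up for illustration.

```r
# Made-up sample inputs; the second one has no "@" and therefore no match.
x <- c("svc@server101.example.com",
       "no host here",
       "db@server202.example.com")

# regexpr() returns -1 for elements without a match.
m <- regexpr("(?<=@)[^.]+", x, perl = TRUE)

# regmatches() keeps only the matched elements.
matched_only <- regmatches(x, m)

# To stay aligned with x, fill a full-length NA vector at matching positions.
aligned <- rep(NA_character_, length(x))
aligned[m != -1] <- matched_only

print(matched_only)
print(aligned)
```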

A clean stringr approach is to use the same pattern with str_extract . stringr uses the ICU regex flavor, which is similar to PCRE here because it also supports lookarounds:

> library(stringr)
> t<-c("Current CPU load - jvm machine[example network-app_svc_group_mem4]@server101.example.com")
> str_extract(t, "(?<=@)[^.]+")
[1] "server101"

With stringr:

library(stringr)
str_match(t, ".*@([^\\.]*)\\..*")[2]
#[1] "server101"
