
R: Extract data from string using POSIX regular expression

How to extract only DATABASE_NAME from this string using POSIX-style regular expressions?

st <- "MICROSOFT_SQL_SERVER.DATABASE\INSTANCE.DATABASE_NAME."

First of all, this generates an error:

Error: '\I' is an unrecognized escape in character string starting "MICROSOFT_SQL_SERVER.DATABASE\I"

I was thinking something like

sub(".*\\.", st, "")

The first problem is that you need to escape the backslash (\) in your string by doubling it:

st <- "MICROSOFT_SQL_SERVER.DATABASE\\INSTANCE.DATABASE_NAME."
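
As a minimal sketch of what the doubled backslash does: the string still contains a single backslash character, and only the source-code spelling changes. The raw-string form below assumes R 4.0.0 or later, and the name st_raw is introduced here purely for illustration.

```r
st <- "MICROSOFT_SQL_SERVER.DATABASE\\INSTANCE.DATABASE_NAME."

# writeLines() shows the string as stored: one backslash, not two.
writeLines(st)

# Assumption: R >= 4.0.0, which accepts raw string literals r"(...)".
# Inside a raw string the backslash needs no escaping.
st_raw <- r"(MICROSOFT_SQL_SERVER.DATABASE\INSTANCE.DATABASE_NAME.)"

# Both spellings produce the same character vector.
same <- identical(st, st_raw)
same
```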

As for the main problem, this will return the bit you want from the string you gave:

> sub("\\.$", "", sub("[A-Za-z0-9\\._]*\\\\[A-Za-z]*\\.", "", st))
[1] "DATABASE_NAME"
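
For comparison, here is a sketch of why the attempt in the question does not work even once the string is fixed. sub() takes its arguments in the order (pattern, replacement, x), so the call in the question passes st as the replacement and "" as the input. With the order corrected, the greedy .* still swallows everything up to the final period, so the trailing period has to go first. The names attempt, no_dot and db_name are illustrative only.

```r
st <- "MICROSOFT_SQL_SERVER.DATABASE\\INSTANCE.DATABASE_NAME."

# Naming the arguments makes the intended order explicit.
attempt <- sub(pattern = ".*\\.", replacement = "", x = st)
attempt  # "" because the greedy .* runs up to the trailing period

# Remove the trailing period first, then drop everything up to the last period.
no_dot  <- sub("\\.$", "", st)
db_name <- sub(".*\\.", "", no_dot)
db_name  # "DATABASE_NAME"
```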

But a simpler solution would be to split the string on the period (the pattern \\.) and select the last chunk:

> strsplit(st, "\\.")[[1]][3]
[1] "DATABASE_NAME"

or slightly more automated

> sst <- strsplit(st, "\\.")[[1]]
> tail(sst, 1)
[1] "DATABASE_NAME"
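
The split-and-take-the-last-piece idea can be wrapped in a small helper that works on a whole character vector. This is only a sketch: last_field is a hypothetical name, and it assumes every input is a non-empty string. It relies on strsplit() not returning a trailing empty piece when the string ends with the separator, which is also why [3] and tail(sst, 1) agree above.

```r
# Hypothetical helper: last separator-delimited field of each string.
# Assumes non-empty inputs; fixed = TRUE treats sep literally, so "."
# needs no regex escaping.
last_field <- function(x, sep = ".") {
  pieces <- strsplit(x, sep, fixed = TRUE)
  vapply(pieces, function(p) tail(p, 1), character(1))
}

st <- "MICROSOFT_SQL_SERVER.DATABASE\\INSTANCE.DATABASE_NAME."

parts <- strsplit(st, ".", fixed = TRUE)[[1]]
length(parts)  # 3: no empty string after the final period

result <- last_field(c(st, "a.b.c"))
result  # "DATABASE_NAME" "c"
```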

Other answers provided some really good alternative ways of cracking the problem using strsplit or str_split.

However, if you really want to use a regex and gsub, this solution replaces the first two occurrences of a (string followed by a period) with an empty string, and then removes the remaining period.

Note the use of the ? modifier to tell the regex not to be greedy, as well as the {2} quantifier to tell it to repeat the expression in parentheses two times.

gsub("\\.", "", gsub("(.+?\\.){2}", "", st)) 
[1] "DATABASE_NAME"
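
If the number of leading fields is not known in advance, a single sub() with a capture group is one possible variant. The sketch below is not the approach given in the answers above. It uses only greedy quantifiers and a bracket expression, so it does not depend on the lazy ? modifier, which to my knowledge is an extension offered by R's regex engines rather than part of POSIX extended regular expressions.

```r
st <- "MICROSOFT_SQL_SERVER.DATABASE\\INSTANCE.DATABASE_NAME."

# ^.*\\.    greedy: everything up to the last period that still lets the rest match
# ([^.]+)   capture a run of non-period characters (the field we want)
# \\.?$     an optional trailing period, then the end of the string
last_pattern <- "^.*\\.([^.]+)\\.?$"

db_name <- sub(last_pattern, "\\1", st)
db_name  # "DATABASE_NAME"

# The same pattern with a different number of fields.
other <- sub(last_pattern, "\\1", "a.b.c.d.")
other  # "d"
```

A string that contains no period does not match the pattern, so sub() returns it unchanged.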

An alternative approach is to use str_split in package stringr. The idea is to split st into strings at each period, and then to isolate the third string:

st <- "MICROSOFT_SQL_SERVER.DATABASE\\INSTANCE.DATABASE_NAME."

library(stringr)

str_split(st, "\\.")[[1]][3]

[1] "DATABASE_NAME"
