R: Extract data from string using POSIX regular expression
How can I extract only DATABASE_NAME from this string using POSIX-style regular expressions?
st <- "MICROSOFT_SQL_SERVER.DATABASE\INSTANCE.DATABASE_NAME."
First of all, this generates an error:
Error: '\I' is an unrecognized escape in character string starting "MICROSOFT_SQL_SERVER.DATABASE\I"
I was thinking of something like:
sub(".*\\.", st, "")
The first problem is that you need to escape the backslash (\) in your string by writing it as \\:
st <- "MICROSOFT_SQL_SERVER.DATABASE\\INSTANCE.DATABASE_NAME."
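To confirm that the doubled backslash is only source-level escaping, here is a minimal sketch comparing the escaped literal with a raw string. The raw-string syntax assumes R 4.0.0 or later, and the name st_raw is introduced here purely for illustration.

```r
# Escaped form: "\\" in source code stands for ONE backslash in the value.
st <- "MICROSOFT_SQL_SERVER.DATABASE\\INSTANCE.DATABASE_NAME."

# Raw string (assumes R >= 4.0.0): no escaping is needed inside r"( ... )".
st_raw <- r"(MICROSOFT_SQL_SERVER.DATABASE\INSTANCE.DATABASE_NAME.)"

# Both literals hold exactly the same characters.
same <- identical(st, st_raw)
print(same)

# cat() shows the actual contents, with a single backslash.
cat(st, "\n")
```

print(st) would still display the backslash doubled, because print shows the escaped representation rather than the raw characters.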
As for the main problem, this will return the bit you want from the string you gave:
> sub("\\.$", "", sub("[A-Za-z0-9\\._]*\\\\[A-Za-z]*\\.", "", st))
[1] "DATABASE_NAME"
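The same result can probably be had with a single sub call and a capture group. This is a sketch that assumes the wanted field is always the last one and is followed by a trailing period; db_name is just an illustrative variable name.

```r
st <- "MICROSOFT_SQL_SERVER.DATABASE\\INSTANCE.DATABASE_NAME."

# ^.*\\.    greedy prefix, ending at the period just before the final field
# ([^.]+)   the final field: one or more characters that are not a period
# \\.$      the trailing period at the very end of the string
db_name <- sub("^.*\\.([^.]+)\\.$", "\\1", st)
print(db_name)
```

The bracket expression [^.] is plain POSIX syntax, so no Perl-style extensions are needed here.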
But a simpler solution would be to split on the period (written \\. in the regex) and select the last chunk:
> strsplit(st, "\\.")[[1]][3]
[1] "DATABASE_NAME"
or, slightly more automated, so that you don't have to hard-code the index:
> sst <- strsplit(st, "\\.")[[1]]
> tail(sst, 1)
[1] "DATABASE_NAME"
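If this has to be applied to several strings at once, the split-and-take-last idea can be wrapped in a small function. The sketch below is one possible way to do it; last_field is a hypothetical helper name, and the second input string is made up for the example.

```r
# Hypothetical helper: return the last period-delimited field of each string.
last_field <- function(x) {
  # fixed = TRUE treats "." as a literal character, so no regex escaping is needed.
  parts <- strsplit(x, ".", fixed = TRUE)
  # p[length(p)] picks the last chunk, the same element tail(p, 1) would return.
  vapply(parts, function(p) p[length(p)], character(1))
}

st <- "MICROSOFT_SQL_SERVER.DATABASE\\INSTANCE.DATABASE_NAME."
print(last_field(st))

# Works on a vector too; "A.B.C" is an invented input without a trailing period.
print(last_field(c(st, "A.B.C")))
```

This relies on strsplit not producing a trailing empty string when the input ends with the separator. Empty input strings are not handled in this sketch.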
Other answers provided some really good alternative ways of cracking the problem using strsplit or str_split.
However, if you really want to use a regex and gsub, this solution replaces the first two occurrences of (a string followed by a period) with an empty string. The outer gsub then removes the remaining trailing period.
Note the use of the ? modifier to tell the regex not to be greedy, as well as the {2} quantifier to tell it to repeat the expression in parentheses two times.
gsub("\\.", "", gsub("(.+?\\.){2}", "", st))
[1] "DATABASE_NAME"
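To see why the non-greedy modifier matters, the following sketch compares the lazy and greedy forms of the inner pattern. It passes perl = TRUE, which is an assumption made here to get explicit Perl-style lazy semantics (the answer above uses the default engine), and it uses sub rather than gsub since only one match is involved.

```r
st <- "MICROSOFT_SQL_SERVER.DATABASE\\INSTANCE.DATABASE_NAME."

# Lazy: each repetition stops at the first period it can reach,
# so only the first two fields are removed.
lazy <- sub("(.+?\\.){2}", "", st, perl = TRUE)

# Greedy: the two repetitions together swallow as much as possible,
# which here is the whole string.
greedy <- sub("(.+\\.){2}", "", st, perl = TRUE)

print(lazy)    # "DATABASE_NAME."
print(greedy)  # ""
```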
An alternative approach is to use str_split in package stringr. The idea is to split st into strings at each period, and then to isolate the third string:
st <- "MICROSOFT_SQL_SERVER.DATABASE\\INSTANCE.DATABASE_NAME."
library(stringr)
str_split(st, "\\.")[[1]][3]
[1] "DATABASE_NAME"