R extract string from data.table column
This post is a continuation of "R search subset string from data.table column for Capitalized words"; I need to add more conditions to it. A sample data.table would be:
library(data.table)

dt <- data.table(Msg = c("SOMENote: THIS_IS_IMPORTANT Rest of Message",
                         "SOMENote: THIS-IS Not Important. THIS_IS Rest of Message",
                         "SOMENote: no_string_here.. THIS_IS_IMPORTANT Rest of Message",
                         "SOMENote: THIS_HAS_110KV_Numbers. Rest of Message"))

# Desired result, one element per row of dt
output <- c("THIS_IS_IMPORTANT",
            "THIS_IS",
            "THIS_IS_IMPORTANT",
            "THIS_HAS_110KV_Numbers")
I want to extract from each message the string of the form THIS_IS_IMPORTANT, which can appear anywhere in the message after "SOMENote:".
The format also has numbers in some rows, like THIS_100L_HAS_NUMBERS.
In general, the target is a run of capitalized words joined by underscores.
You can use sub and regexpr with regmatches to extract the hit:
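# Remove everything up to the last ':' plus any following characters that are not A-Z
y <- sub(".*:[^A-Z]*", "", x)
# Take the first run of capitals/digits followed by '_' and any word characters
regmatches(y, regexpr("[A-Z0-9]+_\\w*", y))
#[1] "THIS_IS_IMPORTANT"      "THIS_IS"                "THIS_IS_IMPORTANT"
#[4] "THIS_HAS_110KV_Numbers"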
Data:
x <- c("SOMENote: THIS_IS_IMPORTANT Rest of Message",
"SOMENote: THIS-IS Not Important. THIS_IS Rest of Message",
"SOMENote: no_string_here.. THIS_IS_IMPORTANT Rest of Message",
"SOMENote: THIS_HAS_110KV_Numbers. Rest of Message")
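Since the question is about a data.table column, the two steps above could be wrapped in a small helper and assigned by reference. This is only a sketch: the helper name extract_tag and the column name Tag are made up for illustration and do not come from the original post. It also pads rows without a hit with NA, because regmatches() drops non-matching elements and the result would otherwise be shorter than the column.

```r
library(data.table)

dt <- data.table(Msg = c("SOMENote: THIS_IS_IMPORTANT Rest of Message",
                         "SOMENote: THIS-IS Not Important. THIS_IS Rest of Message",
                         "SOMENote: no_string_here.. THIS_IS_IMPORTANT Rest of Message",
                         "SOMENote: THIS_HAS_110KV_Numbers. Rest of Message"))

# Hypothetical helper (name is an assumption): same two steps as the answer,
# but returns NA for messages that contain no match.
extract_tag <- function(msg) {
  # Step 1: drop everything up to the last ':' and any following non-A-Z characters
  y <- sub(".*:[^A-Z]*", "", msg)
  # Step 2: locate the first run of capitals/digits followed by '_' and word characters
  m <- regexpr("[A-Z0-9]+_\\w*", y)
  out <- rep(NA_character_, length(msg))
  out[m != -1] <- regmatches(y, m)  # regmatches() returns only the matched elements
  out
}

# Add the result as a new column by reference ("Tag" is an illustrative name)
dt[, Tag := extract_tag(Msg)]
print(dt$Tag)
```

With the sample data, dt$Tag should hold the same four strings as the desired output in the question, and a message with no capitalized, underscore-joined token would get NA instead of shifting the remaining rows.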