Create new field in R data.table using RegEx on another field
Given this data.table:
library(data.table)
dt <- data.table(f1 = c(
"stuffstuff-0000097125",
"stuffstuff.abc.0006496679",
"stuffstuff0007517235",
"stuffstuff_xyz.0007280719",
"stuffstuff0005995303",
"stuffstuff_a1b_0000143856",
"stuffstuff0009362407",
"stuffstuff.c44_0009735298"
))
I would like to get these results:
f1 parsed_val
1: stuffstuff-0000097125
2: stuffstuff.abc.0006496679 abc
3: stuffstuff0007517235
4: stuffstuff_xyz.0007280719 xyz
5: stuffstuff0005995303
6: stuffstuff_a1b_0000143856 a1b
7: stuffstuff0009362407
8: stuffstuff.c44_0009735298 c44
Here is what I tried:
rex_pattern <- "(?<=(\\.|\\_|\\-))[A-Za-z0-9]{3}(?=(\\.|\\_|\\-)[0-9]{3,})"
dt[, `:=`(parsed_val = regmatches(f1, regexpr(pattern = rex_pattern, f1, perl = TRUE)))]
However, because of recycling, these are the results I get:
f1 parsed_val
1: stuffstuff-0000097125 abc
2: stuffstuff.abc.0006496679 xyz
3: stuffstuff0007517235 a1b
4: stuffstuff_xyz.0007280719 c44
5: stuffstuff0005995303 abc
6: stuffstuff_a1b_0000143856 xyz
7: stuffstuff0009362407 a1b
8: stuffstuff.c44_0009735298 c44
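A minimal base-R sketch of where the misalignment comes from, reusing the question's `f1` values and `rex_pattern` as a plain character vector: `regexpr()` marks strings without a match with `-1`, and `regmatches()` drops those entries, so only 4 values come back for 8 rows. Assigning that shorter vector with `:=` is what produced the recycled column above; depending on the data.table version, `:=` may instead stop with an error about the length mismatch.

```r
# Question's data as a plain character vector (base R only, no data.table needed)
f1 <- c("stuffstuff-0000097125", "stuffstuff.abc.0006496679",
        "stuffstuff0007517235", "stuffstuff_xyz.0007280719",
        "stuffstuff0005995303", "stuffstuff_a1b_0000143856",
        "stuffstuff0009362407", "stuffstuff.c44_0009735298")
rex_pattern <- "(?<=(\\.|\\_|\\-))[A-Za-z0-9]{3}(?=(\\.|\\_|\\-)[0-9]{3,})"

m <- regexpr(rex_pattern, f1, perl = TRUE)
print(as.vector(m))         # start positions: -1 12 -1 12 -1 12 -1 12 (-1 = no match)

found <- regmatches(f1, m)  # vector match data: the -1 entries are dropped
print(length(f1))           # 8
print(length(found))        # 4
print(found)                # "abc" "xyz" "a1b" "c44"
```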
I tried using ifelse inside a function to return an empty string:
getMmFromFilename <- function(my_file_name){
rex_pattern <- "(?<=(\\.|\\_|\\-))[A-Za-z0-9]{3}(?=(\\.|\\_|\\-)[0-9]{3,})"
nothing_found <- character(length = 0)
mm <- regmatches(my_file_name, regexpr(pattern = rex_pattern, my_file_name, perl = TRUE))
ifelse(identical(mm, nothing_found), "missing_Mm", mm)
}
dt[, .(parsed_val = getMmFromFilename(f1))]
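But this returns only a single value, abc. The documentation for regmatches says: "For vector match data (as obtained from regexpr), empty matches are dropped; for list match data, empty matches give empty components (zero-length character vectors)." I suspect the solution lies there, but I haven't managed to get it yet...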
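Two things are going on in that function. `ifelse()` returns a result with the same length as its `test` argument, and `identical()` yields a single `TRUE`/`FALSE`, so only one value comes back. The documentation passage quoted above points to a fix: list match data, as returned by `gregexpr()`, keeps one component per input string, with `character(0)` for strings that did not match. A minimal sketch of a vectorised version of the question's function built on that idea; the `missing` argument is a hypothetical addition, not part of the original.

```r
# Vectorised sketch of getMmFromFilename(): one result per input string.
# `missing` is a hypothetical argument for the value used when nothing matches.
getMmFromFilename <- function(my_file_name, missing = "") {
  rex_pattern <- "(?<=(\\.|\\_|\\-))[A-Za-z0-9]{3}(?=(\\.|\\_|\\-)[0-9]{3,})"
  # gregexpr() gives *list* match data, so regmatches() returns a list with
  # one element per string; non-matching strings become character(0).
  mm <- regmatches(my_file_name,
                   gregexpr(rex_pattern, my_file_name, perl = TRUE))
  # Keep the first match of each string, or `missing` when there is none.
  vapply(mm, function(x) if (length(x) > 0) x[1] else missing, character(1))
}

f1 <- c("stuffstuff-0000097125", "stuffstuff.abc.0006496679",
        "stuffstuff0007517235", "stuffstuff_xyz.0007280719",
        "stuffstuff0005995303", "stuffstuff_a1b_0000143856",
        "stuffstuff0009362407", "stuffstuff.c44_0009735298")
print(getMmFromFilename(f1))
print(getMmFromFilename(f1, missing = "missing_Mm"))
```

With data.table loaded, this would be called on the whole column, e.g. `dt[, parsed_val := getMmFromFilename(f1)]`, which supplies one value per row.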
As for the solution, my workflow requires me to use data.table, and a brief explanation of the solution would help a lot...
Thanks in advance.
Answer:
dt[,parser_val:=sub(".*?[._](.*)[._].*|.*","\\1",f1)]
dt
f1 parser_val
1: stuffstuff-0000097125
2: stuffstuff.abc.0006496679 abc
3: stuffstuff0007517235
4: stuffstuff_xyz.0007280719 xyz
5: stuffstuff0005995303
6: stuffstuff_a1b_0000143856 a1b
7: stuffstuff0009362407
8: stuffstuff.c44_0009735298 c44
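The `sub()` pattern above has two branches. When a string contains a `.` or `_`, the first branch matches the whole string and captures what lies between the first and the last of those separators; otherwise the `.*` branch matches the whole string with group 1 unset, so `"\\1"` replaces it with an empty string. Because it relies only on separators, it accepts middle parts of any length and ignores `-`. A sketch of a hypothetical stricter variant that reuses the question's own rules (exactly three alphanumerics between two of `.`, `_` or `-`, followed by at least three digits), with both patterns run here with `perl = TRUE`; the extra file names are made up for illustration.

```r
f1 <- c("stuffstuff-0000097125", "stuffstuff.abc.0006496679",
        "stuffstuff0007517235", "stuffstuff_xyz.0007280719",
        "stuffstuff0005995303", "stuffstuff_a1b_0000143856",
        "stuffstuff0009362407", "stuffstuff.c44_0009735298")

# Answer's pattern: text between the first and last '.'/'_', or "" otherwise
loose_pat  <- ".*?[._](.*)[._].*|.*"
# Hypothetical stricter pattern: 3 alphanumerics between separators, then 3+ digits
strict_pat <- ".*[._-]([A-Za-z0-9]{3})[._-][0-9]{3,}.*|.*"

loose  <- sub(loose_pat,  "\\1", f1, perl = TRUE)
strict <- sub(strict_pat, "\\1", f1, perl = TRUE)
print(identical(loose, strict))   # TRUE on the question's data

# Made-up names where the two patterns disagree
extra <- c("stuffstuff.abcd.0001234567", "stuffstuff-abc-0001234567")
print(sub(loose_pat,  "\\1", extra, perl = TRUE))  # "abcd" ""
print(sub(strict_pat, "\\1", extra, perl = TRUE))  # ""     "abc"
```

Which variant is appropriate depends on how regular the real file names are; the looser pattern is simpler, the stricter one rejects middle parts that are not exactly three characters.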
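If you want to use regmatches, you can use pattern="(?<=[._]).*(?=[._])|$" with perl=TRUE. The |$ alternative lets every string match at least the empty string at its end, so regexpr never returns -1, regmatches drops nothing, and the result has exactly one value per row: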
dt[, parser_val := regmatches(f1, regexpr("(?<=[._]).*(?=[._])|$", f1, perl = TRUE))]
> dt
f1 parser_val
1: stuffstuff-0000097125
2: stuffstuff.abc.0006496679 abc
3: stuffstuff0007517235
4: stuffstuff_xyz.0007280719 xyz
5: stuffstuff0005995303
6: stuffstuff_a1b_0000143856 a1b
7: stuffstuff0009362407
8: stuffstuff.c44_0009735298 c44
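The `|$` trick is not tied to the answer's pattern. A minimal sketch, assuming the question's original `rex_pattern` is still the desired rule: appending `|$` to it makes every string match at least the empty string at its end, so `regmatches()` returns 8 values for 8 rows, with `""` where nothing was found.

```r
f1 <- c("stuffstuff-0000097125", "stuffstuff.abc.0006496679",
        "stuffstuff0007517235", "stuffstuff_xyz.0007280719",
        "stuffstuff0005995303", "stuffstuff_a1b_0000143856",
        "stuffstuff0009362407", "stuffstuff.c44_0009735298")
rex_pattern <- "(?<=(\\.|\\_|\\-))[A-Za-z0-9]{3}(?=(\\.|\\_|\\-)[0-9]{3,})"

# Top-level alternation: either the original 3-character match, or the empty
# string at the end of the input ($), which every string has.
m <- regexpr(paste0(rex_pattern, "|$"), f1, perl = TRUE)
print(attr(m, "match.length"))   # 0 for the rows that fell back to $
parsed_val <- regmatches(f1, m)
print(length(parsed_val))        # 8
print(parsed_val)                # "" "abc" "" "xyz" "" "a1b" "" "c44"
```

With data.table loaded, the same expression drops into the question's original call: `dt[, parsed_val := regmatches(f1, regexpr(paste0(rex_pattern, "|$"), f1, perl = TRUE))]`.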