data.table text filtering in R
I am trying to filter text in a data.table and am looking for something similar to dplyr::filter (I am using data.table for efficiency reasons).
However, my data.table filter only returns rows where the string is an exact match. In contrast, dplyr::filter returns rows where the pattern is found anywhere in the string, not only when the string equals the pattern.
See below for an example.
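library(data.table)
df <- data.frame(first = c("value_1 and value_2", "value_2", "value_1", "value_1"),
                 second = c(1, 2, 3, 4))
# data.table subset with %in%: keeps exact matches only
dt.output <- setDT(df)[first %in% c("value_1")]
# dplyr filter with grepl(): keeps substring matches
filter.output <- dplyr::filter(df, grepl("value_1", first))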
dt.output only returns the rows whose value is exactly value_1 (rows 3 and 4). filter.output returns the rows that contain value_1 anywhere in the string (rows 1, 3 and 4).
Is it possible to use data.table to filter text while returning the same results as dplyr::filter?
This behavior is not a dplyr::filter vs data.table difference. It is just that %in% looks for fixed (whole-string) matches, while grepl returns TRUE for substring matches as well. If we use grepl in the data.table, it works as well:
library(data.table)
setDT(df)[grepl("value_1", first)]
                 first second
1: value_1 and value_2      1
2:             value_1      3
3:             value_1      4
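To see why the two operators disagree, it can help to run them on the bare character vector, outside any data.table. This is a minimal sketch in base R only; the vector `x` is simply the `first` column from the question, and the names `exact` and `partial` are introduced here for illustration.

```r
# The `first` column from the question, as a plain character vector
x <- c("value_1 and value_2", "value_2", "value_1", "value_1")

# %in% compares whole elements: TRUE only where the element equals "value_1"
exact <- x %in% "value_1"

# grepl() searches inside each element: TRUE wherever "value_1" occurs
partial <- grepl("value_1", x)

which(exact)    # 3 4
which(partial)  # 1 3 4
```

Either logical vector can be used in the `i` argument of a data.table, which is why swapping `%in%` for `grepl` changes the rows returned.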
data.table's %like% operator can also be used:
setDT(df)[first %like% "value_1"]
                 first second
1: value_1 and value_2      1
2:             value_1      3
3:             value_1      4
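One caveat that may matter with either approach: by default, grepl and %like% interpret the pattern as a regular expression, so characters such as `.` can match more than intended. The sketch below uses a small made-up table `dt` (hypothetical data, not from the question) and grepl's `fixed = TRUE` argument to force literal matching.

```r
library(data.table)

# Hypothetical data, invented for illustration only
dt <- data.table(first = c("value.1", "value_1", "valueX1"),
                 second = 1:3)

# As a regex, "." matches any single character, so all three rows are kept
regex_rows <- dt[grepl("value.1", first)]

# fixed = TRUE treats the pattern as a literal string: only "value.1" is kept
literal_rows <- dt[grepl("value.1", first, fixed = TRUE)]

regex_rows
literal_rows
```

For the pattern in the question ("value_1") both forms give the same rows, since it contains no regex metacharacters.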