data.table text filtering in R

I am trying to filter text in a data.table and am looking for an equivalent of dplyr::filter (I am using data.table for efficiency reasons).

However, my data.table filter only returns rows where the string matches exactly. In contrast, dplyr::filter returns every row where the pattern is found, not only rows that equal the pattern exactly.

See the example below.

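library(data.table)

df <- data.frame(first  = c("value_1 and value_2", "value_2", "value_1", "value_1"),
                 second = c(1, 2, 3, 4))

# data.table subset using %in% (exact match on the whole string)
dt.output <- setDT(df)[first %in% c("value_1")]
# dplyr subset using grepl (pattern match anywhere in the string)
filter.output <- dplyr::filter(df, grepl("value_1", first))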

dt.output only returns the rows whose value is exactly value_1 (rows 3 and 4). filter.output returns every row that contains value_1 (rows 1, 3 and 4).

Is it possible to filter text with data.table and get the same result as dplyr::filter?

This behavior is not a difference between dplyr::filter and data.table. It is just that %in% tests each element for an exact match, while grepl also returns TRUE for substring matches. If we use grepl inside the data.table, it works the same way:

library(data.table)
setDT(df)[grepl("value_1", first)]
                  first second
1: value_1 and value_2      1
2:             value_1      3
3:             value_1      4
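
To see why the two filters differ, it can help to look at the logical vectors each expression produces before any subsetting happens. The following is a minimal sketch using the same strings as the question; the variable names x, exact and partial are only illustrative and do not appear in the original post.

```r
# Same strings as the `first` column in the question
x <- c("value_1 and value_2", "value_2", "value_1", "value_1")

# %in% asks: is the whole element equal to one of the values on the right?
exact <- x %in% c("value_1")

# grepl() asks: does the pattern occur anywhere inside the element?
partial <- grepl("value_1", x)

print(exact)    # FALSE FALSE  TRUE  TRUE
print(partial)  #  TRUE FALSE  TRUE  TRUE
```

Whichever logical vector is placed in the first argument of [ ] decides which rows are kept, so the difference comes from the matching function rather than from data.table or dplyr.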

Alternatively, use %like%:

 setDT(df)[first %like% "value_1"]
                 first second
1: value_1 and value_2      1
2:             value_1      3
3:             value_1      4
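
One thing to keep in mind is that substring matching can return more than intended. The sketch below assumes a hypothetical extra value, "value_10", that is not in the original data, to show how a plain substring match picks it up and how a regular expression with word boundaries could exclude it. The names dt, loose and strict are illustrative only.

```r
library(data.table)

# Hypothetical data: the last row is "value_10" instead of "value_1"
dt <- data.table(first  = c("value_1 and value_2", "value_2", "value_1", "value_10"),
                 second = c(1, 2, 3, 4))

# Literal substring match (fixed = TRUE disables regex interpretation);
# "value_10" is kept as well, because it contains "value_1"
loose <- dt[grepl("value_1", first, fixed = TRUE)]

# Regex with word boundaries: "value_1" must not be followed or preceded
# by another letter, digit or underscore, so "value_10" is dropped
strict <- dt[grepl("\\bvalue_1\\b", first)]

print(loose)
print(strict)
```

Which of the two is appropriate depends on the data; for the values shown in the question both give the same rows.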
