
Addressing data.table columns based on two grep() commands in R

I'm trying to find rows based on two conditions in a data.table. These two conditions are the existence of certain words in a long string. A minimal example looks like this:

library("data.table")
dt <- data.table(var1 = c("abc","adb","acf"))

Now I try to find rows 1 and 2 by looking for "a" and "b" appearing together in the same entry of var1. In reality, the data.table has several hundred thousand entries, and the strings are long formulas in which I look for multi-character words. Here is my attempt:

dt[grep("a", var1) & grep("b", var1)]

This throws a warning:

In grep("a", var1) & grep("b", var1) :
longer object length is not a multiple of shorter object length

It looks as if data.table is doing something sequentially? To my mind, this should be the same as dt[var1 == X & var2 == Y], which would work. Any help is appreciated!

PS: For completeness, here is the error from my actual data, which I hope has the same origin (otherwise my example misses the point):

Error in `[.data.table`(collected, grep(pairs[i, 1], model_formula) &  : 
i evaluates to a logical vector length 423098 but there are 3980284 rows. 
Recycling of logical i is no longer allowed as it hides more bugs than is 
worth the rare convenience. Explicitly use rep(...,length=.N) if you 
really need to recycle.
In addition: Warning message:
In grep(pairs[i, 1], model_formula) & grep(pairs[i, 2], model_formula) :
longer object length is not a multiple of shorter object length
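A quick way to see where both the warning and the error come from is to inspect what grep() and grepl() return on the toy table. This is a minimal sketch using only the question's dt; the intermediate names (idx_a, idx_b, combined, has_a, has_b) are illustrative. Because `&` recycles to the longer operand, the combined vector has the length of the longer index vector (the match count of the more common pattern), not the number of rows. That is consistent with the real data producing a logical i of length 423098 for a table with 3980284 rows.

```r
library(data.table)

dt <- data.table(var1 = c("abc", "adb", "acf"))

# grep() returns the *positions* of matching elements, not one TRUE/FALSE per row
idx_a <- grep("a", dt$var1)   # 1 2 3 (every string contains "a")
idx_b <- grep("b", dt$var1)   # 1 2   (only "abc" and "adb" contain "b")

# `&` on two integer vectors coerces them to logical (non-zero -> TRUE) and
# recycles the shorter one, which triggers the "longer object length" warning.
combined <- suppressWarnings(idx_a & idx_b)
print(combined)               # TRUE TRUE TRUE: length 3, but meaningless row-wise

# grepl() returns one logical per element, so the length always equals nrow(dt)
has_a <- grepl("a", dt$var1)  # TRUE TRUE TRUE
has_b <- grepl("b", dt$var1)  # TRUE TRUE FALSE
print(has_a & has_b)          # TRUE TRUE FALSE
```

In the toy example the combined vector happens to have length 3, the same as nrow(dt), so data.table only shows the warning and returns all three rows; with the real data the lengths differ, so data.table refuses to recycle and raises the error.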

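Replace grep() with grepl() and it should work fine, or adjust your regex as described in the comments. grep() returns the indices of the matching elements, so the two calls give integer vectors of different lengths, and `&` recycles them instead of comparing them row by row. grepl() returns one TRUE/FALSE per element, so the combined condition has exactly one value per row: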

dt[grepl("a", var1) & grepl("b", var1)]

#    var1
# 1:  abc
# 2:  adb
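The comments mentioned above are not reproduced here, so the following is only one possible way to "adjust the regex": a single pattern that requires both substrings in either order. It is a sketch on the toy table; the lookahead form needs perl = TRUE, and the alternation form is a plain extended-regex equivalent for two terms.

```r
library(data.table)

dt <- data.table(var1 = c("abc", "adb", "acf"))

# Each (?=.*x) lookahead checks that x occurs somewhere after the start of the
# string without consuming characters, so the order of "a" and "b" does not matter.
both_lookahead <- dt[grepl("^(?=.*a)(?=.*b)", var1, perl = TRUE)]

# Equivalent alternation for two terms: "a ... b" or "b ... a".
both_alternation <- dt[grepl("a.*b|b.*a", var1)]

print(both_lookahead)
```

The lookahead version scales more gracefully to three or more required words, since each extra word is just one more (?=.*word) group, whereas the alternation would need every ordering spelled out.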

A third option, based on intersect(), keeps grep() but combines the two index vectors as sets of row numbers:

dt[intersect(grep("a", var1), grep("b", var1))]
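Applied to the setup from the error message, the grepl() fix could look like the sketch below. The table contents and the pair values are hypothetical stand-ins, since the question does not show the real `collected` table or the `pairs` matrix; only those two names and the `model_formula` column come from the question. Because formulas typically contain characters such as "(", "+" and "^" that are regex metacharacters, the sketch assumes the words should be matched literally and passes fixed = TRUE.

```r
library(data.table)

# Hypothetical stand-ins for the question's `collected` table and `pairs` matrix
collected <- data.table(model_formula = c("y ~ x1 + log(x2)",
                                          "y ~ x1 + x3",
                                          "y ~ log(x2) + x3"))
pairs <- matrix(c("x1", "log(x2)",
                  "x1", "x3"),
                ncol = 2, byrow = TRUE)

results <- vector("list", nrow(pairs))
for (i in seq_len(nrow(pairs))) {
  # fixed = TRUE matches each word literally; grepl() yields one logical per row,
  # so the combined condition always has length nrow(collected).
  hit <- grepl(pairs[i, 1], collected$model_formula, fixed = TRUE) &
         grepl(pairs[i, 2], collected$model_formula, fixed = TRUE)
  results[[i]] <- collected[hit]
}

# The intersect() variant selects the same rows via integer indices.
via_intersect <- collected[intersect(grep(pairs[1, 1], model_formula, fixed = TRUE),
                                     grep(pairs[1, 2], model_formula, fixed = TRUE))]
same <- identical(via_intersect$model_formula, results[[1]]$model_formula)
print(same)
```

The grepl() route evaluates each pattern on every row once, while the intersect() route only carries the matching indices around; either way the row selection no longer depends on recycling.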
