Find rows in a data frame where the text in one column can be found in another column, in R
I want to identify rows in a data frame where the text in one column can be found in another column. For example, in the data frame below, I would like to identify the rows in which the model column contains the text in the gear column (in this case, rows 1, 2, 7, 8, 32).
mydf <- cbind.data.frame(model=rownames(mtcars), gear=as.character(mtcars$gear), stringsAsFactors=FALSE)
mydf
model gear
1 Mazda RX4 4
2 Mazda RX4 Wag 4
3 Datsun 710 4
4 Hornet 4 Drive 3
5 Hornet Sportabout 3
6 Valiant 3
7 Duster 360 3
8 Merc 240D 4
9 Merc 230 4
10 Merc 280 4
11 Merc 280C 4
12 Merc 450SE 3
13 Merc 450SL 3
14 Merc 450SLC 3
15 Cadillac Fleetwood 3
16 Lincoln Continental 3
17 Chrysler Imperial 3
18 Fiat 128 4
19 Honda Civic 4
20 Toyota Corolla 4
21 Toyota Corona 3
22 Dodge Challenger 3
23 AMC Javelin 3
24 Camaro Z28 3
25 Pontiac Firebird 3
26 Fiat X1-9 4
27 Porsche 914-2 5
28 Lotus Europa 5
29 Ford Pantera L 5
30 Ferrari Dino 5
31 Maserati Bora 5
32 Volvo 142E 4
It seems like I should be able to use something like grep or match in combination with something like apply or map, or even ifelse, but I can't quite figure it out. (I could of course write a for loop, but I have several million rows of data and would prefer not to.)
Try this:
mydf$flag <- apply(mydf, 1, function(x) grepl(x["gear"], x["model"]))
This gives:
> head(mydf,20)
model gear flag
1 Mazda RX4 4 TRUE
2 Mazda RX4 Wag 4 TRUE
3 Datsun 710 4 FALSE
4 Hornet 4 Drive 3 FALSE
5 Hornet Sportabout 3 FALSE
6 Valiant 3 FALSE
7 Duster 360 3 TRUE
8 Merc 240D 4 TRUE
9 Merc 230 4 FALSE
10 Merc 280 4 FALSE
11 Merc 280C 4 FALSE
12 Merc 450SE 3 FALSE
13 Merc 450SL 3 FALSE
14 Merc 450SLC 3 FALSE
15 Cadillac Fleetwood 3 FALSE
16 Lincoln Continental 3 FALSE
17 Chrysler Imperial 3 FALSE
18 Fiat 128 4 FALSE
19 Honda Civic 4 FALSE
20 Toyota Corolla 4 FALSE
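One thing to keep in mind with the apply() call above: grepl() treats the gear value as a regular expression, and apply() first converts the whole data frame to a character matrix. A minimal base R sketch of an alternative, assuming the gear values should be matched as literal text, is to pair the two columns with mapply() and pass fixed = TRUE. The choice of mapply() and fixed = TRUE is an assumption here, not part of the original answer.

```r
# Rebuild the example data from the question.
mydf <- data.frame(model = rownames(mtcars),
                   gear = as.character(mtcars$gear),
                   stringsAsFactors = FALSE)

# mapply() walks the two columns in parallel, calling
# grepl(pattern = gear[i], x = model[i]) for each row.
# fixed = TRUE matches the gear value literally instead of as a regex;
# USE.NAMES = FALSE keeps the result an unnamed logical vector.
mydf$flag <- mapply(grepl, mydf$gear, mydf$model,
                    MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)

# Row numbers where the model name contains the gear value
# (rows 1, 2, 7, 8 and 32, as listed in the question).
which(mydf$flag)
```

This still calls grepl() once per row, so it is not necessarily faster on several million rows; the main difference is the literal matching and working on the columns directly.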
stringr, part of the tidyverse, provides str_detect, which works like grepl but is vectorized over the pattern as well as the string:
library(tidyverse)
mydf %>% mutate(flag = str_detect(model,gear)) %>% head
# model gear flag
# 1 Mazda RX4 4 TRUE
# 2 Mazda RX4 Wag 4 TRUE
# 3 Datsun 710 4 FALSE
# 4 Hornet 4 Drive 3 FALSE
# 5 Hornet Sportabout 3 FALSE
# 6 Valiant 3 FALSE
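By default str_detect() interprets its pattern as a regular expression, so a value in the second column containing characters such as . or ( would not be matched literally. A hedged sketch, assuming literal matching is what is wanted and that only stringr (not the whole tidyverse) is loaded, wraps the pattern column in fixed() and uses the result to subset the rows:

```r
library(stringr)

# Rebuild the example data from the question.
mydf <- data.frame(model = rownames(mtcars),
                   gear = as.character(mtcars$gear),
                   stringsAsFactors = FALSE)

# str_detect() compares model[i] against gear[i] for every row at once;
# fixed() makes each gear value a literal string rather than a regex.
flag <- str_detect(mydf$model, fixed(mydf$gear))

# Keep only the rows where the model name contains the gear value.
matches <- mydf[flag, ]
matches$model
```

For the mtcars example the regex and literal versions agree, because the gear values are plain digits; the difference only matters when the pattern column can contain regex metacharacters.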