Check R data.frame column for equal values in another column
I'm looking for a vectorized solution to the following problem. There are customers that can have one of two different products, x or y, at a time. I would like to identify all rows of product x that are followed by product y for the same customer. In that case, the to_date of product x would be the same as the from_date of product y. Here is an example:
customerid = c(rep(1,2),rep(2,3))
product = c("x", "y", "x", "x", "y")
from_date = as.Date(c("2000-01-01", "2000-06-07","2001-02-01","2005-01-01","2005-11-01"))
to_date = as.Date(c("2000-06-07", "2000-10-31","2002-04-01","2005-11-01","2006-01-01"))
# Store the result as df so it can be reused below
df = data.frame(customerid, product, from_date, to_date)
df
  customerid product  from_date    to_date
1          1       x 2000-01-01 2000-06-07
2          1       y 2000-06-07 2000-10-31
3          2       x 2001-02-01 2002-04-01
4          2       x 2005-01-01 2005-11-01
5          2       y 2005-11-01 2006-01-01
The desired output would look like:
  customerid product  from_date    to_date followed_by_y
1          1       x 2000-01-01 2000-06-07           yes
2          1       y 2000-06-07 2000-10-31            no
3          2       x 2001-02-01 2002-04-01            no
4          2       x 2005-01-01 2005-11-01           yes
5          2       y 2005-11-01 2006-01-01            no
My approach so far is to group the data.frame by customerid with dplyr. But then I do not know how to check whether the to_date of one row equals the from_date of the next row.
You could check all three conditions at once, like below:
library(dplyr)
df %>%
  group_by(customerid) %>%
  # lead() looks at the next row within the same customer
  mutate(followed_by_y = c('no', 'yes')[(product == 'x' &
                                           lead(product) == 'y' &
                                           to_date == lead(from_date)) + 1])
Output:
# A tibble: 5 x 5
# Groups:   customerid [2]
  customerid product from_date  to_date    followed_by_y
       <dbl> <fct>   <date>     <date>     <chr>
1          1 x       2000-01-01 2000-06-07 yes
2          1 y       2000-06-07 2000-10-31 no
3          2 x       2001-02-01 2002-04-01 no
4          2 x       2005-01-01 2005-11-01 yes
5          2 y       2005-11-01 2006-01-01 no
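In case the c('no', 'yes')[... + 1] part of the mutate() call looks cryptic, here is a small base R sketch of the idea. The vector cond is a made-up stand-in for the combined condition, not something from the question's data. Adding 1 to a logical turns FALSE into 1 and TRUE into 2, and those numbers are then used as positions into c("no", "yes").

```r
# Hypothetical condition vector, standing in for
# product == 'x' & lead(product) == 'y' & to_date == lead(from_date)
cond <- c(TRUE, FALSE, TRUE)

# FALSE + 1 is 1 and TRUE + 1 is 2, so these are valid positions
positions <- cond + 1

# Position 1 picks "no", position 2 picks "yes"
labels <- c("no", "yes")[positions]
print(labels)
```

An ifelse(cond, "yes", "no") call would read more plainly and should give the same labels here; the indexing form is just more compact.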
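Note, the mutate() call above is essentially the same as the case_when() version below. The two differ only when a customer's last row is product x: lead() returns NA there, so the indexing version gives NA while case_when() falls through to 'no'.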
library(dplyr)
df %>%
  group_by(customerid) %>%
  mutate(followed_by_y = case_when(
    product == 'x' & lead(product) == 'y' & to_date == lead(from_date) ~ 'yes',
    TRUE ~ 'no'
  ))
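If you would rather avoid the dplyr dependency, the same check can probably be done in base R by comparing each row with the row after it. This is only a sketch, not part of the answer above: it assumes the rows are already sorted by customerid and from_date, and the helper names nxt and hit are made up for illustration.

```r
# Rebuild the example data from the question
customerid <- c(rep(1, 2), rep(2, 3))
product <- c("x", "y", "x", "x", "y")
from_date <- as.Date(c("2000-01-01", "2000-06-07", "2001-02-01", "2005-01-01", "2005-11-01"))
to_date <- as.Date(c("2000-06-07", "2000-10-31", "2002-04-01", "2005-11-01", "2006-01-01"))
df <- data.frame(customerid, product, from_date, to_date)

# Index of the following row; the last row has no successor, hence NA
n <- nrow(df)
nxt <- c(seq_len(n)[-1], NA)

# A row qualifies when it is product x, the next row is product y,
# both rows belong to the same customer, and the dates meet
hit <- df$product == "x" &
  df$product[nxt] == "y" &
  df$customerid == df$customerid[nxt] &
  df$to_date == df$from_date[nxt]

# %in% TRUE maps any NA from the last row to FALSE, so it becomes "no"
df$followed_by_y <- ifelse(hit %in% TRUE, "yes", "no")
print(df)
```

The explicit customerid comparison takes the place of group_by(): without it, the last row of one customer would be compared against the first row of the next one.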