dplyr: filter based on another column

Question

Let's say I have the following data and want one row per date, taking the row where the type is "ts". Of course, there are dates where "ts" is not available, and for those dates I need to fall back to the "real" values.

dat = data.frame(dte = c("2011-01-01","2011-02-01","2011-03-01","2011-04-01","2011-05-01",
                         "2011-01-01","2011-02-01","2011-03-01"),
                 type = c("real","real","real","real","real","ts","ts","ts"),
                 value=rnorm(8))
dat

library(dplyr)

# This keeps only the "ts" rows, so 2011-04-01 and 2011-05-01 are lost
cpy = dat %>% dplyr::filter(type == "ts")

cpy

How can something like that be done in dplyr?

Expected output is:

dte            type    value
"2011-01-01"   ts      ....
"2011-02-01"   ts
"2011-03-01"   ts  
"2011-04-01"   real
"2011-05-01"   real

Answer 1

You can do this with base R:

ts_rows <- dat[dat$type == "ts", ]
# "real" rows whose date has no "ts" row
rbind(ts_rows, dat[dat$type == "real" & !dat$dte %in% ts_rows$dte, ])

#     dte     type       value
#6 2011-01-01   ts -0.98109206
#7 2011-02-01   ts  1.67626166
#8 2011-03-01   ts -0.06997343
#4 2011-04-01 real  1.27243996
#5 2011-05-01 real -1.63594680

This takes the rows with type equal to "ts" and rbinds to them the "real" rows for the remaining dates.
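Since the question asks for dplyr, the same "ts rows plus leftover real rows" idea could be sketched there as well. This is only an illustration: anti_join() and bind_rows() are swapped in for %in% and rbind, the data is rebuilt from the question, and set.seed(1) plus stringsAsFactors = FALSE are additions made here for reproducibility.

```r
library(dplyr)

# Data from the question; set.seed() is an addition so values are reproducible
set.seed(1)
dat <- data.frame(
  dte = c("2011-01-01", "2011-02-01", "2011-03-01", "2011-04-01", "2011-05-01",
          "2011-01-01", "2011-02-01", "2011-03-01"),
  type = c("real", "real", "real", "real", "real", "ts", "ts", "ts"),
  value = rnorm(8),
  stringsAsFactors = FALSE
)

# Rows that already have the preferred type
ts_rows <- dat %>% filter(type == "ts")

# "real" rows whose date does not appear among the "ts" rows
real_rest <- dat %>%
  filter(type == "real") %>%
  anti_join(ts_rows, by = "dte")

# Stack the two pieces: "ts" rows first, then the fallback rows
combined <- bind_rows(ts_rows, real_rest)
combined
```

The result has the "ts" rows first and the fallback "real" rows after them, matching the row order of the base R answer.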

Answer 2

One idea is to group_by() date and keep the rows where type == "ts"; when a given date has no type == "ts" row, keep the other value:

dat %>%
  group_by(dte) %>%
  filter(type == "ts" | !any(type == "ts"))

Which gives:

#Source: local data frame [5 x 3]
#Groups: dte [5]
#
#         dte   type      value
#      <fctr> <fctr>      <dbl>
#1 2011-04-01   real  0.2522234
#2 2011-05-01   real -0.8919211
#3 2011-01-01     ts  0.4356833
#4 2011-02-01     ts -1.2375384
#5 2011-03-01     ts -0.2242679
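The grouped filter returns rows in an order that differs from the expected output in the question. A minimal sketch that reproduces the expected ordering, assuming it is acceptable to sort by date afterwards, might look like this. The set.seed(1), stringsAsFactors = FALSE, ungroup() and arrange() calls are additions made here, not part of the original answer.

```r
library(dplyr)

# Data from the question; set.seed() is an addition so values are reproducible
set.seed(1)
dat <- data.frame(
  dte = c("2011-01-01", "2011-02-01", "2011-03-01", "2011-04-01", "2011-05-01",
          "2011-01-01", "2011-02-01", "2011-03-01"),
  type = c("real", "real", "real", "real", "real", "ts", "ts", "ts"),
  value = rnorm(8),
  stringsAsFactors = FALSE
)

res <- dat %>%
  group_by(dte) %>%
  # keep "ts" rows; if a date has no "ts" row at all, keep whatever it has
  filter(type == "ts" | !any(type == "ts")) %>%
  ungroup() %>%
  # sort by date so the rows come back in the order the question expects
  arrange(dte)

res
```

Because any(type == "ts") is evaluated once per date group, the second half of the condition is TRUE for every row of a date that has no "ts" entry, which is what lets the "real" rows for 2011-04-01 and 2011-05-01 through.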

Answer 3

Using dplyr, we can also use which.max. This works because factor levels are sorted alphabetically by default, so "ts" ranks above "real":

library(dplyr)
dat %>%
    group_by(dte) %>%
    slice(which.max(factor(type)))    
#        dte   type      value
#      <fctr> <fctr>      <dbl>
#1 2011-01-01     ts -0.5052456
#2 2011-02-01     ts -0.4038810
#3 2011-03-01     ts -1.5349627
#4 2011-04-01   real  1.6812035
#5 2011-05-01   real -0.9902754
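The which.max(factor(type)) trick depends on the preferred label sorting last. A variant that does not rely on level order could compare against the label directly. The sketch below uses made-up data in which the preferred label, "adj", sorts before "real"; the names dat2, preferred and res2 are hypothetical and chosen only for this illustration.

```r
library(dplyr)

# Hypothetical data: the preferred label "adj" sorts *before* "real",
# so ranking by factor level would pick "real" for 2011-01-01
dat2 <- data.frame(
  dte = c("2011-01-01", "2011-02-01", "2011-01-01"),
  type = c("real", "real", "adj"),
  value = c(1, 2, 3),
  stringsAsFactors = FALSE
)

preferred <- "adj"  # assumed name for the label to prefer

res2 <- dat2 %>%
  group_by(dte) %>%
  # which.max() on a logical vector returns the position of the first TRUE,
  # or 1 when no element is TRUE, so each date keeps exactly one row
  slice(which.max(type == preferred)) %>%
  ungroup()

res2
```

For 2011-01-01 this keeps the "adj" row, and for 2011-02-01, where no "adj" row exists, it falls back to the first (and only) row of the group.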

Or using a similar option with data.table:

library(data.table)
setDT(dat)[, .SD[which.max(factor(type))] , dte]
#        dte type      value
#1: 2011-01-01   ts -0.5052456
#2: 2011-02-01   ts -0.4038810
#3: 2011-03-01   ts -1.5349627
#4: 2011-04-01 real  1.6812035
#5: 2011-05-01 real -0.9902754
