Consecutive Days by Group: Error in dplyr mutate Function

Question

This is a continuation of the question: Record Consecutive Days by Group in R

The answer worked for the dataset in the example I posted, but when I ran it on my actual dataset an error came up: Error: incompatible size (0), expecting 1 (the group size) or 1

Below is the dataset and a reproducible example where the error comes up. Does anybody know why this is happening?

DATE <- as.Date(c('2016-10-26', '2016-10-30', '2016-10-26', '2016-10-20', '2016-10-21', '2016-10-17', '2016-10-26', '2016-10-17', '2016-10-18', '2016-10-20', '2016-10-17', '2016-10-18', '2016-10-17', '2016-10-18', '2016-10-19','2016-10-18', '2016-10-19','2016-10-17','2016-10-17','2016-10-19','2016-10-19','2016-10-20','2016-10-19','2016-10-20','2016-10-30'))
`Parent` <- c('A','A','A','A','A','A','A','B', 'B', 'B', 'C', 'C', 'D', 'D', 'D', 'D', 'D', 'E', 'E', 'F', 'G', 'G', 'G', 'G', 'G')
Child <- c('ab', 'ac', 'ad', 'ae', 'ae','af', 'af','ba', 'ba', 'ba', 'ca', 'cb', 'da', 'da', 'da', 'db', 'db', 'ea', 'eb', 'fa', 'ga', 'ga', 'gb', 'gb', 'gb')
salary <- c(290.45, 0.00, 336.51, 2238.56, 2256.75, 725.73, 319.69, 46.48, 42.13, 43.22, 0.41, 865.20, 1889.80, 2691.97, 3016.80, 8636.18, 8540.24, 1587.21, 1416.63, 79.62,1967.95,1947.35,34925.58,31158.51,6973.54)
avg_child_salary <- c(500.29, 526.27, 492.00, 1197.25, 1197.25, 474.10, 474.10, 21.68, 21.68, 21.68, 0.05, 199.90, 575.31, 575.31, 575.31, 1701.82, 1701.82, 495.48, 316.93, 26.16, 582.66, 582.66, 18089.83, 18089.83, 18089.83)
Callout <- c('LOW', 'LOW', 'LOW', 'HIGH', 'HIGH', 'HIGH', 'LOW', 'HIGH', 'HIGH', 'HIGH', 'HIGH', 'HIGH', 'HIGH', 'HIGH', 'HIGH', 'HIGH', 'HIGH', 'HIGH', 'HIGH', 'HIGH', 'HIGH', 'HIGH', 'HIGH', 'HIGH', 'LOW')
employ.data <- data.frame(DATE, Parent, Child, avg_child_salary, salary, Callout)

employ.data

         DATE Parent Child avg_child_salary   salary Callout
1  2016-10-26      A    ab           500.29   290.45     LOW
2  2016-10-30      A    ac           526.27     0.00     LOW
3  2016-10-26      A    ad           492.00   336.51     LOW
4  2016-10-20      A    ae          1197.25  2238.56    HIGH
5  2016-10-21      A    ae          1197.25  2256.75    HIGH
6  2016-10-17      A    af           474.10   725.73    HIGH
7  2016-10-26      A    af           474.10   319.69     LOW
8  2016-10-17      B    ba            21.68    46.48    HIGH
9  2016-10-18      B    ba            21.68    42.13    HIGH
10 2016-10-20      B    ba            21.68    43.22    HIGH
11 2016-10-17      C    ca             0.05     0.41    HIGH
12 2016-10-18      C    cb           199.90   865.20    HIGH
13 2016-10-17      D    da           575.31  1889.80    HIGH
14 2016-10-18      D    da           575.31  2691.97    HIGH
15 2016-10-19      D    da           575.31  3016.80    HIGH
16 2016-10-18      D    db          1701.82  8636.18    HIGH
17 2016-10-19      D    db          1701.82  8540.24    HIGH
18 2016-10-17      E    ea           495.48  1587.21    HIGH
19 2016-10-17      E    eb           316.93  1416.63    HIGH
20 2016-10-19      F    fa            26.16    79.62    HIGH
21 2016-10-19      G    ga           582.66  1967.95    HIGH
22 2016-10-20      G    ga           582.66  1947.35    HIGH
23 2016-10-19      G    gb         18089.83 34925.58    HIGH
24 2016-10-20      G    gb         18089.83 31158.51    HIGH
25 2016-10-30      G    gb         18089.83  6973.54     LOW

Then from this dataset I want to gather all the rows containing 2016-10-30 and, in a separate column, count the number of consecutive days with a Callout of LOW or HIGH, based on the employ.data data frame. The number of consecutive days needs to be in a new column next to Callout. This is the result before applying the failing script:

yesterday <- as.Date(Sys.Date()-37)
df2<-filter(employ.data, DATE == yesterday)
df2 

         DATE Parent Child avg_child_salary   salary Callout  
2  2016-10-30      A    ac           526.27     0.00     LOW                          
25 2016-10-30      G    gb         18089.83  6973.54     LOW

The code that was attempted is below:

library(dplyr)
yesterday <- as.Date(Sys.Date()-37) ##because today is 12/6/16
df2 <- employ.data %>% group_by(Child) %>%
  mutate(`Consec. Days with Callout`=cumsum(rev(cumprod(rev((yesterday-DATE)==(which(DATE == yesterday)-row_number()) & Callout==Callout[DATE == yesterday]))))) %>% filter(DATE == yesterday)

In the end it needs to look like this for this particular example:

         DATE Parent Child avg_child_salary   salary Callout  Consec. Days with Callout
2  2016-10-30      A    ac           526.27     0.00     LOW                          1
25 2016-10-30      G    gb         18089.83  6973.54     LOW                          1

Instead, the error comes up:

Error: incompatible size (0), expecting 1 (the group size) or 1

Answer 1

The issue is that for some groups, no row for yesterday is found. This can be fixed by defining a function that checks for that case, instead of inlining the expression in mutate:

library(dplyr)
compute.consec.days <- function(date, callout, yesterday, rown) {
  j <- which(date == yesterday)
  if (length(j)==0) NA else cumsum(rev(cumprod(rev((yesterday-date)==(j-rown) & callout==callout[date == yesterday]))))
}

This function finds which DATE equals yesterday. If there is no such row in the group, which() returns integer(0). We detect this by checking the length of the return value j. If the length is zero, we return NA for the consecutive days; this does not matter, since the subsequent filter removes that group anyway (i.e., yesterday is not found). Otherwise, we compute the consecutive days as before. This avoids the error. Now, with this function and your newly posted data:

yesterday <- as.Date("2016-10-30")
out <- employ.data %>% group_by(Child) %>%
  mutate(`Consec. Days with Callout`=compute.consec.days(DATE,Callout,yesterday,row_number())) %>%
  filter(DATE == yesterday)
##Source: local data frame [2 x 7]
##Groups: Child [2]
##
##        DATE Parent  Child avg_child_salary  salary Callout Consec. Days with Callout
##      <date> <fctr> <fctr>            <dbl>   <dbl>  <fctr>                     <dbl>
##1 2016-10-30      A     ac           526.27    0.00     LOW                         1
##2 2016-10-30      G     gb         18089.83 6973.54     LOW                         1
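The zero-length result behind the original error can be reproduced without dplyr. The following is a minimal base R sketch, assuming a single hypothetical group (two rows, neither dated 2016-10-30); the variable names picked and cond are illustrative and not part of the original code:

```r
# Base R only: one group that has no row for the query date.
date      <- as.Date(c("2016-10-17", "2016-10-18"))
callout   <- c("HIGH", "HIGH")
yesterday <- as.Date("2016-10-30")

j      <- which(date == yesterday)   # no match, so j is integer(0)
picked <- callout[date == yesterday] # zero-length character vector
cond   <- callout == picked          # comparing against length 0 gives length 0

length(j)     # 0
length(cond)  # 0, while the group has 2 rows
```

Because the expression inside mutate evaluates to a length-0 vector for such a group, it can be neither recycled nor matched to the group size, which is what the length(j)==0 guard avoids.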

Update to support the case where yesterday is not the last date in the group

If yesterday is not the last day in one or more of the Child groups, then the compute.consec.days function needs to be modified as follows:

compute.consec.days <- function(date, callout, yesterday, rown) {
  j <- which(date == yesterday)
  if (length(j)==0) NA else {
    ## first compute the condition
    cond <- (yesterday-date)==(j-rown) & callout==callout[date == yesterday]
    ## then evaluate consecutive days only with this vector up to
    ## the row corresponding to yesterday. Then add the result with NAs
    ## because mutate is a windowing function
    c(cumsum(rev(cumprod(rev(cond[1:j[1]])))),rep(NA,length(date)-j[1]))
  }
}

For example, if the query for yesterday is "2016-10-20" given the newly posted data, then this results in:

yesterday <- as.Date("2016-10-20")
out <- employ.data %>% group_by(Child) %>%
  mutate(`Consec. Days with Callout`=compute.consec.days(DATE,Callout,yesterday,row_number())) %>%
  filter(DATE == yesterday)
##Source: local data frame [4 x 7]
##Groups: Child [4]
##
##        DATE Parent  Child avg_child_salary   salary Callout Consec. Days with Callout
##      <date> <fctr> <fctr>            <dbl>    <dbl>  <fctr>                     <dbl>
##1 2016-10-20      A     ae          1197.25  2238.56    HIGH                         1
##2 2016-10-20      B     ba            21.68    43.22    HIGH                         1
##3 2016-10-20      G     ga           582.66  1947.35    HIGH                         2
##4 2016-10-20      G     gb         18089.83 31158.51    HIGH                         2
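The count of 2 for Child gb can be traced step by step. This is a base R sketch of the same rev/cumprod/cumsum idiom applied to that one group, with the intermediate names gap_ok, same, run and streak introduced here purely for illustration. It assumes, as the original code does, that rows within a group are sorted by ascending date with one row per date:

```r
# Rows of Child "gb" from employ.data, in their original order.
date      <- as.Date(c("2016-10-19", "2016-10-20", "2016-10-30"))
callout   <- c("HIGH", "HIGH", "LOW")
yesterday <- as.Date("2016-10-20")
rown      <- seq_along(date)

j <- which(date == yesterday)                 # row index of yesterday: 2

# A row is part of the streak only if its distance in days from
# yesterday equals its distance in rows, and its callout matches.
gap_ok <- as.numeric(yesterday - date) == (j - rown)  # TRUE TRUE FALSE
same   <- callout == callout[j]                       # TRUE TRUE FALSE
cond   <- (gap_ok & same)[1:j]                        # TRUE TRUE

# Reversed cumprod zeroes out everything before the first break
# when walking backwards from yesterday.
run    <- rev(cumprod(rev(cond)))             # 1 1
streak <- cumsum(run)                         # 1 2
streak[j]                                     # 2
```

Truncating cond at row j is what the update adds: the row for 2016-10-30 would otherwise be FALSE at the end of the vector and reset the reversed cumprod to zero for every earlier row.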

With the original query of "2016-10-30", we still get the original results:

yesterday <- as.Date("2016-10-30")
out <- employ.data %>% group_by(Child) %>%
  mutate(`Consec. Days with Callout`=compute.consec.days(DATE,Callout,yesterday,row_number())) %>%
  filter(DATE == yesterday)
##Source: local data frame [2 x 7]
##Groups: Child [2]
##
##        DATE Parent  Child avg_child_salary  salary Callout Consec. Days with Callout
##      <date> <fctr> <fctr>            <dbl>   <dbl>  <fctr>                     <dbl>
##1 2016-10-30      A     ac           526.27    0.00     LOW                         1
##2 2016-10-30      G     gb         18089.83 6973.54     LOW                         1
