
Consecutive Days by Group - Error in Mutate Function Dplyr

This is a continuation of the question: Record Consecutive Days by Group in R

The answer worked for the dataset in the example I posted, but I realized there was something wrong with my actual dataset, and an error came up: Error: incompatible size (0), expecting 1 (the group size) or 1

Below is the dataset and a reproducible example where the error comes up. Does anybody know why this is happening?

DATE <- as.Date(c('2016-10-26', '2016-10-30', '2016-10-26', '2016-10-20', '2016-10-21', '2016-10-17', '2016-10-26', '2016-10-17', '2016-10-18', '2016-10-20', '2016-10-17', '2016-10-18', '2016-10-17', '2016-10-18', '2016-10-19','2016-10-18', '2016-10-19','2016-10-17','2016-10-17','2016-10-19','2016-10-19','2016-10-20','2016-10-19','2016-10-20','2016-10-30'))
Parent <- c('A','A','A','A','A','A','A','B', 'B', 'B', 'C', 'C', 'D', 'D', 'D', 'D', 'D', 'E', 'E', 'F', 'G', 'G', 'G', 'G', 'G')
Child <- c('ab', 'ac', 'ad', 'ae', 'ae','af', 'af','ba', 'ba', 'ba', 'ca', 'cb', 'da', 'da', 'da', 'db', 'db', 'ea', 'eb', 'fa', 'ga', 'ga', 'gb', 'gb', 'gb')
salary <- c(290.45, 0.00, 336.51, 2238.56, 2256.75, 725.73, 319.69, 46.48, 42.13, 43.22, 0.41, 865.20, 1889.80, 2691.97, 3016.80, 8636.18, 8540.24, 1587.21, 1416.63, 79.62,1967.95,1947.35,34925.58,31158.51,6973.54)
avg_child_salary <- c(500.29, 526.27, 492.00, 1197.25, 1197.25, 474.10, 474.10, 21.68, 21.68, 21.68, 0.05, 199.90, 575.31, 575.31, 575.31, 1701.82, 1701.82, 495.48, 316.93, 26.16, 582.66, 582.66, 18089.83, 18089.83, 18089.83)
Callout <- c('LOW', 'LOW', 'LOW', 'HIGH', 'HIGH', 'HIGH', 'LOW', 'HIGH', 'HIGH', 'HIGH', 'HIGH', 'HIGH', 'HIGH', 'HIGH', 'HIGH', 'HIGH', 'HIGH', 'HIGH', 'HIGH', 'HIGH', 'HIGH', 'HIGH', 'HIGH', 'HIGH', 'LOW')
employ.data <- data.frame(DATE, Parent, Child, avg_child_salary, salary, Callout)

employ.data

         DATE Parent Child avg_child_salary   salary Callout
1  2016-10-26      A    ab           500.29   290.45     LOW
2  2016-10-30      A    ac           526.27     0.00     LOW
3  2016-10-26      A    ad           492.00   336.51     LOW
4  2016-10-20      A    ae          1197.25  2238.56    HIGH
5  2016-10-21      A    ae          1197.25  2256.75    HIGH
6  2016-10-17      A    af           474.10   725.73    HIGH
7  2016-10-26      A    af           474.10   319.69     LOW
8  2016-10-17      B    ba            21.68    46.48    HIGH
9  2016-10-18      B    ba            21.68    42.13    HIGH
10 2016-10-20      B    ba            21.68    43.22    HIGH
11 2016-10-17      C    ca             0.05     0.41    HIGH
12 2016-10-18      C    cb           199.90   865.20    HIGH
13 2016-10-17      D    da           575.31  1889.80    HIGH
14 2016-10-18      D    da           575.31  2691.97    HIGH
15 2016-10-19      D    da           575.31  3016.80    HIGH
16 2016-10-18      D    db          1701.82  8636.18    HIGH
17 2016-10-19      D    db          1701.82  8540.24    HIGH
18 2016-10-17      E    ea           495.48  1587.21    HIGH
19 2016-10-17      E    eb           316.93  1416.63    HIGH
20 2016-10-19      F    fa            26.16    79.62    HIGH
21 2016-10-19      G    ga           582.66  1967.95    HIGH
22 2016-10-20      G    ga           582.66  1947.35    HIGH
23 2016-10-19      G    gb         18089.83 34925.58    HIGH
24 2016-10-20      G    gb         18089.83 31158.51    HIGH
25 2016-10-30      G    gb         18089.83  6973.54     LOW

From this dataset, I want to gather all the rows dated 2016-10-30 and then, in a separate column, count the number of consecutive days with a callout of LOW or HIGH, based on the employ.data data frame. The number of consecutive days needs to be in a new column next to Callout. This is before applying the script that errors:

library(dplyr)
yesterday <- as.Date("2016-10-30") ## i.e. Sys.Date()-37 when this was posted on 12/6/16
df2 <- filter(employ.data, DATE == yesterday)
df2

         DATE Parent Child avg_child_salary   salary Callout
2  2016-10-30      A    ac           526.27     0.00     LOW
25 2016-10-30      G    gb         18089.83  6973.54     LOW

The code that was attempted is below:

library(dplyr)
yesterday <- as.Date("2016-10-30") ## i.e. Sys.Date()-37, because today is 12/6/16
df2 <- employ.data %>% group_by(Child) %>%
  mutate(`Consec. Days with Callout` =
           cumsum(rev(cumprod(rev((yesterday-DATE)==(which(DATE == yesterday)-row_number()) &
                                    Callout==Callout[DATE == yesterday]))))) %>%
  filter(DATE == yesterday)

In the end, for this particular example, it needs to look like this:

         DATE Parent Child avg_child_salary   salary Callout  Consec. Days with Callout
2  2016-10-30      A    ac           526.27     0.00     LOW                          1
25 2016-10-30      G    gb         18089.83  6973.54     LOW                          1

Then the error comes up:

Error: incompatible size (0), expecting 1 (the group size) or 1

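The issue is that for some groups, there is no row for yesterday. In such a group, which(DATE == yesterday) returns integer(0), so the whole inline expression evaluates to a zero-length vector, which mutate rejects because it expects either one value per row of the group or a single value. This can be fixed by defining a function that checks for that case, instead of inlining the computation in mutate: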

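library(dplyr)
compute.consec.days <- function(date, callout, yesterday, rown) {
  ## index of the row dated `yesterday` (integer(0) if the group has none)
  j <- which(date == yesterday)
  ## no such row: return NA (filter() drops this group later anyway);
  ## otherwise count the run of consecutive days with yesterday's callout
  if (length(j)==0) NA else cumsum(rev(cumprod(rev((yesterday-date)==(j-rown) & callout==callout[date == yesterday]))))
}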

This function first finds which DATE equals yesterday. If the group has no such row, which returns integer(0); we detect this by checking the length of j. In that case we return NA for the consecutive days, which does not matter because the following filter removes that group anyway (yesterday is not found in it). Otherwise, we compute the consecutive days as before. This avoids the error. Now, with this function and your newly posted data:

yesterday <- as.Date("2016-10-30")
out <- employ.data %>% group_by(Child) %>%
  mutate(`Consec. Days with Callout`=compute.consec.days(DATE,Callout,yesterday,row_number())) %>%
  filter(DATE == yesterday)
##Source: local data frame [2 x 7]
##Groups: Child [2]
##
##        DATE Parent  Child avg_child_salary  salary Callout Consec. Days with Callout
##      <date> <fctr> <fctr>            <dbl>   <dbl>  <fctr>                     <dbl>
##1 2016-10-30      A     ac           526.27    0.00     LOW                         1
##2 2016-10-30      G     gb         18089.83 6973.54     LOW                         1
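To see where the size-0 result comes from, the original inline expression can be evaluated by hand on a single group that has no row for 2016-10-30. The sketch below is plain base R (no dplyr); the vectors are copied from the ba rows of employ.data, and rown is an illustrative stand-in for row_number() within the group.

```r
yesterday <- as.Date("2016-10-30")

# Vectors for Child == "ba", copied from employ.data (no row dated 2016-10-30)
date    <- as.Date(c("2016-10-17", "2016-10-18", "2016-10-20"))
callout <- c("HIGH", "HIGH", "HIGH")
rown    <- seq_along(date)  # what row_number() gives within the group

j <- which(date == yesterday)
length(j)  # 0: yesterday is not in this group

# The original inline expression: every piece inherits length 0 from j,
# so mutate() receives a zero-length vector it cannot fit to the group's rows
inline <- cumsum(rev(cumprod(rev(
  (yesterday - date) == (j - rown) & callout == callout[date == yesterday]
))))
length(inline)  # 0

# The guarded function from the answer returns NA instead
compute.consec.days <- function(date, callout, yesterday, rown) {
  j <- which(date == yesterday)
  if (length(j) == 0) NA else
    cumsum(rev(cumprod(rev((yesterday - date) == (j - rown) &
                             callout == callout[date == yesterday]))))
}
compute.consec.days(date, callout, yesterday, rown)  # NA
```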

Update to support the case where yesterday is not the last date in a group

If the query date yesterday is not the last date in some of the Child groups, then we need to modify our compute.consec.days function as follows:

compute.consec.days <- function(date, callout, yesterday, rown) {
  j <- which(date == yesterday)
  if (length(j)==0) NA else {
    ## first compute the condition
    cond <- (yesterday-date)==(j-rown) & callout==callout[date == yesterday]
    ## then count consecutive days using this vector only up to
    ## the row corresponding to yesterday, and pad the result with NAs
    ## so that it has one value per row, as mutate requires
    c(cumsum(rev(cumprod(rev(cond[1:j[1]])))),rep(NA,length(date)-j[1]))
  }
}
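The reason for truncating cond at j is easiest to see on one group. The base-R sketch below steps through the updated function on the gb group with yesterday set to 2016-10-20, where yesterday's row is followed by a later row; the vectors are copied from employ.data and rown is an illustrative stand-in for row_number(). It also shows what the first version's expression (no truncation) would return on the same group.

```r
# Updated function, copied from above
compute.consec.days <- function(date, callout, yesterday, rown) {
  j <- which(date == yesterday)
  if (length(j) == 0) NA else {
    cond <- (yesterday - date) == (j - rown) & callout == callout[date == yesterday]
    c(cumsum(rev(cumprod(rev(cond[1:j[1]])))), rep(NA, length(date) - j[1]))
  }
}

yesterday <- as.Date("2016-10-20")

# Child == "gb": yesterday (row 2) is followed by a later row dated 2016-10-30
date    <- as.Date(c("2016-10-19", "2016-10-20", "2016-10-30"))
callout <- c("HIGH", "HIGH", "LOW")
rown    <- seq_along(date)
j       <- which(date == yesterday)

# Row k is TRUE when it lies exactly (j - k) days before yesterday
# and has the same callout as yesterday's row
cond <- (yesterday - date) == (j - rown) & callout == callout[j]
cond  # TRUE TRUE FALSE

# Reversed cumprod keeps only the unbroken run of TRUEs ending at yesterday;
# cumsum then leaves the run length in yesterday's position
cumsum(rev(cumprod(rev(cond[1:j]))))  # 1 2

# Without truncating at j, the trailing FALSE zeroes the whole run
cumsum(rev(cumprod(rev(cond))))  # 0 0 0

compute.consec.days(date, callout, yesterday, rown)  # 1 2 NA
```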

For example, if the query for yesterday is "2016-10-20", then with the newly posted data this results in:

yesterday <- as.Date("2016-10-20")
out <- employ.data %>% group_by(Child) %>%
  mutate(`Consec. Days with Callout`=compute.consec.days(DATE,Callout,yesterday,row_number())) %>%
  filter(DATE == yesterday)
##Source: local data frame [4 x 7]
##Groups: Child [4]
##
##        DATE Parent  Child avg_child_salary   salary Callout Consec. Days with Callout
##      <date> <fctr> <fctr>            <dbl>    <dbl>  <fctr>                     <dbl>
##1 2016-10-20      A     ae          1197.25  2238.56    HIGH                         1
##2 2016-10-20      B     ba            21.68    43.22    HIGH                         1
##3 2016-10-20      G     ga           582.66  1947.35    HIGH                         2
##4 2016-10-20      G     gb         18089.83 31158.51    HIGH                         2

With the original query of "2016-10-30", we still get the original results:

yesterday <- as.Date("2016-10-30")
out <- employ.data %>% group_by(Child) %>%
  mutate(`Consec. Days with Callout`=compute.consec.days(DATE,Callout,yesterday,row_number())) %>%
  filter(DATE == yesterday)
##Source: local data frame [2 x 7]
##Groups: Child [2]
##
##        DATE Parent  Child avg_child_salary  salary Callout Consec. Days with Callout
##      <date> <fctr> <fctr>            <dbl>   <dbl>  <fctr>                     <dbl>
##1 2016-10-30      A     ac           526.27    0.00     LOW                         1
##2 2016-10-30      G     gb         18089.83 6973.54     LOW                         1
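For checking results without dplyr, the same grouped computation can be sketched in base R with split() and lapply(). This is an alternative harness, not the answer's code: consec_by_child and the consec column are hypothetical names, employ.sub is a subset of employ.data (groups ac, ba, ga, gb), and, like the answer, it assumes rows within each Child are sorted by DATE with no repeated dates.

```r
# Updated compute.consec.days, copied from the answer
compute.consec.days <- function(date, callout, yesterday, rown) {
  j <- which(date == yesterday)
  if (length(j) == 0) NA else {
    cond <- (yesterday - date) == (j - rown) & callout == callout[date == yesterday]
    c(cumsum(rev(cumprod(rev(cond[1:j[1]])))), rep(NA, length(date) - j[1]))
  }
}

# Subset of employ.data (Child groups ac, ba, ga, gb), sorted by DATE within Child
employ.sub <- data.frame(
  DATE    = as.Date(c("2016-10-30",
                      "2016-10-17", "2016-10-18", "2016-10-20",
                      "2016-10-19", "2016-10-20",
                      "2016-10-19", "2016-10-20", "2016-10-30")),
  Child   = c("ac", "ba", "ba", "ba", "ga", "ga", "gb", "gb", "gb"),
  Callout = c("LOW", "HIGH", "HIGH", "HIGH", "HIGH", "HIGH", "HIGH", "HIGH", "LOW"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: base-R stand-in for
# group_by(Child) %>% mutate(...) %>% filter(DATE == yesterday)
consec_by_child <- function(df, yesterday) {
  pieces <- lapply(split(df, df$Child), function(g) {
    # as.numeric() so NA-only groups get the same column type as the others
    g$consec <- as.numeric(compute.consec.days(g$DATE, g$Callout, yesterday,
                                               seq_len(nrow(g))))
    g[g$DATE == yesterday, , drop = FALSE]
  })
  do.call(rbind, pieces)
}

consec_by_child(employ.sub, as.Date("2016-10-20"))  # ba 1, ga 2, gb 2
consec_by_child(employ.sub, as.Date("2016-10-30"))  # ac 1, gb 1
```

The counts agree with the dplyr output above for the groups included in the subset.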
