Consecutive Days by Group - Error in Mutate Function Dplyr
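This is a continuation of the following question: Count consecutive days by group in R
The answer there worked for the dataset in the example I posted, but on my actual dataset it fails with this error: Error: incompatible size (0), expecting 1 (the group size) or 1
Below are the dataset that triggers the error and a reproducible example. Does anyone know why this happens?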
DATE <- as.Date(c('2016-10-26', '2016-10-30', '2016-10-26', '2016-10-20', '2016-10-21', '2016-10-17', '2016-10-26', '2016-10-17', '2016-10-18', '2016-10-20', '2016-10-17', '2016-10-18', '2016-10-17', '2016-10-18', '2016-10-19','2016-10-18', '2016-10-19','2016-10-17','2016-10-17','2016-10-19','2016-10-19','2016-10-20','2016-10-19','2016-10-20','2016-10-30'))
`Parent` <- c('A','A','A','A','A','A','A','B', 'B', 'B', 'C', 'C', 'D', 'D', 'D', 'D', 'D', 'E', 'E', 'F', 'G', 'G', 'G', 'G', 'G')
Child <- c('ab', 'ac', 'ad', 'ae', 'ae','af', 'af','ba', 'ba', 'ba', 'ca', 'cb', 'da', 'da', 'da', 'db', 'db', 'ea', 'eb', 'fa', 'ga', 'ga', 'gb', 'gb', 'gb')
salary <- c(290.45, 0.00, 336.51, 2238.56, 2256.75, 725.73, 319.69, 46.48, 42.13, 43.22, 0.41, 865.20, 1889.80, 2691.97, 3016.80, 8636.18, 8540.24, 1587.21, 1416.63, 79.62,1967.95,1947.35,34925.58,31158.51,6973.54)
avg_child_salary <- c(500.29, 526.27, 492.00, 1197.25, 1197.25, 474.10, 474.10, 21.68, 21.68, 21.68, 0.05, 199.90, 575.31, 575.31, 575.31, 1701.82, 1701.82, 495.48, 316.93, 26.16, 582.66, 582.66, 18089.83, 18089.83, 18089.83)
Callout <- c('LOW', 'LOW', 'LOW', 'HIGH', 'HIGH', 'HIGH', 'LOW', 'HIGH', 'HIGH', 'HIGH', 'HIGH', 'HIGH', 'HIGH', 'HIGH', 'HIGH', 'HIGH', 'HIGH', 'HIGH', 'HIGH', 'HIGH', 'HIGH', 'HIGH', 'HIGH', 'HIGH', 'LOW')
employ.data <- data.frame(DATE, Parent, Child, avg_child_salary, salary, Callout)
employ.data
DATE Parent Child avg_child_salary salary Callout
1 2016-10-26 A ab 500.29 290.45 LOW
2 2016-10-30 A ac 526.27 0.00 LOW
3 2016-10-26 A ad 492.00 336.51 LOW
4 2016-10-20 A ae 1197.25 2238.56 HIGH
5 2016-10-21 A ae 1197.25 2256.75 HIGH
6 2016-10-17 A af 474.10 725.73 HIGH
7 2016-10-26 A af 474.10 319.69 LOW
8 2016-10-17 B ba 21.68 46.48 HIGH
9 2016-10-18 B ba 21.68 42.13 HIGH
10 2016-10-20 B ba 21.68 43.22 HIGH
11 2016-10-17 C ca 0.05 0.41 HIGH
12 2016-10-18 C cb 199.90 865.20 HIGH
13 2016-10-17 D da 575.31 1889.80 HIGH
14 2016-10-18 D da 575.31 2691.97 HIGH
15 2016-10-19 D da 575.31 3016.80 HIGH
16 2016-10-18 D db 1701.82 8636.18 HIGH
17 2016-10-19 D db 1701.82 8540.24 HIGH
18 2016-10-17 E ea 495.48 1587.21 HIGH
19 2016-10-17 E eb 316.93 1416.63 HIGH
20 2016-10-19 F fa 26.16 79.62 HIGH
21 2016-10-19 G ga 582.66 1967.95 HIGH
22 2016-10-20 G ga 582.66 1947.35 HIGH
23 2016-10-19 G gb 18089.83 34925.58 HIGH
24 2016-10-20 G gb 18089.83 31158.51 HIGH
25 2016-10-30 G gb 18089.83 6973.54 LOW
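Next, I want to collect all rows of this dataset whose `DATE` is 2016-10-30 and, in a separate column, count the number of consecutive days with the same `Callout` value (LOW or HIGH), based on the `employ.data` data frame. The consecutive-day count must go in a new column next to `Callout`. This is the data before applying the script that fails: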
yesterday <- as.Date(Sys.Date()-37)
df2<-filter(employ.data, DATE == yesterday)
df2
DATE Parent Child avg_child_salary salary Callout
2 2016-10-30 A ac 526.27 0.00 LOW
25 2016-10-30 G gb 18089.83 6973.54 LOW
The code I tried is:
library(dplyr)
yesterday <- as.Date(Sys.Date()-37) ##because today is 12/6/16
df2 <- employ.data %>% group_by(Child) %>%
mutate(`Consec. Days with Callout`=cumsum(rev(cumprod(rev((yesterday-DATE)==(which(DATE == yesterday)-row_number()) & Callout==Callout[DATE == yesterday]))))) %>% filter(DATE == yesterday)
For this particular example, the final result should look like this:
DATE Parent Child avg_child_salary salary Callout Consec. Days with Callout
2 2016-10-30 A ac 526.27 0.00 LOW 1
25 2016-10-30 G gb 18089.83 6973.54 LOW 1
Instead, I get the error:
Error: incompatible size (0), expecting 1 (the group size) or 1
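The problem is that some groups have no row for `yesterday`. You can fix this by defining a function that checks for that case, instead of inlining the expression in `mutate`: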
library(dplyr)
compute.consec.days <- function(date, callout, yesterday, rown) {
j <- which(date == yesterday)
if (length(j)==0) NA else cumsum(rev(cumprod(rev((yesterday-date)==(j-rown) & callout==callout[date == yesterday]))))
}
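The function uses `which` to find the position where `DATE` equals `yesterday`. If the group has no such row, `which` returns `integer(0)`. We detect this by checking the `length` of the returned value `j`. If the length is 0, we return `NA` for the consecutive days. That is fine, because the subsequent `filter` drops that group anyway (it has no row for `yesterday`). Otherwise, we compute the consecutive days as before. This avoids the error. Now, using this function with your newly posted data: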
yesterday <- as.Date("2016-10-30")
out <- employ.data %>% group_by(Child) %>%
mutate(`Consec. Days with Callout`=compute.consec.days(DATE,Callout,yesterday,row_number())) %>%
filter(DATE == yesterday)
##Source: local data frame [2 x 7]
##Groups: Child [2]
##
## DATE Parent Child avg_child_salary salary Callout Consec. Days with Callout
## <date> <fctr> <fctr> <dbl> <dbl> <fctr> <dbl>
##1 2016-10-30 A ac 526.27 0.00 LOW 1
##2 2016-10-30 G gb 18089.83 6973.54 LOW 1
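To see where the size-0 result comes from, here is a minimal base R sketch on a single hypothetical group that has no row for `yesterday`. The vectors and the helper name `consec_inline` are made up for illustration, and `rown` stands in for `row_number()`; the expression itself is the one from the question.

```r
# Hypothetical single-group vectors: no row is dated `yesterday`.
date      <- as.Date(c("2016-10-17", "2016-10-18"))
callout   <- c("HIGH", "HIGH")
yesterday <- as.Date("2016-10-30")
rown      <- seq_along(date)   # stands in for row_number()

# which() finds no match, so j is integer(0)
j <- which(date == yesterday)

# Same expression as the inlined mutate() call (illustrative helper name)
consec_inline <- function(date, callout, yesterday, rown) {
  j <- which(date == yesterday)
  cumsum(rev(cumprod(rev((yesterday - date) == (j - rown) &
                           callout == callout[date == yesterday]))))
}

res <- consec_inline(date, callout, yesterday, rown)
length(j)     # 0
length(res)   # 0, although the group has 2 rows
```

Every operation on a zero-length vector yields a zero-length result, so the group hands `mutate` a vector of length 0 where it expects length 1 or the group size.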
If the queried `yesterday` is not the last day in some `Child` group, we need to modify the `compute.consec.days` function, for example:
compute.consec.days <- function(date, callout, yesterday, rown) {
j <- which(date == yesterday)
if (length(j)==0) NA else {
## first compute the condition
cond <- (yesterday-date)==(j-rown) & callout==callout[date == yesterday]
## then evaluate consecutive days only with this vector up to
## the row corresponding to yesterday. Then add the result with NAs
## because mutate is a windowing function
c(cumsum(rev(cumprod(rev(cond[1:j[1]])))),rep(NA,length(date)-j[1]))
}
}
For example, with the newly posted data and a query of "2016-10-20" for `yesterday`, the result is:
yesterday <- as.Date("2016-10-20")
out <- employ.data %>% group_by(Child) %>%
mutate(`Consec. Days with Callout`=compute.consec.days(DATE,Callout,yesterday,row_number())) %>%
filter(DATE == yesterday)
##Source: local data frame [4 x 7]
##Groups: Child [4]
##
## DATE Parent Child avg_child_salary salary Callout Consec. Days with Callout
## <date> <fctr> <fctr> <dbl> <dbl> <fctr> <dbl>
##1 2016-10-20 A ae 1197.25 2238.56 HIGH 1
##2 2016-10-20 B ba 21.68 43.22 HIGH 1
##3 2016-10-20 G ga 582.66 1947.35 HIGH 2
##4 2016-10-20 G gb 18089.83 31158.51 HIGH 2
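The `cumsum(rev(cumprod(rev(cond))))` idiom can be hard to read at a glance. The sketch below steps through it on a hypothetical condition vector for one group, ordered by date and ending at the query day. The values in `cond` are made up for illustration.

```r
# Hypothetical condition vector for one group, in date order:
# TRUE where the row sits on the expected consecutive day with the same callout.
cond <- c(FALSE, TRUE, TRUE)

step1 <- rev(cond)       # TRUE TRUE FALSE : walk backwards from the query day
step2 <- cumprod(step1)  # 1 1 0           : stays 1 until the first break
step3 <- rev(step2)      # 0 1 1           : back to chronological order
step4 <- cumsum(step3)   # 0 1 2           : running length of the streak

streak <- step4[length(step4)]  # 2: the count on the query day's row
```

The `cumprod` step is what stops the count at the first gap when looking back from the query day. Rows before the gap contribute 0 to the final `cumsum`.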
With the original query of "2016-10-30", we still get the original result:
yesterday <- as.Date("2016-10-30")
out <- employ.data %>% group_by(Child) %>%
mutate(`Consec. Days with Callout`=compute.consec.days(DATE,Callout,yesterday,row_number())) %>%
filter(DATE == yesterday)
##Source: local data frame [2 x 7]
##Groups: Child [2]
##
## DATE Parent Child avg_child_salary salary Callout Consec. Days with Callout
## <date> <fctr> <fctr> <dbl> <dbl> <fctr> <dbl>
##1 2016-10-30 A ac 526.27 0.00 LOW 1
##2 2016-10-30 G gb 18089.83 6973.54 LOW 1
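One assumption worth stating: because the function compares date differences with row positions (`j - rown`), it relies on the rows within each group being in ascending `DATE` order. The posted data already satisfies this. If your real data may not, a base R sort along these lines before the grouped `mutate` would be one option. The `toy` data frame here is hypothetical.

```r
# Hypothetical toy data, deliberately out of date order within group "x".
toy <- data.frame(
  DATE  = as.Date(c("2016-10-20", "2016-10-19", "2016-10-18")),
  Child = c("x", "x", "x"),
  stringsAsFactors = FALSE
)

# Sort by group, then by date, so row positions follow the calendar.
toy_sorted <- toy[order(toy$Child, toy$DATE), ]
rownames(toy_sorted) <- NULL
toy_sorted
```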