I have a data.frame like this:
datdf <- structure(list(BM = rep("1907-01-01", 20),
ct = structure(rep(c(1L, 2L), each = 5, times = 2),
.Label = c("B", "A"), class = "factor"),
val = c(rep(NA, 10), 9901:9910),
facet = rep(c(1, 2), each = 10) ),
row.names = c(NA, -20L),
.Names = c("BM", "ct", "val", "facet"),
class = c("tbl_df", "tbl", "data.frame"))
My problem is the following. After a groupwise mutation (I need cumsum) I get NA values in one of the groups. And it's not only cumsum: any modification of val produces NA.
datdf %>% group_by(BM, facet, ct) %>% mutate(v1 = val + 100, v2 = cumsum(val), v3 = val)
# BM ct val facet v1 v2 v3
# (chr) (fctr) (int) (dbl) (dbl) (int) (int)
# 11 1907-01-01 B 9901 2 10001 9901 9901
# 12 1907-01-01 B 9902 2 10002 19803 9902
# 13 1907-01-01 B 9903 2 10003 29706 9903
# 14 1907-01-01 B 9904 2 10004 39610 9904
# 15 1907-01-01 B 9905 2 10005 49515 9905
# 16 1907-01-01 A 9906 2 NA NA 9906
# 17 1907-01-01 A 9907 2 NA NA 9907
# 18 1907-01-01 A 9908 2 NA NA 9908
# 19 1907-01-01 A 9909 2 NA NA 9909
# 20 1907-01-01 A 9910 2 NA NA 9910
My dplyr version is 0.4.3 and R is 3.1.3.
Is it a bug or am I missing something? I don't remember having this issue with dplyr 0.4.1, before I updated it some weeks ago.
How can I fix it now?
A workaround is to use the function mapvalues from plyr to replace NAs with zeros.
Just for v2 (the cumsum column):
library(plyr)
datdf %>% mutate(v1 = val + 100,
v2 = cumsum(val %>% mapvalues(NA, 0)),
v3 = val)
Output:
BM ct val facet v1 v2 v3
(chr) (fctr) (int) (dbl) (dbl) (dbl) (int)
1 1907-01-01 B NA 1 NA 0 NA
2 1907-01-01 B NA 1 NA 0 NA
3 1907-01-01 B NA 1 NA 0 NA
4 1907-01-01 B NA 1 NA 0 NA
5 1907-01-01 B NA 1 NA 0 NA
6 1907-01-01 A NA 1 NA 0 NA
7 1907-01-01 A NA 1 NA 0 NA
8 1907-01-01 A NA 1 NA 0 NA
9 1907-01-01 A NA 1 NA 0 NA
10 1907-01-01 A NA 1 NA 0 NA
11 1907-01-01 B 9901 2 10001 9901 9901
12 1907-01-01 B 9902 2 10002 19803 9902
13 1907-01-01 B 9903 2 10003 29706 9903
14 1907-01-01 B 9904 2 10004 39610 9904
15 1907-01-01 B 9905 2 10005 49515 9905
16 1907-01-01 A 9906 2 10006 59421 9906
17 1907-01-01 A 9907 2 10007 69328 9907
18 1907-01-01 A 9908 2 10008 79236 9908
19 1907-01-01 A 9909 2 10009 89145 9909
20 1907-01-01 A 9910 2 10010 99055 9910
For all columns:
datdf %>% mutate(v1 = val %>% mapvalues(NA, 0) + 100,
v2 = cumsum(val %>% mapvalues(NA, 0)),
v3 = val %>% mapvalues(NA, 0))
Output:
BM ct val facet v1 v2 v3
(chr) (fctr) (int) (dbl) (dbl) (dbl) (dbl)
1 1907-01-01 B NA 1 100 0 0
2 1907-01-01 B NA 1 100 0 0
3 1907-01-01 B NA 1 100 0 0
4 1907-01-01 B NA 1 100 0 0
5 1907-01-01 B NA 1 100 0 0
6 1907-01-01 A NA 1 100 0 0
7 1907-01-01 A NA 1 100 0 0
8 1907-01-01 A NA 1 100 0 0
9 1907-01-01 A NA 1 100 0 0
10 1907-01-01 A NA 1 100 0 0
11 1907-01-01 B 9901 2 10001 9901 9901
12 1907-01-01 B 9902 2 10002 19803 9902
13 1907-01-01 B 9903 2 10003 29706 9903
14 1907-01-01 B 9904 2 10004 39610 9904
15 1907-01-01 B 9905 2 10005 49515 9905
16 1907-01-01 A 9906 2 10006 59421 9906
17 1907-01-01 A 9907 2 10007 69328 9907
18 1907-01-01 A 9908 2 10008 79236 9908
19 1907-01-01 A 9909 2 10009 89145 9909
20 1907-01-01 A 9910 2 10010 99055 9910
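Note that in the outputs above v2 keeps accumulating across the whole table (row 16 continues from row 15) rather than restarting for each BM/facet/ct group. If a per-group cumulative sum is what is wanted, the following is a minimal base-R sketch that avoids both dplyr and plyr. It assumes a plain data.frame copy of the example data (named df here, a name chosen only for this illustration) and uses replace() in place of mapvalues:

```r
# Plain data.frame copy of the example data (df is a hypothetical name)
df <- data.frame(
  BM    = rep("1907-01-01", 20),
  ct    = factor(rep(c("B", "A"), each = 5, times = 2), levels = c("B", "A")),
  val   = c(rep(NA, 10), 9901:9910),
  facet = rep(c(1, 2), each = 10),
  stringsAsFactors = FALSE
)

# Replace NA with 0 using base R instead of plyr::mapvalues
val0 <- replace(df$val, is.na(df$val), 0)

# Cumulative sum restarted within each BM / facet / ct combination
df$v2 <- ave(val0, df$BM, df$facet, df$ct, FUN = cumsum)
```

With this approach v2 restarts at 9906 in row 16 instead of continuing from the B group.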
Maybe you ran into one of these issues: https://github.com/hadley/dplyr/issues/1448#issuecomment-150037548
Try this:
datdf %>% group_by(BM, facet,ct) %>% plyr::mutate(v1 = val + 100, v2 = cumsum(val[!is.na(val)]), v3 = val)
BM ct val facet v1 v2 v3
(chr) (fctr) (int) (dbl) (dbl) (int) (int)
11 1907-01-01 B 9901 2 10001 9901 9901
12 1907-01-01 B 9902 2 10002 19803 9902
13 1907-01-01 B 9903 2 10003 29706 9903
14 1907-01-01 B 9904 2 10004 39610 9904
15 1907-01-01 B 9905 2 10005 49515 9905
16 1907-01-01 A 9906 2 10006 59421 9906
17 1907-01-01 A 9907 2 10007 69328 9907
18 1907-01-01 A 9908 2 10008 79236 9908
19 1907-01-01 A 9909 2 10009 89145 9909
20 1907-01-01 A 9910 2 10010 99055 9910
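One caveat with val[!is.na(val)]: it returns fewer elements than val whenever NAs are present, so the result no longer lines up one-to-one with the rows. A small base-R sketch of a length-preserving variant, assuming you want NA kept in the positions where val is missing (the names keep and out are made up for this illustration):

```r
# Same values as the val column in the example data
val <- c(rep(NA, 10), 9901:9910)

# Dropping NAs shortens the vector
short <- cumsum(val[!is.na(val)])
length(val)    # 20
length(short)  # 10

# Length-preserving variant: fill only the non-missing positions
keep <- !is.na(val)
out <- rep(NA_real_, length(val))
out[keep] <- cumsum(val[keep])
```

Here out has the same length as val, with NA in the first ten positions and the running total in the rest.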