
dplyr group_by + mutate: strange NA appearing

I have a data.frame like this:

# Example data: 20 rows; val is NA for all 10 rows with facet == 1
datdf  <- structure(list(BM = rep("1907-01-01", 20), 
                         # factor codes 1/2 map to the levels "B"/"A"
                         ct = structure(rep(c(1L, 2L), each = 5, times = 2), 
                                        .Label = c("B", "A"), class = "factor"), 
                         val = c(rep(NA, 10), 9901:9910), 
                         facet = rep(c(1, 2), each = 10) ), 
                    row.names = c(NA, -20L), 
                    .Names = c("BM", "ct", "val", "facet"), 
                    class = c("tbl_df", "tbl", "data.frame"))

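My problem is the following. After doing some group-wise mutations (I need cumsum), I get NA values in one of the groups. And it is not only cumsum: any modification of val produces NA, while the unmodified copy v3 = val comes through fine.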

library(dplyr)
datdf %>% group_by(BM, facet, ct) %>% mutate(v1 = val + 100, v2 = cumsum(val), v3 = val)

#            BM     ct   val facet    v1    v2    v3
#         (chr) (fctr) (int) (dbl) (dbl) (int) (int)
# 11 1907-01-01      B  9901     2 10001  9901  9901
# 12 1907-01-01      B  9902     2 10002 19803  9902
# 13 1907-01-01      B  9903     2 10003 29706  9903
# 14 1907-01-01      B  9904     2 10004 39610  9904
# 15 1907-01-01      B  9905     2 10005 49515  9905
# 16 1907-01-01      A  9906     2    NA    NA  9906
# 17 1907-01-01      A  9907     2    NA    NA  9907
# 18 1907-01-01      A  9908     2    NA    NA  9908
# 19 1907-01-01      A  9909     2    NA    NA  9909
# 20 1907-01-01      A  9910     2    NA    NA  9910
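The NAs only show up in some groups. As a quick diagnostic, the sketch below rebuilds datdf as a plain data.frame (an assumption, so that it runs without dplyr) and checks, for each (BM, facet, ct) group, whether val is entirely NA. It shows that both facet-1 groups contain nothing but NA, while both facet-2 groups, including the A group whose v1 and v2 come out NA, are fully populated.

```r
# Rebuild the question's data as a plain data.frame (base R only).
datdf <- data.frame(
  BM    = rep("1907-01-01", 20),
  ct    = factor(rep(rep(c("B", "A"), each = 5), times = 2), levels = c("B", "A")),
  val   = c(rep(NA_integer_, 10), 9901:9910),
  facet = rep(c(1, 2), each = 10),
  stringsAsFactors = FALSE
)

# Split val by every (BM, facet, ct) combination and flag the all-NA groups.
# Group names are built as "BM.facet.ct", e.g. "1907-01-01.1.B".
groups <- split(datdf$val, list(datdf$BM, datdf$facet, datdf$ct), drop = TRUE)
all_na <- vapply(groups, function(v) all(is.na(v)), logical(1))
print(all_na)
```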

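My dplyr version is 0.4.3, and R is 3.1.3.

Is this a bug, or am I missing something? As far as I remember, I did not run into this problem with dplyr 0.4.1, before updating.

How can I work around it for now?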
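One way to get the group-wise columns without going through dplyr's grouped mutate() at all is base R's ave(), which applies FUN within each combination of the grouping vectors and returns a result aligned with the input. This is a sketch that swaps in base R, not a fix for dplyr itself; datdf is rebuilt as a plain data.frame so the block runs on its own.

```r
datdf <- data.frame(
  BM    = rep("1907-01-01", 20),
  ct    = factor(rep(rep(c("B", "A"), each = 5), times = 2), levels = c("B", "A")),
  val   = c(rep(NA_integer_, 10), 9901:9910),
  facet = rep(c(1, 2), each = 10),
  stringsAsFactors = FALSE
)

# v1 does not depend on the grouping, so vectorised arithmetic is enough.
datdf$v1 <- datdf$val + 100

# v2: cumsum() restarted within every (BM, facet, ct) group.
datdf$v2 <- ave(datdf$val, datdf$BM, datdf$facet, datdf$ct, FUN = cumsum)

# v3: unmodified copy of val.
datdf$v3 <- datdf$val

print(datdf[11:20, ])
```

Inside a dplyr pipeline the same call could be written as an ungrouped mutate(v2 = ave(val, BM, facet, ct, FUN = cumsum)); that variant is untested against dplyr 0.4.3.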

One workaround is to use the function mapvalues from plyr to replace the NAs with zero:

Only for v2 (the cumsum column):

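library(dplyr)
# plyr::mapvalues(x, from, to) maps NA to 0; calling it via plyr:: avoids
# attaching plyr, which would mask dplyr verbs such as mutate().
# Note there is no group_by() here, so cumsum() runs over all 20 rows.
datdf %>% mutate(v1 = val + 100,
                 v2 = cumsum(val %>% plyr::mapvalues(NA, 0)),
                 v3 = val)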

Output:

           BM     ct   val facet    v1    v2    v3
        (chr) (fctr) (int) (dbl) (dbl) (dbl) (int)
1  1907-01-01      B    NA     1    NA     0    NA
2  1907-01-01      B    NA     1    NA     0    NA
3  1907-01-01      B    NA     1    NA     0    NA
4  1907-01-01      B    NA     1    NA     0    NA
5  1907-01-01      B    NA     1    NA     0    NA
6  1907-01-01      A    NA     1    NA     0    NA
7  1907-01-01      A    NA     1    NA     0    NA
8  1907-01-01      A    NA     1    NA     0    NA
9  1907-01-01      A    NA     1    NA     0    NA
10 1907-01-01      A    NA     1    NA     0    NA
11 1907-01-01      B  9901     2 10001  9901  9901
12 1907-01-01      B  9902     2 10002 19803  9902
13 1907-01-01      B  9903     2 10003 29706  9903
14 1907-01-01      B  9904     2 10004 39610  9904
15 1907-01-01      B  9905     2 10005 49515  9905
16 1907-01-01      A  9906     2 10006 59421  9906
17 1907-01-01      A  9907     2 10007 69328  9907
18 1907-01-01      A  9908     2 10008 79236  9908
19 1907-01-01      A  9909     2 10009 89145  9909
20 1907-01-01      A  9910     2 10010 99055  9910

For all columns:

# Every transformed column goes through plyr::mapvalues(NA, 0) first.
datdf %>% mutate(v1 = val %>% plyr::mapvalues(NA, 0) + 100,
                 v2 = cumsum(val %>% plyr::mapvalues(NA, 0)),
                 v3 = val %>% plyr::mapvalues(NA, 0))

Output:

           BM     ct   val facet    v1    v2    v3
        (chr) (fctr) (int) (dbl) (dbl) (dbl) (dbl)
1  1907-01-01      B    NA     1   100     0     0
2  1907-01-01      B    NA     1   100     0     0
3  1907-01-01      B    NA     1   100     0     0
4  1907-01-01      B    NA     1   100     0     0
5  1907-01-01      B    NA     1   100     0     0
6  1907-01-01      A    NA     1   100     0     0
7  1907-01-01      A    NA     1   100     0     0
8  1907-01-01      A    NA     1   100     0     0
9  1907-01-01      A    NA     1   100     0     0
10 1907-01-01      A    NA     1   100     0     0
11 1907-01-01      B  9901     2 10001  9901  9901
12 1907-01-01      B  9902     2 10002 19803  9902
13 1907-01-01      B  9903     2 10003 29706  9903
14 1907-01-01      B  9904     2 10004 39610  9904
15 1907-01-01      B  9905     2 10005 49515  9905
16 1907-01-01      A  9906     2 10006 59421  9906
17 1907-01-01      A  9907     2 10007 69328  9907
18 1907-01-01      A  9908     2 10008 79236  9908
19 1907-01-01      A  9909     2 10009 89145  9909
20 1907-01-01      A  9910     2 10010 99055  9910
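For readers who would rather not depend on plyr, base R's replace() performs the same NA-to-0 substitution as mapvalues(NA, 0) in this answer. The short vector below is made-up illustration data. Because 0 is a double, the integer val is promoted to double, which matches the (int) to (dbl) change in the v2 column of the outputs above; passing 0L instead keeps the integer type.

```r
val <- c(NA, NA, 9901L, 9902L)  # illustrative integer vector with leading NAs

# replace(x, list, values) returns a copy of x with x[list] set to values.
val0     <- replace(val, is.na(val), 0)   # 0 is double, so the result is double
val0_int <- replace(val, is.na(val), 0L)  # 0L keeps the integer type

cumsum(val0)  # 0 0 9901 19803
```

Note also that this answer drops group_by(), so v2 is a running total over the whole column rather than per (BM, facet, ct) group: row 16's 59421 is row 15's 49515 plus 9906. To keep per-group totals, the replaced vector can be passed to a grouped helper such as base R's ave(x, ..., FUN = cumsum).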

Maybe you are running into something like this issue: https://github.com/hadley/dplyr/issues/1448#issuecomment-150037548

Try this:

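# plyr::mutate() is a plain data-frame transform that ignores dplyr's grouping,
# so cumsum() here runs over the non-NA values of the whole val column
datdf %>% group_by(BM, facet, ct) %>% plyr::mutate(v1 = val + 100, v2 = cumsum(val[!is.na(val)]), v3 = val)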

               BM     ct   val facet    v1    v2    v3
            (chr) (fctr) (int) (dbl) (dbl) (int) (int)
    11 1907-01-01      B  9901     2 10001  9901  9901
    12 1907-01-01      B  9902     2 10002 19803  9902
    13 1907-01-01      B  9903     2 10003 29706  9903
    14 1907-01-01      B  9904     2 10004 39610  9904
    15 1907-01-01      B  9905     2 10005 49515  9905
    16 1907-01-01      A  9906     2 10006 59421  9906
    17 1907-01-01      A  9907     2 10007 69328  9907
    18 1907-01-01      A  9908     2 10008 79236  9908
    19 1907-01-01      A  9909     2 10009 89145  9909
    20 1907-01-01      A  9910     2 10010 99055  9910
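In this answer, val[!is.na(val)] drops the NAs before cumsum(), so the result has fewer elements than there are rows. If the intent is a running total that ignores NAs but stays aligned row by row, a small helper can do that. cumsum_skip_na() below is a hypothetical name, not a dplyr or plyr function; NA positions stay NA and the other positions get the running total of the non-NA values so far.

```r
# Hypothetical helper: running total over the non-NA values of x, returned at
# the same length as x (NA positions stay NA).
cumsum_skip_na <- function(x) {
  out <- x
  ok <- !is.na(x)
  out[ok] <- cumsum(x[ok])
  out
}

val <- c(NA, 9901L, NA, 9902L)  # illustrative data

length(cumsum(val[!is.na(val)]))  # 2, while length(val) is 4
cumsum_skip_na(val)               # NA 9901 NA 19803
```

Because it keeps the input length, the helper can be passed as FUN to base R's ave() to get per-group totals.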
