繁体   English   中英

R:计算新的变量R代码

[英]R: Calculating New Variable R Code

我有

            id_1 id_2  name  count total
          1  001  111    a     15  
          2  001  111    b      3   
          3  001  111   sum    28   28
          4  002  111    a      7  
          5  002  111    b     33
          6  002  111   sum    48   48

我希望共享相同id_1和id_2的行共享总数,例如

            id_1 id_2  name   count total
          1  001  111    a     15   28
          2  001  111    b      3   28
          3  001  111   sum    28   28
          4  002  111    a      7   48
          5  002  111    b     33   48
          6  002  111   sum    48   48

我们可以使用来自tidyr fill

library(tidyr)

dat2 <- dat %>% fill(total, .direction = "up")
dat2
#   id_1 id_2 name count total
# 1    1  111    a    15    28
# 2    1  111    b     3    28
# 3    1  111  sum    28    28
# 4    2  111    a     7    48
# 5    2  111    b    33    48
# 6    2  111  sum    48    48

数据

dat <- read.table(text = "            id_1 id_2  name  count total
          1  001  111    a     15   NA
          2  001  111    b      3   NA
          3  001  111   sum    28   28
          4  002  111    a      7   NA
          5  002  111    b     33   NA
          6  002  111   sum    48   48",
                  header = TRUE, stringsAsFactors = FALSE)

考虑基数R的ave计算组maxna.rm来处理NA ):

df$total <- ave(df$total, df$id_1, df$_id_2, FUN=function(i) max(i, na.rm=na.omit))

df
#   id_1 id_2 name count total
# 1    1  111    a    15    28
# 2    1  111    b     3    28
# 3    1  111  sum    28    28
# 4    2  111    a     7    48
# 5    2  111    b    33    48
# 6    2  111  sum    48    48

使用zoodata.table

df <- read.table(text = "id_1 id_2  name  count total
            001  111    a     15  NA
                    001  111    b      3   NA
                    001  111   sum    28   28
                    002  111    a      7  NA
                    002  111    b     33   NA
                    002  111   sum    48   48",
                  header = TRUE, stringsAsFactors = FALSE)# create data
library(zoo)# load packages
library(data.table)
setDT(df)[, total := na.locf(na.locf(total, na.rm=FALSE), na.rm=FALSE, fromLast=TRUE), by = c("id_1", "id_2")]# convert df to data.table and carry forward and backward total by ids

输出:

    id_1 id_2 name count total
1:    1  111    a    15    28
2:    1  111    b     3    28
3:    1  111  sum    28    28
4:    2  111    a     7    48
5:    2  111    b    33    48
6:    2  111  sum    48    48

使用普通dplyr方式的简单方法:

dat %>% group_by(id_1, id_2) %>% mutate(total=count[name == "sum"])

或者:

dat %>% group_by(id_1, id_2) %>% mutate(total=na.omit(total)[1])

   id_1  id_2 name  count total
  <int> <int> <chr> <int> <int>
1     1   111 a        15    28
2     1   111 b         3    28
3     1   111 sum      28    28
4     2   111 a         7    48
5     2   111 b        33    48
6     2   111 sum      48    48

暂无
暂无

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM