简体   繁体   中英

R: Calculating New Variable R Code

I have

            id_1 id_2  name  count total
          1  001  111    a     15  
          2  001  111    b      3   
          3  001  111   sum    28   28
          4  002  111    a      7  
          5  002  111    b     33
          6  002  111   sum    48   48

I want the rows that share the same id_1 and id_2 to share the total, like

            id_1 id_2  name   count total
          1  001  111    a     15   28
          2  001  111    b      3   28
          3  001  111   sum    28   28
          4  002  111    a      7   48
          5  002  111    b     33   48
          6  002  111   sum    48   48

We can use fill from tidyr .

library(tidyr)

dat2 <- dat %>% fill(total, .direction = "up")
dat2
#   id_1 id_2 name count total
# 1    1  111    a    15    28
# 2    1  111    b     3    28
# 3    1  111  sum    28    28
# 4    2  111    a     7    48
# 5    2  111    b    33    48
# 6    2  111  sum    48    48

DATA

dat <- read.table(text = "            id_1 id_2  name  count total
          1  001  111    a     15   NA
          2  001  111    b      3   NA
          3  001  111   sum    28   28
          4  002  111    a      7   NA
          5  002  111    b     33   NA
          6  002  111   sum    48   48",
                  header = TRUE, stringsAsFactors = FALSE)

Consider base R's ave calculating group max ( na.rm to handle NA ):

df$total <- ave(df$total, df$id_1, df$_id_2, FUN=function(i) max(i, na.rm=na.omit))

df
#   id_1 id_2 name count total
# 1    1  111    a    15    28
# 2    1  111    b     3    28
# 3    1  111  sum    28    28
# 4    2  111    a     7    48
# 5    2  111    b    33    48
# 6    2  111  sum    48    48

Using zoo and data.table :

df <- read.table(text = "id_1 id_2  name  count total
            001  111    a     15  NA
                    001  111    b      3   NA
                    001  111   sum    28   28
                    002  111    a      7  NA
                    002  111    b     33   NA
                    002  111   sum    48   48",
                  header = TRUE, stringsAsFactors = FALSE)# create data
library(zoo)# load packages
library(data.table)
setDT(df)[, total := na.locf(na.locf(total, na.rm=FALSE), na.rm=FALSE, fromLast=TRUE), by = c("id_1", "id_2")]# convert df to data.table and carry forward and backward total by ids

Output:

    id_1 id_2 name count total
1:    1  111    a    15    28
2:    1  111    b     3    28
3:    1  111  sum    28    28
4:    2  111    a     7    48
5:    2  111    b    33    48
6:    2  111  sum    48    48

Simple approach using the normal dplyr way:

dat %>% group_by(id_1, id_2) %>% mutate(total=count[name == "sum"])

Alternatively:

dat %>% group_by(id_1, id_2) %>% mutate(total=na.omit(total)[1])

   id_1  id_2 name  count total
  <int> <int> <chr> <int> <int>
1     1   111 a        15    28
2     1   111 b         3    28
3     1   111 sum      28    28
4     2   111 a         7    48
5     2   111 b        33    48
6     2   111 sum      48    48

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM