How to create multiple columns in an R data frame by implementing some query conditions

I have a dataset similar to the following:

Age    Food_1_1 Food_1_2 Food_1_3  Amount_1_1 Amount_1_2 Amount_1_3
6-9        a        b          a      2          3           4
6-9        b        b          c      1          2           3
6-9                 c          a                 4           1
9-10       c        c          b      1          3           1
9-10       c        a          b      1          2           1

Using R, I want to get the following dataset, which has a new set of columns a, b and c holding the sums of the corresponding values:

Age    Food_1_1 Food_1_2 Food_1_3  Amount_1_1 Amount_1_2 Amount_1_3  a  b  c
6-9        a        b          a      2          3           4       6  3  0
6-9        b        b          c      1          2           3       0  3  3
6-9                 c          a                 4           1       1  0  4
9-10       c        c          b      1          3           1       0  1  4
9-10       c        a          b      1          2           1       2  1  1

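Note: my data also contains missing values. The variables Monday:Wednesday (the Food_1_1:Food_1_3 columns above) are factors, and the variables Value1:Value3 (the Amount_1_1:Amount_1_3 columns above) are numeric. To be clearer: the first row of column "a" contains the sum of all values in Value1 through Value3 that are associated with a (e.g. 2 + 4 = 6).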
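To make the rule concrete, here is a small sketch that checks the first row by hand. The vector names `food` and `amount` are made up for illustration and are not part of the dataset.

```r
# First row of the sample data, written out as plain vectors (hypothetical names)
food   <- c("a", "b", "a")  # Monday, Tuesday, Wednesday
amount <- c(2, 3, 4)        # Value1, Value2, Value3

# For each letter, add up the amounts whose food entry equals that letter
row1_sums <- sapply(c("a", "b", "c"), function(letter) sum(amount[food == letter]))
row1_sums
# a b c
# 6 3 0
```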

One approach using base R:

data$id <- 1:nrow(data)  # Create a unique id
vlist <- list(grep("day$", names(data)), grep("^Value", names(data)))
d1 <- reshape(data, direction="long", varying=vlist, v.names=c("Day","Value"))
d2 <- aggregate(Value~id+Day, FUN=sum, na.rm=TRUE, data=d1)
d3 <- reshape(d2, direction="wide", v.names="Value", timevar="Day")
d3[is.na(d3)] <- 0
merge(data, d3, by="id", all.x=TRUE)

#  id  Age Monday Tuesday Wednesday Value1 Value2 Value3 Value.a Value.b Value.c
#1  1  6-9      a       b         a      2      3      4       6       3       0
#2  2  6-9      b       b         c      1      2      3       0       3       3
#3  3  6-9   <NA>       c         a     NA      4      1       1       0       4
#4  4 9-10      c       c         b      1      3      1       0       1       4
#5  5 9-10      c       a         b      1      2      1       2       1       1

Data

data <- structure(list(Age = structure(c(1L, 1L, 1L, 2L, 2L), .Label = c("6-9", 
"9-10"), class = "factor"), Monday = structure(c(1L, 2L, NA, 
3L, 3L), .Label = c("a", "b", "c"), class = "factor"), Tuesday = structure(c(2L, 
2L, 3L, 3L, 1L), .Label = c("a", "b", "c"), class = "factor"), 
    Wednesday = structure(c(1L, 3L, 1L, 2L, 2L), .Label = c("a", 
    "b", "c"), class = "factor"), Value1 = c(2L, 1L, NA, 1L, 
    1L), Value2 = c(3L, 2L, 4L, 3L, 2L), Value3 = c(4L, 3L, 1L, 
    1L, 1L)), class = "data.frame", row.names = c(NA, -5L))
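As a cross-check on the reshape-based result, the same sums can probably also be computed without reshaping, by comparing the day columns against each letter as a matrix. This is a different technique from the answer above, sketched under the assumption that the columns are named exactly as in the sample data and that the only letters are a, b and c. The data frame is rebuilt here with plain vectors so the block runs on its own.

```r
# Sample data rebuilt with plain vectors; NA marks the missing entries
data <- data.frame(
  Age       = c("6-9", "6-9", "6-9", "9-10", "9-10"),
  Monday    = c("a", "b", NA, "c", "c"),
  Tuesday   = c("b", "b", "c", "c", "a"),
  Wednesday = c("a", "c", "a", "b", "b"),
  Value1    = c(2, 1, NA, 1, 1),
  Value2    = c(3, 2, 4, 3, 2),
  Value3    = c(4, 3, 1, 1, 1)
)

days <- as.matrix(data[, c("Monday", "Tuesday", "Wednesday")])
vals <- as.matrix(data[, c("Value1", "Value2", "Value3")])

# For each letter, zero out the cells whose day entry is a different letter,
# then sum across each row; NA cells are skipped by na.rm = TRUE
sums <- sapply(c("a", "b", "c"), function(letter) {
  rowSums(vals * (days == letter), na.rm = TRUE)
})

result <- cbind(data, sums)
result
```

The new columns come out named a, b and c rather than Value.a, Value.b and Value.c, and no helper id column or merge step is needed. The list of letters is hard-coded here; for real data it would have to be derived from the day columns.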

You can use the following code:

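library(dplyr)
library(tidyr)
library(reshape2)  # for dcast()

data[] <- lapply(data, as.character)
data$rownumber <- rownames(data)
# gather() stacks both sets of columns in the same row order,
# so the row index (row2) pairs each letter with its value
x <- gather(data[, c(1:4, 8)], Day, Letter, Monday:Wednesday) %>% mutate(row2 = row_number())
y <- gather(data[, c(1, 5:7, 8)], Day, Value, Value1:Value3) %>% mutate(row2 = row_number())
z <- left_join(x, y, by = c("Age", "rownumber", "row2")) %>%
  filter(!is.na(Letter), Letter != "") %>%  # skip missing or blank letters
  group_by(Age, rownumber, Letter) %>%
  dplyr::summarise(suma = sum(as.numeric(Value), na.rm = TRUE)) %>%
  ungroup()

# One column per letter, then attach the original columns
z <- dcast(z, rownumber ~ Letter, value.var = "suma") %>% left_join(data, by = "rownumber")
z[is.na(z)] <- 0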

Output:

  rownumber a b c  Age Monday Tuesday Wednesday Value1 Value2 Value3
1         1 6 3 0  6-9      a       b         a      2      3      4
2         2 0 3 3  6-9      b       b         c      1      2      3
3         3 1 0 4  6-9              c         a      0      4      1
4         4 0 1 4 9-10      c       c         b      1      3      1
5         5 2 1 1 9-10      c       a         b      1      2      1
