How to create multiple columns in an R data frame based on query conditions
I have a dataset similar to the following:
Age   Food_1_1  Food_1_2  Food_1_3  Amount_1_1  Amount_1_2  Amount_1_3
6-9   a         b         a         2           3           4
6-9   b         b         c         1           2           3
6-9   NA        c         a         NA          4           1
9-10  c         c         b         1           3           1
9-10  c         a         b         1           2           1
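Using R, I would like to get the following dataset, which contains a new set of columns a, b, and c obtained by adding up the corresponding values: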
Age   Food_1_1  Food_1_2  Food_1_3  Amount_1_1  Amount_1_2  Amount_1_3  a  b  c
6-9   a         b         a         2           3           4           6  3  0
6-9   b         b         c         1           2           3           0  3  3
6-9   NA        c         a         NA          4           1           1  0  4
9-10  c         c         b         1           3           1           0  1  4
9-10  c         a         b         1           2           1           2  1  1
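Note: my data also contains missing values. The variables Monday:Wednesday (shown as Food_1_1:Food_1_3 in the tables above) are factors, and the variables Value1:Value3 (shown as Amount_1_1:Amount_1_3) are numeric. For clarity: the first row of column "a" contains the sum of all values in Value1 through Value3 that are associated with a (e.g. 2 + 4 = 6).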
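To make the rule concrete, here is a minimal sketch that applies it to the first row only. The names `row_letters`, `row_values`, and `totals` are made up for illustration and do not appear in the question.

```r
# First row of the example: letters and their paired amounts
row_letters <- c("a", "b", "a")
row_values  <- c(2, 3, 4)

# For each letter, add up the amounts whose paired letter matches it
totals <- sapply(c("a", "b", "c"), function(l) sum(row_values[row_letters == l]))
totals
# a b c
# 6 3 0
```

A letter that does not occur in the row (here "c") sums over an empty vector, which is why it comes out as 0.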
One way using base R:
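data$id <- 1:nrow(data) # Create a unique id
# Column positions of the letter columns (names ending in "day") and the value columns
vlist <- list(grep("day$", names(data)), grep("^Value", names(data)))
# Long format: one row per id and day, with the letter in Day and the amount in Value
d1 <- reshape(data, direction="long", varying=vlist, v.names=c("Day","Value"))
# Sum the values for each id and letter
d2 <- aggregate(Value~id+Day, FUN=sum, na.rm=TRUE, data=d1)
# Back to wide format: one Value.<letter> column per letter
d3 <- reshape(d2, direction="wide", v.names="Value", timevar="Day")
d3[is.na(d3)] <- 0 # Letters that do not occur in a row get 0
merge(data, d3, by="id", all.x=TRUE)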
# id Age Monday Tuesday Wednesday Value1 Value2 Value3 Value.a Value.b Value.c
#1 1 6-9 a b a 2 3 4 6 3 0
#2 2 6-9 b b c 1 2 3 0 3 3
#3 3 6-9 <NA> c a NA 4 1 1 0 4
#4 4 9-10 c c b 1 3 1 0 1 4
#5 5 9-10 c a b 1 2 1 2 1 1
Data:
data <- structure(list(Age = structure(c(1L, 1L, 1L, 2L, 2L), .Label = c("6-9",
"9-10"), class = "factor"), Monday = structure(c(1L, 2L, NA,
3L, 3L), .Label = c("a", "b", "c"), class = "factor"), Tuesday = structure(c(2L,
2L, 3L, 3L, 1L), .Label = c("a", "b", "c"), class = "factor"),
Wednesday = structure(c(1L, 3L, 1L, 2L, 2L), .Label = c("a",
"b", "c"), class = "factor"), Value1 = c(2L, 1L, NA, 1L,
1L), Value2 = c(3L, 2L, 4L, 3L, 2L), Value3 = c(4L, 3L, 1L,
1L, 1L)), class = "data.frame", row.names = c(NA, -5L))
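For comparison, here is a more compact base R sketch that skips the reshaping. It is not part of the answer above; it assumes the same column names, rebuilds the data with plain character columns so that it runs on its own, and hard-codes the letters a, b, and c. The names `food`, `vals`, `sums`, and `result` are illustrative.

```r
# Same values as the data above, with character instead of factor columns
data <- data.frame(
  Age       = c("6-9", "6-9", "6-9", "9-10", "9-10"),
  Monday    = c("a", "b", NA, "c", "c"),
  Tuesday   = c("b", "b", "c", "c", "a"),
  Wednesday = c("a", "c", "a", "b", "b"),
  Value1    = c(2, 1, NA, 1, 1),
  Value2    = c(3, 2, 4, 3, 2),
  Value3    = c(4, 3, 1, 1, 1)
)

food <- as.matrix(data[c("Monday", "Tuesday", "Wednesday")])  # letters
vals <- as.matrix(data[c("Value1", "Value2", "Value3")])      # amounts

# For each letter, zero out the amounts paired with other letters and sum per row;
# na.rm = TRUE drops cells where the letter or the amount is missing
sums <- sapply(c("a", "b", "c"), function(l) rowSums(vals * (food == l), na.rm = TRUE))

result <- cbind(data, sums)
result[c("a", "b", "c")]
#   a b c
# 1 6 3 0
# 2 0 3 3
# 3 1 0 4
# 4 0 1 4
# 5 2 1 1
```

This relies on the i-th letter column being paired with the i-th value column, which is the pairing the question describes.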
You can use the following code:
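library(dplyr); library(tidyr); library(reshape2) # %>% and joins, gather, dcast
data[] <- lapply(data, as.character)
data$rownumber <- rownames(data)
# Stack the letters and the values separately; row2 keeps the two long tables aligned
x <- gather(data[, c(1:4, 8)], Day, Letter, Monday:Wednesday)
x$row2 <- rownames(x) # set after x exists (the original referenced x inside its own definition)
y <- gather(data[, c(1, 5:7, 8)], Day, Value, Value1:Value3)
y$row2 <- rownames(y)
# Sum the values per row and letter
z <- left_join(x, y, by = c("Age", "rownumber", "row2")) %>%
  group_by(Age, rownumber, Letter) %>%
  dplyr::summarise(suma = sum(as.numeric(Value), na.rm = TRUE))
# One column per letter, then attach the original columns
z <- dcast(z, rownumber ~ Letter, value.var = "suma") %>% left_join(data, by = "rownumber")
z$Var.2 <- NULL # drop the column created for blank letters (its name may differ if blanks are stored as NA)
z[is.na(z)] <- 0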
Output:
  rownumber a b c  Age Monday Tuesday Wednesday Value1 Value2 Value3
1         1 6 3 0  6-9      a       b         a      2      3      4
2         2 0 3 3  6-9      b       b         c      1      2      3
3         3 1 0 4  6-9              c         a      0      4      1
4         4 0 1 4 9-10      c       c         b      1      3      1
5         5 2 1 1 9-10      c       a         b      1      2      1
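The same idea can also be sketched with the newer tidyr verbs `pivot_longer()` and `pivot_wider()`, which supersede `gather()`. This is an alternative sketch rather than the code from the answer above; it assumes dplyr 1.0 or later and a recent tidyr, rebuilds the data so that it runs on its own, and uses illustrative names (`letters_long`, `values_long`, `sums`, `result`).

```r
library(dplyr)
library(tidyr)

# Same values as the example data, with character instead of factor columns
data <- data.frame(
  Age       = c("6-9", "6-9", "6-9", "9-10", "9-10"),
  Monday    = c("a", "b", NA, "c", "c"),
  Tuesday   = c("b", "b", "c", "c", "a"),
  Wednesday = c("a", "c", "a", "b", "b"),
  Value1    = c(2, 1, NA, 1, 1),
  Value2    = c(3, 2, 4, 3, 2),
  Value3    = c(4, 3, 1, 1, 1)
)

# Stack the letter columns and the value columns separately; both results are
# ordered row by row, so they can be bound side by side
letters_long <- data %>%
  mutate(id = row_number()) %>%
  pivot_longer(Monday:Wednesday, names_to = "Day", values_to = "Letter") %>%
  select(id, Letter)
values_long <- data %>%
  pivot_longer(Value1:Value3, names_to = "Var", values_to = "Value") %>%
  select(Value)

# Sum per row and letter, then spread to one column per letter (0 where absent)
sums <- bind_cols(letters_long, values_long) %>%
  filter(!is.na(Letter)) %>%
  group_by(id, Letter) %>%
  summarise(total = sum(Value, na.rm = TRUE), .groups = "drop") %>%
  pivot_wider(names_from = Letter, values_from = total, values_fill = 0)

result <- data %>%
  mutate(id = row_number()) %>%
  left_join(sums, by = "id")
```

Unlike the code above, this version keeps the value columns numeric instead of converting everything to character first.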