How to aggregate one column of an R data frame based on the values of another
My data frame looks like this (simplified; the real one has many more rows and columns):
  Gender Energetic Weekly_Apple Weekly_Banana
1 Female         3           No           Yes
2 Female         3           No           Yes
3   Male         5           No           Yes
4   Male         2           No            No
5 Female         1           No            No
I'd like short code that tallies the "Yes" responses and produces this output:
        Male Female
Apples     0      0
Bananas    1      2
That is, nobody of either gender eats apples, while 1 male and 2 females eat bananas.
I tried the following:
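# with plyr loaded, count(df, vars) tabulates every combination of the given
# columns, so "No" answers are counted as well
count(original_data, c("Gender", "Weekly_Apple"))
count(original_data, c("Gender", "Weekly_Banana"))
count(original_data, c("Gender", "Weekly_Grape"))
count(original_data, c("Gender", "Weekly_PineApple"))

# fails: n() only works inside dplyr verbs such as summarise(),
# and FUN expects a function, not the result of calling one
aggregate(x = original_data[c("Weekly_Apple",
                              "Weekly_Banana",
                              "Weekly_Grape")],
          by = original_data[c("Gender")],
          FUN = n())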
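The aggregate() attempt can be made to work in base R alone. This is a minimal sketch, assuming the Weekly_* columns hold the strings "Yes"/"No" as in the sample: compare against "Yes" first so each cell becomes TRUE/FALSE, then pass sum (a function, not a call) as FUN so it counts the TRUE values per Gender.

```r
# Sample data from the question; strings kept as character, not factors
original_data <- data.frame(
  Gender        = c("Female", "Female", "Male", "Male", "Female"),
  Energetic     = c(3, 3, 5, 2, 1),
  Weekly_Apple  = c("No", "No", "No", "No", "No"),
  Weekly_Banana = c("Yes", "Yes", "Yes", "No", "No"),
  stringsAsFactors = FALSE
)

# Logical matrix: TRUE wherever the answer is "Yes"
is_yes <- original_data[c("Weekly_Apple", "Weekly_Banana")] == "Yes"

# Sum the TRUE values within each Gender (one row per Gender)
yes_counts <- aggregate(x = is_yes, by = original_data["Gender"], FUN = sum)
yes_counts
```

The result has one row per Gender and one column per Weekly_* variable, i.e. the transpose of the layout asked for.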
As NelsonGon suggested, I replaced df1 <- t(df1) with tidyr::crossing(df1).
library(dplyr)

df <- data.frame(
  Gender = c("Female", "Female", "Male", "Male", "Female"),
  Energetic = c(3, 3, 5, 2, 1),
  Weekly_Apple = c("No", "No", "No", "No", "No"),
  Weekly_Banana = c("Yes", "Yes", "Yes", "No", "No"))

# count the "Yes" answers per Gender
df1 <- df %>%
  group_by(Gender) %>%
  summarise(
    Apples = sum(Weekly_Apple == "Yes"),
    Bananas = sum(Weekly_Banana == "Yes")
  )

# note: crossing() returns the distinct rows of df1; it does not transpose
df1 <- tidyr::crossing(df1)
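The summarise() result keeps one row per Gender. A minimal sketch of getting the fruit-by-gender layout from the question, assuming only the numeric count columns are transposed (calling t() on the whole tibble would give a character matrix because of the Gender column):

```r
library(dplyr)

df <- data.frame(
  Gender        = c("Female", "Female", "Male", "Male", "Female"),
  Energetic     = c(3, 3, 5, 2, 1),
  Weekly_Apple  = c("No", "No", "No", "No", "No"),
  Weekly_Banana = c("Yes", "Yes", "Yes", "No", "No"),
  stringsAsFactors = FALSE
)

# One row per Gender, one count column per fruit
df1 <- df %>%
  group_by(Gender) %>%
  summarise(
    Apples  = sum(Weekly_Apple == "Yes"),
    Bananas = sum(Weekly_Banana == "Yes")
  )

# Transpose only the numeric columns so the result stays an integer matrix,
# then label the columns with the Gender values
counts <- t(as.matrix(df1[, c("Apples", "Bananas")]))
colnames(counts) <- as.character(df1$Gender)
counts
```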
A data.table possibility:
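library(data.table)

# melt: one row per (person, Weekly_* column); df[-2] drops Energetic.
# dcast: count the "Yes" values for each variable/Gender pair.
dcast(variable ~ Gender,
      value.var = "value",
      fun.aggregate = function(x) sum(x == "Yes"),
      data = melt(as.data.table(df[-2]), id.vars = "Gender"))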
variable Female Male
1 Weekly_Apple 0 0
2 Weekly_Banana 2 1
You can use base R:
table(reshape(cbind(df,id=1:nrow(df)),3:4,idvar = "id",dir="long",sep="_")[-(2:3)])[,,'Yes']
        time
Gender   Apple Banana
  Female     0      2
  Male       0      1
Or even:
xtabs(Weekly~time+Gender,transform(reshape(cbind(df,id=1:nrow(df)),3:4,idvar = "id",dir="long",sep="_"),Weekly=Weekly=="Yes"))
        Gender
time     Female Male
  Apple       0    0
  Banana      2    1
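Both base R answers start from the same reshape() call, and the [-(2:3)] index only makes sense once you see what reshape() returns. A sketch using the sample data as character columns, which selects the columns by name instead of by position (the name-based selection is a swapped-in choice, not part of the answer above):

```r
df <- data.frame(
  Gender        = c("Female", "Female", "Male", "Male", "Female"),
  Energetic     = c(3, 3, 5, 2, 1),
  Weekly_Apple  = c("No", "No", "No", "No", "No"),
  Weekly_Banana = c("Yes", "Yes", "Yes", "No", "No"),
  stringsAsFactors = FALSE
)

# Wide -> long: sep = "_" splits "Weekly_Apple" into the stem "Weekly"
# and the time value "Apple"
long <- reshape(cbind(df, id = seq_len(nrow(df))),
                varying = 3:4, idvar = "id",
                direction = "long", sep = "_")
names(long)  # Gender, Energetic, id, time, Weekly

# Cross-tabulate Gender x time x answer, then keep only the "Yes" slice
counts <- table(long[c("Gender", "time", "Weekly")])[, , "Yes"]
counts
```

Columns 2 and 3 of the long data are Energetic and the helper id, which is why the answer drops them before calling table().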
A dplyr/tidyr alternative:
df %>%
group_by(Gender) %>%
summarise_at(vars(contains("Weekly")), function(x) sum(x=="Yes")) %>%
tidyr::gather(key, val , -Gender) %>%
tidyr::spread(Gender, val)
# A tibble: 2 x 3
key Female Male
<chr> <int> <int>
1 Weekly_Apple 0 0
2 Weekly_Banana 2 1
Data:
df <- structure(list(Gender = structure(c(1L, 1L, 2L, 2L, 1L), .Label = c("Female",
"Male"), class = "factor"), Energetic = c(3, 3, 5, 2, 1), Weekly_Apple = structure(c(1L,
1L, 1L, 1L, 1L), .Label = "No", class = "factor"), Weekly_Banana = structure(c(2L,
2L, 2L, 1L, 1L), .Label = c("No", "Yes"), class = "factor")), class = "data.frame", row.names = c(NA,
-5L))
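summarise_at(), gather() and spread() are superseded in current dplyr and tidyr. A sketch of the same pipeline with their replacements across(), pivot_longer() and pivot_wider(), assuming dplyr >= 1.0 and tidyr >= 1.0 are installed:

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  Gender        = c("Female", "Female", "Male", "Male", "Female"),
  Energetic     = c(3, 3, 5, 2, 1),
  Weekly_Apple  = c("No", "No", "No", "No", "No"),
  Weekly_Banana = c("Yes", "Yes", "Yes", "No", "No"),
  stringsAsFactors = FALSE
)

fruit_counts <- df %>%
  group_by(Gender) %>%
  # count "Yes" in every column whose name starts with "Weekly"
  summarise(across(starts_with("Weekly"), ~ sum(.x == "Yes"))) %>%
  # one row per (Gender, column) pair ...
  pivot_longer(-Gender, names_to = "key", values_to = "val") %>%
  # ... then one column per Gender
  pivot_wider(names_from = Gender, values_from = val)
fruit_counts
```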
Another base R version, with tapply:
t(sapply(names(df)[3:4], function(nm) with(df, tapply(df[[nm]]=="Yes", Gender,sum))))
# Female Male
#Weekly_Apple 0 0
#Weekly_Banana 2 1
Or with split:
sapply(split(df[3:4], df$Gender), function(x) colSums(x == "Yes"))
Or a variation of it:
sapply(split(as.data.frame(df[3:4] == "Yes"), df$Gender), colSums)
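The question mentions more Weekly_* columns than the sample shows, so hard-coding df[3:4] will not scale. A minimal sketch of the split approach that picks the columns by name instead; Weekly_Grape here is a hypothetical extra column added only for illustration:

```r
df <- data.frame(
  Gender        = c("Female", "Female", "Male", "Male", "Female"),
  Energetic     = c(3, 3, 5, 2, 1),
  Weekly_Apple  = c("No", "No", "No", "No", "No"),
  Weekly_Banana = c("Yes", "Yes", "Yes", "No", "No"),
  Weekly_Grape  = c("Yes", "No", "No", "Yes", "No"),  # hypothetical column
  stringsAsFactors = FALSE
)

# Every column whose name starts with "Weekly_"
weekly_cols <- grep("^Weekly_", names(df), value = TRUE)

# Split the rows by Gender and count "Yes" per column within each group;
# sapply() binds the per-group vectors into a column-per-Gender matrix
counts <- sapply(split(df[weekly_cols], df$Gender),
                 function(x) colSums(x == "Yes"))
counts
```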