In R, take sum of multiple variables if combination of values in two other columns are unique
I am trying to expand on the answer to a previously solved question, "Take Sum of a Variable if Combination of Values in Two Other Columns are Unique". Because I am new to Stack Overflow, I can't comment directly on that post, so here is my problem:
I have a dataset like the following, but with about 100 columns of binary data like the "ani1" and "bni2" columns shown here.
Locations <- c("A","A","A","A","B","B","C","C","D", "D","D")
seasons <- c("2", "2", "3", "4","2","3","1","2","2","4","4")
ani1 <- c(1,1,1,1,0,1,1,1,0,1,0)
bni2 <- c(0,0,1,1,1,1,0,1,0,1,1)
df <- data.frame(Locations, seasons, ani1, bni2)
Locations seasons ani1 bni2
1 A 2 1 0
2 A 2 1 0
3 A 3 1 1
4 A 4 1 1
5 B 2 0 1
6 B 3 1 1
7 C 1 1 0
8 C 2 1 1
9 D 2 0 0
10 D 4 1 1
11 D 4 0 1
I am trying to sum all the columns by location and season, so that I get one total for column 3 and for each column after it, for every unique combination of location and season. The problem is that not every column has a value of 1 for every combination of location and season, and the columns all have different names.
I would like something like this:
Locations seasons ani1 bni2
1 A 2 2 0
2 A 3 1 1
3 A 4 1 1
4 B 2 0 1
5 B 3 1 1
6 C 1 1 0
7 C 2 1 1
8 D 2 0 0
9 D 4 1 2
Here is my attempt using a for loop:
df2 <- 0
for(i in 3:length(df)){
testdf <- data.frame(t(apply(df[1:2], 1, sort)), df[i])
df2 <- aggregate(i~., testdf, FUN=sum)
}
I get the following error:
Error in model.frame.default(formula = i ~ ., data = testdf) :
variable lengths differ (found for 'X1')
Thank you!
You can use dplyr::summarise and across after group_by.
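library(dplyr)
df %>%
  group_by(Locations, seasons) %>%
  # across() never selects the grouping columns, so everything() covers
  # ani1, bni2 and any other value columns regardless of their names
  summarise(across(everything(), ~sum(.x, na.rm = TRUE))) %>%
  ungroup()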
Another option is to reshape the data to long format using functions from the tidyr package. This avoids the issue of having to select columns 3 onwards.
library(dplyr)
library(tidyr)
df %>%
pivot_longer(cols = -c(Locations, seasons)) %>%
group_by(Locations, seasons, name) %>%
summarise(Sum = sum(value, na.rm = TRUE)) %>%
ungroup() %>%
pivot_wider(names_from = "name", values_from = "Sum")
Result:
# A tibble: 9 x 4
Locations seasons ani1 bni2
<chr> <int> <int> <int>
1 A 2 2 0
2 A 3 1 1
3 A 4 1 1
4 B 2 0 1
5 B 3 1 1
6 C 1 1 0
7 C 2 1 1
8 D 2 0 0
9 D 4 1 2
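For anyone who would rather stay close to the aggregate() call in the question, here is a minimal base R sketch. The error in the question most likely comes from the formula i ~ ., where i is the loop counter (a single integer) rather than a column of testdf, so its length does not match the 11 rows. Putting a dot on the left-hand side of the formula sums every non-grouping column in one call, so no loop is needed. The name res and the final re-sorting step are my own additions for illustration.

```r
# Sample data from the question
Locations <- c("A","A","A","A","B","B","C","C","D","D","D")
seasons   <- c("2","2","3","4","2","3","1","2","2","4","4")
ani1 <- c(1,1,1,1,0,1,1,1,0,1,0)
bni2 <- c(0,0,1,1,1,1,0,1,0,1,1)
df <- data.frame(Locations, seasons, ani1, bni2)

# "." on the left-hand side stands for every column not named on the right,
# so all value columns are summed per Locations/seasons combination.
# Note: the formula method drops rows containing NA in any column by default.
res <- aggregate(. ~ Locations + seasons, data = df, FUN = sum)

# Sort explicitly by location, then season, to match the desired layout
res <- res[order(res$Locations, res$seasons), ]
rownames(res) <- NULL
res
```

With about 100 columns the call stays the same, because the dot picks up whatever columns are present besides Locations and seasons.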