
In R, take sum of multiple variables if combination of values in two other columns is unique

I am trying to expand on the answer to a previously solved problem, "Take Sum of a Variable if Combination of Values in Two Other Columns are Unique", but because I am new to Stack Overflow I can't comment directly on that post, so here is my problem:

I have a dataset like the following, but with about 100 columns of binary data like the "ani1" and "bni2" columns shown here.

Locations <- c("A","A","A","A","B","B","C","C","D", "D","D")
seasons <- c("2", "2", "3", "4","2","3","1","2","2","4","4")
ani1 <- c(1,1,1,1,0,1,1,1,0,1,0)
bni2 <- c(0,0,1,1,1,1,0,1,0,1,1)

df <- data.frame(Locations, seasons, ani1, bni2)

     Locations seasons ani1 bni2
1          A       2    1    0
2          A       2    1    0
3          A       3    1    1
4          A       4    1    1
5          B       2    0    1
6          B       3    1    1
7          C       1    1    0
8          C       2    1    1
9          D       2    0    0
10         D       4    1    1
11         D       4    0    1

I am attempting to sum all the columns by location and season, so that from column 3 onward I get one total per column for each unique combination of location and season. The problem is that not every column has a value of 1 for every combination of location and season, and the columns all have different names.

I would like something like this:

    Locations seasons ani1 bni2
1         A       2    2    0
2         A       3    1    1
3         A       4    1    1
4         B       2    0    1
5         B       3    1    1
6         C       1    1    0
7         C       2    1    1
8         D       2    0    0
9         D       4    1    2

Here is my attempt using a for loop:

 df2 <- 0
 for(i in 3:length(df)){
  testdf <- data.frame(t(apply(df[1:2], 1, sort)), df[i])
  df2 <- aggregate(i~., testdf, FUN=sum)
 }

I get the following error:

Error in model.frame.default(formula = i ~ ., data = testdf) : 
  variable lengths differ (found for 'X1')

Thank you!

You can use dplyr::summarise with across after group_by.

library(dplyr)

df %>% 
  group_by(Locations, seasons) %>% 
  summarise(across(everything(), ~sum(.x, na.rm = TRUE))) %>%
  ungroup()

Another option is to reshape the data to long format using functions from the tidyr package. This avoids having to select columns 3 onwards by position.

library(dplyr)
library(tidyr)

df %>% 
  pivot_longer(cols = -c(Locations, seasons)) %>% 
  group_by(Locations, seasons, name) %>% 
  summarise(Sum = sum(value, na.rm = TRUE)) %>% 
  ungroup() %>% 
  pivot_wider(names_from = "name", values_from = "Sum")

Result:

# A tibble: 9 x 4
  Locations seasons  ani1  bni2
  <chr>       <int> <int> <int>
1 A               2     2     0
2 A               3     1     1
3 A               4     1     1
4 B               2     0     1
5 B               3     1     1
6 C               1     1     0
7 C               2     1     1
8 D               2     0     0
9 D               4     1     2
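
As for why the loop in the question fails: in `aggregate(i ~ ., testdf, FUN = sum)` the formula refers to `i`, which is the loop counter (a single number), not the column at position `i`. That is most likely where "variable lengths differ" comes from. A base R sketch that avoids the loop altogether is shown below. It assumes the example `df` from the question and that there are no missing values, because the formula method of `aggregate` drops rows containing `NA` by default. The name `res` is only for illustration.

```r
# Rebuild the example data from the question.
df <- data.frame(
  Locations = c("A", "A", "A", "A", "B", "B", "C", "C", "D", "D", "D"),
  seasons   = c("2", "2", "3", "4", "2", "3", "1", "2", "2", "4", "4"),
  ani1      = c(1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0),
  bni2      = c(0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1)
)

# "." on the left-hand side stands for every column not named on the
# right-hand side, so all value columns are summed without listing them.
res <- aggregate(. ~ Locations + seasons, data = df, FUN = sum)

# Put the rows in Locations/seasons order to match the desired output.
res <- res[order(res$Locations, res$seasons), ]
rownames(res) <- NULL
res
```

Because the value columns are picked up by `.`, the same call should work unchanged with 100 differently named columns, as long as `Locations` and `seasons` are the only non-numeric ones.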
