Aggregate multiple rows of the same data.frame in R based on common values in given columns

I have a data.frame that looks like this:

# set example data
df <- read.table(textConnection("item\tsize\tweight\tvalue
A\t2\t3\t4
A\t2\t3\t6
B\t1\t2\t3
C\t3\t2\t1
B\t1\t2\t4
B\t1\t2\t2"), header = TRUE)

# print example data
df
  item size weight value
1    A    2      3     4
2    A    2      3     6
3    B    1      2     3
4    C    3      2     1
5    B    1      2     4
6    B    1      2     2

As you can see, the size and weight columns do not add any complexity, since they are the same for each item. However, there can be multiple values for the same item.

I want to collapse the data.frame to have one row per item, using the mean value:

  item size weight value
1    A    2      3     5
3    B    1      2     3
4    C    3      2     1

I guess I have to use the aggregate function, but I could not figure out exactly how to get the above result.

aggregate(value ~ item + size + weight, FUN = mean, data=df)

  item size weight value
1    B    1      2     3
2    C    3      2     1
3    A    2      3     5
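
The formula lists size and weight next to item only so that they are carried through to the result. Because they are constant within each item, they do not create extra groups. Here is a minimal sketch that checks that assumption on the example data before aggregating. The names n_distinct_per_item and result are made up for this illustration, and the data is rebuilt with data.frame() so the block is self-contained:

```r
# Example data from the question
df <- data.frame(
  item   = c("A", "A", "B", "C", "B", "B"),
  size   = c(2, 2, 1, 3, 1, 1),
  weight = c(3, 3, 2, 2, 2, 2),
  value  = c(4, 6, 3, 1, 4, 2)
)

# Count the distinct size and weight values within each item
# (n_distinct_per_item is a hypothetical name used only for this sketch)
n_distinct_per_item <- aggregate(
  cbind(size, weight) ~ item,
  data = df,
  FUN  = function(x) length(unique(x))
)

# Every count should be 1 if size and weight are constant per item
stopifnot(all(n_distinct_per_item$size == 1),
          all(n_distinct_per_item$weight == 1))

# With that confirmed, grouping by all three columns gives one row per item
result <- aggregate(value ~ item + size + weight, data = df, FUN = mean)
```

If the check fails for real data, grouping by item + size + weight would return more than one row for some items, so it is worth running first.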

Here is a solution using ddply from the plyr package:

library(plyr)
ddply(df,.(item),colwise(mean))
  item size weight value
1    A    2      3     5
2    B    1      2     3
3    C    3      2     1
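
A base R alternative using ave and duplicated:

# replace each value with the mean of its item group
df$value <- ave(df$value, df$item, FUN = mean)
# keep only the first row of each item
df[!duplicated(df$item), ]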

  item size weight value
1    A    2      3     5
3    B    1      2     3
4    C    3      2     1
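
Note that the ave line above overwrites df$value in place, so the original values are lost. Here is a sketch of the same idea that leaves df untouched, assuming a working copy called out (a name introduced here for illustration only):

```r
# Example data from the question
df <- data.frame(
  item   = c("A", "A", "B", "C", "B", "B"),
  size   = c(2, 2, 1, 3, 1, 1),
  weight = c(3, 3, 2, 2, 2, 2),
  value  = c(4, 6, 3, 1, 4, 2)
)

# Work on a copy so the original data.frame keeps its raw values
out <- df

# ave() returns a vector as long as its input, holding each row's group mean
out$value <- ave(out$value, out$item, FUN = mean)

# duplicated() flags repeat occurrences of an item; keep the first of each
out <- out[!duplicated(out$item), ]
```

The row names of out remain those of the first occurrence of each item, which matches the layout asked for in the question.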

The data.table solution:

require(data.table)
DT <- data.table(df)

DT[ , lapply(.SD , mean ) , by = item ]
   item size weight value
1:    A    2      3     5
2:    B    1      2     3
3:    C    3      2     1

Nowadays, this is what I would do:

library(dplyr)

df %>%
  group_by(item, size, weight) %>%
  summarize(value = mean(value)) %>%
  ungroup()

This yields the following result:

# A tibble: 3 x 4
   item  size weight value
  <chr> <int>  <int> <dbl>
1     A     2      3     5
2     B     1      2     3
3     C     3      2     1

I will leave the accepted answer as it is, since I specifically asked for aggregate, but I find the dplyr solution the most readable.