R- Collapse rows and sum the values in the column

Question

I have the following dataframe (df1):

ID  someText PSM OtherValues
ABC c        2   qwe
CCC v        3   wer
DDD b        56  ert
EEE m        78  yu
FFF sw       1   io
GGG e        90  gv
CCC r        34  scf
CCC t        21  fvb
KOO y        45  hffd
EEE u        2   asd
LLL i        4   dlm
ZZZ i        8   zzas

I would like to collapse the rows that share a value in the first column (ID) and sum the corresponding PSM values, to get the following output:

ID  Sum PSM
ABC 2
CCC 58
DDD 56
EEE 80
FFF 1
GGG 90
KOO 45
LLL 4
ZZZ 8

This seems doable with the aggregate function, but I don't know the syntax. Any help is really appreciated! Thanks.

Answer 1

In base:

aggregate(PSM ~ ID, data = df1, FUN = sum)
##    ID PSM
## 1 ABC   2
## 2 CCC  58
## 3 DDD  56
## 4 EEE  80
## 5 FFF   1
## 6 GGG  90
## 7 KOO  45
## 8 LLL   4
## 9 ZZZ   8
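
For a version that can be pasted and run as is, here is a minimal sketch that rebuilds the question's data by hand and applies the same aggregate call. The values are copied from the table in the question; the name res and the renamed column Sum_PSM are choices made for this sketch, not part of the original answer.

```r
# Rebuild the question's data frame by hand (values copied from the table).
df1 <- data.frame(
  ID = c("ABC", "CCC", "DDD", "EEE", "FFF", "GGG",
         "CCC", "CCC", "KOO", "EEE", "LLL", "ZZZ"),
  someText = c("c", "v", "b", "m", "sw", "e", "r", "t", "y", "u", "i", "i"),
  PSM = c(2, 3, 56, 78, 1, 90, 34, 21, 45, 2, 4, 8),
  OtherValues = c("qwe", "wer", "ert", "yu", "io", "gv",
                  "scf", "fvb", "hffd", "asd", "dlm", "zzas")
)

# Sum PSM within each ID; columns not named in the formula are ignored.
res <- aggregate(PSM ~ ID, data = df1, FUN = sum)

# Optional: rename the summed column (Sum_PSM is a name chosen here).
names(res)[names(res) == "PSM"] <- "Sum_PSM"
print(res)
```

The formula PSM ~ ID reads as "PSM grouped by ID", so adding more grouping columns would mean extending the right-hand side (for example PSM ~ ID + someText).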

Answer 2

Example using dplyr, the next iteration of plyr:

library(dplyr)

# Group the rows by ID, then sum PSM within each group
df2 <- df1 %>%
  group_by(ID) %>%
  summarize(Sum_PSM = sum(PSM))

The %>% operator "pipes" a value: it takes the result of the expression on its left and passes it as the first argument to the function on its right.
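
To make the piping idea concrete, the sketch below writes the same computation twice, once with %>% and once as ordinary nested calls, and checks that the two agree. It assumes dplyr is installed; the three-row data frame and the names piped and nested are made up for illustration.

```r
library(dplyr)

# A small stand-in for df1 (illustrative values only).
df1 <- data.frame(ID = c("ABC", "CCC", "CCC"), PSM = c(2, 3, 34))

# Piped form: each step receives the previous result as its first argument.
piped <- df1 %>%
  group_by(ID) %>%
  summarize(Sum_PSM = sum(PSM))

# Equivalent nested form: the same calls, written inside out.
nested <- summarize(group_by(df1, ID), Sum_PSM = sum(PSM))

# Compare the two results as plain data frames.
same <- isTRUE(all.equal(as.data.frame(piped), as.data.frame(nested)))
print(same)
```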

Answer 3

This is super easy using the plyr package:

library(plyr)
ddply(df1, .(ID), summarize, Sum=sum(PSM))

Answer 4

Using the aggregate function seems better than dplyr if you just want to keep the original column names and operate on one column at a time. It also avoids the summarize function.

Note from the summarize function documentation:

Be careful when using existing variable names; the corresponding columns will be immediately updated with the new data and this can affect subsequent operations referring to those variables.

For instance

## modified example from aggregate documentation with character variables and NAs
testDF <- data.frame(v1 = c(1,3,5,7,8,3,5,NA,4,5,7,9),
                 v2 = c(11,33,55,77,88,33,55,NA,44,55,77,99) )
by1 <- c("red", "blue", 1, 2, NA, "big", 1, 2, "red", 1, NA, 12)

aggregate(x = testDF, by = list(by1), FUN = "sum")
Group.1 v1  v2
1       1 15 165
2      12  9  99
3       2 NA  NA
4     big  3  33
5    blue  3  33
6     red  5  55

You get what you want without naming any columns, whereas with summarize and ddply you need to spell out each output name. So if you have many columns, aggregate seems more convenient.

library(plyr)
testDF$ID <- by1
ddply(testDF, .(ID), summarize, v1 = sum(v1), v2 = sum(v2))
ID v1  v2
1    1 15 165
2   12  9  99
3    2 NA  NA
4  big  3  33
5 blue  3  33
6  red  5  55
7 <NA> 15 165
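
Comparing the two outputs, aggregate drops the rows whose grouping value is NA, while ddply keeps them as an <NA> group, and both report NA for group 2 because one of its values is missing. As a hedged sketch in base R, the call below names the grouping column through a named list and passes na.rm = TRUE on to sum; the name res is chosen here, and whether dropping missing values is appropriate depends on your data.

```r
testDF <- data.frame(v1 = c(1, 3, 5, 7, 8, 3, 5, NA, 4, 5, 7, 9),
                     v2 = c(11, 33, 55, 77, 88, 33, 55, NA, 44, 55, 77, 99))
by1 <- c("red", "blue", 1, 2, NA, "big", 1, 2, "red", 1, NA, 12)

# A named list calls the grouping column "ID" instead of "Group.1".
# Extra arguments (here na.rm = TRUE) are passed on to FUN.
res <- aggregate(x = testDF, by = list(ID = by1), FUN = sum, na.rm = TRUE)
print(res)
```

With na.rm = TRUE, group 2 sums only its non-missing values; rows whose grouping value is NA are still omitted by aggregate.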

To see the effect of summarize immediately updating the columns, check the following examples:

ddply(testDF, .(ID), summarize, v1=max(v1,v2), v2=min(v1,v2) )
ID v1 v2
1    1 55 55
2   12 99 99
3    2 NA NA
4  big 33 33
5 blue 33 33
6  red 44 11
7 <NA> 88 77

ddply(testDF, .(ID), summarize, v1=min(v1,v2), v2=min(v1,v2) )
ID v1 v2
1    1  5  5
2   12  9  9
3    2 NA NA
4  big  3  3
5 blue  3  3
6  red  1  1
7 <NA>  7  7

Note that when v1 is computed with max, the column has already been updated by the time v2 is calculated. So for ID=1, for instance, min in v2 cannot return 5, because it now sees the updated v1 (55) rather than the original values.
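
One way to sidestep the overwriting, sketched here under the assumption that plyr is installed, is to give the summaries names that differ from the input columns, so later expressions still see the original v1 and v2. The small data frame and the output names hi and lo are made up for illustration.

```r
library(plyr)

# Two groups taken from the example above (ID "1" and "red").
smallDF <- data.frame(ID = c("1", "1", "1", "red", "red"),
                      v1 = c(5, 5, 5, 1, 4),
                      v2 = c(55, 55, 55, 11, 44))

# hi and lo are new names, so v1 and v2 are not replaced between steps.
res <- ddply(smallDF, .(ID), summarize,
             hi = max(v1, v2),
             lo = min(v1, v2))
print(res)
```

Here lo for ID "1" is computed from the untouched columns, so the original minimum of 5 is recoverable.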

Answer 5

Using data.table:

library(data.table)
setDT(df1)[, lapply(.SD, sum), by = ID, .SDcols = "PSM"]
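
The .SD form above extends naturally to several columns. For a single column, a shorter spelling is common; the sketch below assumes data.table is installed and uses a five-row subset of the question's data, with dt, res and Sum_PSM as names chosen for illustration.

```r
library(data.table)

# A subset of the question's rows (illustrative, not the full table).
dt <- data.table(ID = c("ABC", "CCC", "CCC", "EEE", "EEE"),
                 PSM = c(2, 3, 34, 78, 2))

# .(Sum_PSM = sum(PSM)) builds one named summary column per group.
res <- dt[, .(Sum_PSM = sum(PSM)), by = ID]
print(res)
```

Unlike setDT(df1), which converts df1 to a data.table in place, this sketch builds a separate table, so nothing else in the session is modified.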

5 answers

solution1
19 ACCEPTED 2013-05-27 17:36:06

solution2
3 2016-04-09 14:51:34

solution3
2 2013-05-27 17:34:44

solution4
0 2018-10-03 15:10:31

solution5
0 2019-04-03 15:04:10
