R: Aggregating data by column group - mutate column with values for each observation

I'm having a beginner's issue aggregating data by category: I want to create a new column that holds, for each observation, the sum of the values in that observation's category.

I'd like the following data:

PIN Balance
221 5000
221 2000
221 1000
554 4000
554 4500
643 6000
643 4000

To look like:

PIN Balance Total
221 5000 8000
221 2000 8000
221 1000 8000
554 4000 8500
554 4500 8500
643 6000 10000
643 4000 10000

I've tried using aggregate: output <- aggregate(df$Balance ~ df$PIN, data = df, sum) but haven't been able to get the result back into my original dataset because the number of observations doesn't match.
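The row counts differ because aggregate() returns one row per PIN (3 rows here) while the original data has one row per observation (7 rows). A minimal sketch of one way to bring the totals back, assuming the question's data; the names totals and merged are illustrative and not from the original post:

```r
# Sample data from the question
df <- data.frame(
  PIN = c(221L, 221L, 221L, 554L, 554L, 643L, 643L),
  Balance = c(5000L, 2000L, 1000L, 4000L, 4500L, 6000L, 4000L)
)

# One row per PIN: this is why it cannot be assigned straight into df
totals <- aggregate(Balance ~ PIN, data = df, FUN = sum)

# Rename the summed column so it does not clash with df$Balance
names(totals)[names(totals) == "Balance"] <- "Total"

# Join the per-PIN totals back onto every observation
merged <- merge(df, totals, by = "PIN")
```

Note that merge() sorts the result by the key column, so the row order may differ from the original when the input is not already ordered by PIN.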

You can use dplyr to do what you want. We first group the rows by PIN with group_by, then use mutate to add a new column Total holding the sum of Balance within each group:

library(dplyr)
res <- df %>% group_by(PIN) %>% mutate(Total=sum(Balance))

Using your data as a data frame df :

df <- structure(list(PIN = c(221L, 221L, 221L, 554L, 554L, 643L, 643L
), Balance = c(5000L, 2000L, 1000L, 4000L, 4500L, 6000L, 4000L
)), .Names = c("PIN", "Balance"), class = "data.frame", row.names = c(NA, 
-7L))
##  PIN Balance
##1 221    5000
##2 221    2000
##3 221    1000
##4 554    4000
##5 554    4500
##6 643    6000
##7 643    4000

We get the expected result:

print(res)
##Source: local data frame [7 x 3]
##Groups: PIN [3]
##
##    PIN Balance Total
##  <int>   <int> <int>
##1   221    5000  8000
##2   221    2000  8000
##3   221    1000  8000
##4   554    4000  8500
##5   554    4500  8500
##6   643    6000 10000
##7   643    4000 10000

Or we can use data.table :

library(data.table)
setDT(df)[, Total := sum(Balance), by = PIN][]
##    PIN Balance Total
##1:  221    5000  8000
##2:  221    2000  8000
##3:  221    1000  8000
##4:  554    4000  8500
##5:  554    4500  8500
##6:  643    6000 10000
##7:  643    4000 10000

Consider a base R solution that uses sapply() to compute a conditional sum for each row:

df <- read.table(text="PIN Balance
                 221 5000
                 221 2000
                 221 1000
                 554 4000
                 554 4500
                 643 6000
                 643 4000", header=TRUE)    

df$Total <- sapply(seq_len(nrow(df)), function(i) {
  # sum Balance over every row that shares row i's PIN
  sum(df$Balance[df$PIN == df$PIN[i]])
})

#   PIN Balance Total
# 1 221    5000  8000
# 2 221    2000  8000
# 3 221    1000  8000
# 4 554    4000  8500
# 5 554    4500  8500
# 6 643    6000 10000
# 7 643    4000 10000
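The sapply() approach rescans the whole data frame for every row. As a possible alternative, base R's ave() applies a function within groups and returns a vector as long as its input, so the result can be assigned directly as a column. A minimal sketch, assuming the same df as in the question:

```r
# Sample data from the question
df <- data.frame(
  PIN = c(221L, 221L, 221L, 554L, 554L, 643L, 643L),
  Balance = c(5000L, 2000L, 1000L, 4000L, 4500L, 6000L, 4000L)
)

# ave() computes sum(Balance) within each PIN group and repeats the
# group's result for every row of that group, keeping the original row order
df$Total <- ave(df$Balance, df$PIN, FUN = sum)

print(df)
```

FUN must be passed by name here, because any unnamed arguments after the first are treated as grouping variables.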
