R: Aggregating data by column group - mutate column with values for each observation

I'm having a beginner's issue aggregating data by category: I want to create a new column that holds, for each observation, the sum of the values in that observation's category.

I'd like the following data:

PIN Balance
221 5000
221 2000
221 1000
554 4000
554 4500
643 6000
643 4000

To look like:

PIN Balance Total
221 5000 8000
221 2000 8000
221 1000 8000
554 4000 8500
554 4500 8500
643 6000 10000
643 4000 10000

I've tried using aggregate: output <- aggregate(df$Balance ~ df$PIN, data = df, sum), but I haven't been able to get the result back into my original dataset because the number of observations doesn't match.
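
The row counts differ because aggregate() returns one row per PIN (3 rows) while the original data has 7. As a minimal sketch of one way to bring the totals back, assuming df has the PIN and Balance columns shown above, the aggregated result can be joined onto the original rows with merge(). The names totals and merged are illustrative only and do not come from the question.

```r
# Sample data from the question
df <- data.frame(
  PIN     = c(221L, 221L, 221L, 554L, 554L, 643L, 643L),
  Balance = c(5000L, 2000L, 1000L, 4000L, 4500L, 6000L, 4000L)
)

# aggregate() collapses the data to one row per PIN (3 rows, not 7)
totals <- aggregate(Balance ~ PIN, data = df, FUN = sum)

# Rename the summed column so it does not clash with Balance in df
names(totals)[names(totals) == "Balance"] <- "Total"

# merge() repeats each PIN's total on every matching row of df
merged <- merge(df, totals, by = "PIN")
```

Note that merge() sorts the result by the key column by default, so the row order of merged may differ from the original df if the input is not already ordered by PIN.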

You can use dplyr to do what you want. We first group_by PIN and then use mutate to create a new column Total that is the sum of Balance within each group:

library(dplyr)
res <- df %>% group_by(PIN) %>% mutate(Total=sum(Balance))

Using your data as a data frame df:

df <- structure(list(PIN = c(221L, 221L, 221L, 554L, 554L, 643L, 643L
), Balance = c(5000L, 2000L, 1000L, 4000L, 4500L, 6000L, 4000L
)), .Names = c("PIN", "Balance"), class = "data.frame", row.names = c(NA, 
-7L))
##  PIN Balance
##1 221    5000
##2 221    2000
##3 221    1000
##4 554    4000
##5 554    4500
##6 643    6000
##7 643    4000

We get the expected result:

print(res)
##Source: local data frame [7 x 3]
##Groups: PIN [3]
##
##    PIN Balance Total
##  <int>   <int> <int>
##1   221    5000  8000
##2   221    2000  8000
##3   221    1000  8000
##4   554    4000  8500
##5   554    4500  8500
##6   643    6000 10000
##7   643    4000 10000

Or we can use data.table:

library(data.table)
setDT(df)[, Total := sum(Balance), by = PIN][]
##    PIN Balance Total
##1:  221    5000  8000
##2:  221    2000  8000
##3:  221    1000  8000
##4:  554    4000  8500
##5:  554    4500  8500
##6:  643    6000 10000
##7:  643    4000 10000

Consider a base R solution that uses sapply() to compute a conditional sum for each row:

df <- read.table(text="PIN Balance
                 221 5000
                 221 2000
                 221 1000
                 554 4000
                 554 4500
                 643 6000
                 643 4000", header=TRUE)    

df$Total <- sapply(seq(nrow(df)), function(i){
  sum(df[df$PIN == df$PIN[i], c("Balance")])
}) 

#   PIN Balance Total
# 1 221    5000  8000
# 2 221    2000  8000
# 3 221    1000  8000
# 4 554    4000  8500
# 5 554    4500  8500
# 6 643    6000 10000
# 7 643    4000 10000
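
The sapply() approach above rescans the whole data frame once per row. As an alternative sketch in base R, not taken from the answers above, ave() applies a function within each group and returns a vector the same length as its input, so the result can be assigned directly as a new column. The data frame is rebuilt here only to keep the example self-contained.

```r
# Sample data from the question
df <- data.frame(
  PIN     = c(221L, 221L, 221L, 554L, 554L, 643L, 643L),
  Balance = c(5000L, 2000L, 1000L, 4000L, 4500L, 6000L, 4000L)
)

# ave() sums Balance within each PIN group and returns one value per row,
# in the original row order
df$Total <- ave(df$Balance, df$PIN, FUN = sum)

print(df)
```

Because ave() preserves the input length and order, no merge step is needed afterwards.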
