
Sum rows within a column for each unique combination in R

I'd like to sum a value in a given column for each unique combination of two other columns:

For example, I'd like to transform the following data frame from:

Week  Day  Value
1     1    1
1     2    3
1     3    4
2     1    2
2     2    2
2     3    3

to:

Week  Day  Value Sum
1     1    1     1
1     2    3     4
1     3    4     8
2     1    2     2
2     2    2     4
2     3    3     7

I think a for loop would do what I want, but I am completely lost at this point. Any and all help appreciated.
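Since the question mentions a for loop, here is a minimal base R sketch of what such a loop could look like, for comparison with the vectorised answers. It assumes the rows are already sorted by Week and then Day; the variable name running is purely illustrative.

```r
# Example data from the question
x <- data.frame(
  Week  = c(1, 1, 1, 2, 2, 2),
  Day   = c(1, 2, 3, 1, 2, 3),
  Value = c(1, 3, 4, 2, 2, 3)
)

# Assumes rows are already sorted by Week, then Day
x$Sum <- NA_real_
running <- 0
for (i in seq_len(nrow(x))) {
  # Reset the running total whenever a new Week starts
  if (i == 1 || x$Week[i] != x$Week[i - 1]) {
    running <- 0
  }
  running <- running + x$Value[i]
  x$Sum[i] <- running
}
print(x)
```

The loop works, but it has to track the group boundary by hand; the grouped cumsum() approaches below do the same bookkeeping for you.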

In base R, you can use ave() to apply cumsum() within each Week:

x <- read.table(header = TRUE, text = "
Week  Day  Value
1     1    1
1     2    3
1     3    4
2     1    2
2     2    2
2     3    3
")
# Running total of Value within each Week, in the current row order
x$Sum <- ave(x$Value, x$Week, FUN = cumsum)

> x
  Week Day Value Sum
1    1   1     1   1
2    1   2     3   4
3    1   3     4   8
4    2   1     2   2
5    2   2     2   4
6    2   3     3   7
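One caveat worth making explicit: cumsum() accumulates in the current row order, so if the data might not be sorted by Day within each Week, sort it first. A minimal sketch, using a hypothetical shuffled copy of the example data:

```r
# Hypothetical shuffled version of the example data
x <- data.frame(
  Week  = c(2, 1, 2, 1, 1, 2),
  Day   = c(3, 2, 1, 1, 3, 2),
  Value = c(3, 3, 2, 1, 4, 2)
)

# Sort by Week, breaking ties by Day, so cumsum() runs in Day order
x <- x[order(x$Week, x$Day), ]
rownames(x) <- NULL  # cosmetic: renumber rows 1..n after sorting

x$Sum <- ave(x$Value, x$Week, FUN = cumsum)
print(x)
```

After sorting, the result matches the desired output in the question.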

I suggest trying dplyr; it is quite a workhorse for data manipulation. From your desired output, you seem to want a cumulative sum within each Week.

df <- read.table(text = "Week  Day  Value
1     1    1
1     2    3
1     3    4
2     1    2
2     2    2
2     3    3", header = TRUE)

library(dplyr)
df %>% group_by(Week) %>% mutate(Sum = cumsum(Value))

# you get
Source: local data frame [6 x 4]
Groups: Week

  Week Day Value Sum
1    1   1     1   1
2    1   2     3   4
3    1   3     4   8
4    2   1     2   2
5    2   2     2   4
6    2   3     3   7
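A slightly more defensive variant of the same pipeline, offered as a sketch: arrange() puts the rows in Day order within each Week before cumsum() runs (the sample data is already sorted, so the result is unchanged here), and ungroup() drops the grouping so later verbs are not silently applied per Week.

```r
library(dplyr)

df <- data.frame(
  Week  = c(1, 1, 1, 2, 2, 2),
  Day   = c(1, 2, 3, 1, 2, 3),
  Value = c(1, 3, 4, 2, 2, 3)
)

res <- df %>%
  arrange(Week, Day) %>%          # ensure Day order within each Week
  group_by(Week) %>%
  mutate(Sum = cumsum(Value)) %>% # running total per Week
  ungroup()                       # remove grouping for any later steps

print(res)
```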

Or you could try data.table, another package that is well suited to larger data: it is fast and memory efficient.

library(data.table)
# setDT() converts df to a data.table in place; := adds Sum by reference
setDT(df)[, Sum := cumsum(Value), by = Week][]
   Week Day Value Sum
1:    1   1     1   1
2:    1   2     3   4
3:    1   3     4   8
4:    2   1     2   2
5:    2   2     2   4
6:    2   3     3   7
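Because := adds columns by reference, data.table can also compute several grouped columns in one pass. A minimal sketch, where WeekTotal is a hypothetical extra column added only to contrast a running total with a per-week total:

```r
library(data.table)

dt <- data.table(
  Week  = c(1, 1, 1, 2, 2, 2),
  Day   = c(1, 2, 3, 1, 2, 3),
  Value = c(1, 3, 4, 2, 2, 3)
)

# Two columns added by reference in one grouped call:
# Sum is the running total within each Week,
# WeekTotal repeats that Week's total on every row of the group
dt[, `:=`(Sum = cumsum(Value), WeekTotal = sum(Value)), by = Week]
print(dt)
```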

Actually, for loops are probably a bad way of approaching this; they are not very efficient on data frames. Instead I'd recommend data.table:

library(data.table)

# Turn into a data.table
dt <- data.table(df)

# Sum Value for each unique combination of Week and Day
dt <- dt[, j = list(value_sum = sum(Value)), by = c("Week", "Day")]

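Your actual example seems to just sum for each unique Week, in which case drop "Day" from by. Note that this returns one total per week (8 for week 1 and 7 for week 2 here) rather than the running sum in your desired output; for that, use cumsum() with by = Week as in the answers above.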
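To make the effect of the by argument concrete, here is a small sketch on the example data comparing grouping by both columns with grouping by Week alone; by_week_day and by_week are illustrative names.

```r
library(data.table)

dt <- data.table(
  Week  = c(1, 1, 1, 2, 2, 2),
  Day   = c(1, 2, 3, 1, 2, 3),
  Value = c(1, 3, 4, 2, 2, 3)
)

# One row per unique (Week, Day) pair; each pair occurs once in this
# sample, so value_sum simply equals Value
by_week_day <- dt[, list(value_sum = sum(Value)), by = c("Week", "Day")]

# One row per Week, holding that Week's total (not a running total)
by_week <- dt[, list(value_sum = sum(Value)), by = "Week"]

print(by_week_day)
print(by_week)
```

Aggregating with sum() collapses each group to a single row, whereas the cumsum() answers keep every row and add a running total.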
