This is my data frame:
X Y Date Qty CumSumA CumSumB
1 A B 1/1 1 1 0
2 A A 1/1 2 3 2
3 A E 1/1 2 5 2
4 B A 1/1 1 1 1
5 B B 1/1 3 4 4
6 B C 1/1 2 6 4
7 C D 1/1 2 2 2
8 C E 1/1 4 6 2
9 C A 1/1 1 7 2
10 A C 1/2 2 2 0
11 A D 1/2 3 5 0
12 A E 1/2 2 7 0
13 B A 1/2 5 5 0
14 B B 1/2 1 6 1
15 B C 1/2 2 8 1
16 C D 1/2 2 2 4
17 C E 1/2 1 1 4
18 C A 1/2 3 4 4
I get the CumSumA column with:
library(dplyr)
data <- data %>%
  group_by(Date, X) %>%
  mutate(CumSumA = cumsum(Qty))
How can I get the CumSumB column, defined as the cumulative sum of Qty over all rows above (the current row included) that have (a) the same Date value and (b) a Y value equal to the current row's X value?
So for example, row 16 has X value C and Date value 1/2. I want to get the cumulative sum of Qty of all rows with Y value C and Date value 1/2. So this would be rows 10 plus 15, so CumSumB is 2 + 2 = 4.
Note that columns X and Y have over 140 unique values.
This solution is built on data.table and a join with allow.cartesian=TRUE:
require(data.table)
setDT(DT)
Creating a base data.table whose X column we are going to use later on:
DT_X <- DT[,.(X,Y, Date, indx = .I)]
setkey(DT_X, Date, X)
Dropping X and inserting an index in the original DT:
DT[,`:=`(X=NULL, indy = .I)]
setkey(DT, Date, Y)
Joining the data where X = Y (with allow.cartesian=TRUE). Have a look at DT_join if you are curious. See "Why does X[Y] join of data.tables not allow a full outer join, or a left join?" for why this is a join.
DT_join <- DT_X[DT, allow.cartesian=TRUE]
indy<=indx is an indicator used to sum only "all rows above", as you put it:
DT_join[!is.na(Y), .(CumSumB=sum(Qty * (indy<=indx))), by=.(X,Y,Date)]
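To see what the indicator does, here is a minimal base-R sketch for a single group, assuming the values for row 4 of the example (X = B on 1/1, whose matching Y = B rows are rows 1 and 5). The vector names are only for illustration and are not part of the solution above.

```r
# Qty and original row index of the rows where Y == "B" on Date 1/1.
qty  <- c(1, 3)
indy <- c(1L, 5L)

# Original row index of the row being computed (row 4, X == "B").
indx <- 4L

# TRUE for matches at or above row 4, FALSE for matches further down.
keep <- indy <= indx

# Multiplying by the logical vector zeroes out the rows below.
cum_sum_b_row4 <- sum(qty * keep)
```

Here cum_sum_b_row4 comes out as 1, which agrees with the CumSumB value for row 4 in the question.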
Edit (based on aosmith's answer): instead of by=.(X,Y,Date) one could also use by=indx.
Result:
X Y Date CumSumB
1: A B 1/1 0
2: A A 1/1 2
3: A E 1/1 2
4: B A 1/1 1
5: B B 1/1 4
6: B C 1/1 4
7: C D 1/1 2
8: C E 1/1 2
9: C A 1/1 2
10: A C 1/2 0
11: A D 1/2 0
12: A E 1/2 0
13: B A 1/2 0
14: B B 1/2 1
15: B C 1/2 1
16: C D 1/2 4
17: C E 1/2 4
18: C A 1/2 4
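As a sanity check, the definition from the question can also be applied row by row in base R. This is only a sketch for verifying small inputs, not a replacement for the join: it is quadratic in the number of rows, and the names df and cum_sum_b are introduced here for illustration.

```r
# Rebuild the example data from the question (base R only).
df <- data.frame(
  X    = rep(rep(c("A", "B", "C"), each = 3), times = 2),
  Y    = c("B", "A", "E", "A", "B", "C", "D", "E", "A",
           "C", "D", "E", "A", "B", "C", "D", "E", "A"),
  Date = rep(c("1/1", "1/2"), each = 9),
  Qty  = c(1, 2, 2, 1, 3, 2, 2, 4, 1,
           2, 3, 2, 5, 1, 2, 2, 1, 3),
  stringsAsFactors = FALSE
)

# For row i, sum Qty over rows 1..i that share row i's Date
# and whose Y equals row i's X.
cum_sum_b <- function(d) {
  vapply(seq_len(nrow(d)), function(i) {
    above <- seq_len(i)
    hit <- d$Date[above] == d$Date[i] & d$Y[above] == d$X[i]
    sum(d$Qty[above][hit])
  }, numeric(1))
}

df$CumSumB <- cum_sum_b(df)
```

The resulting df$CumSumB should match the CumSumB column in the result above.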
Here is a dplyr-based answer using the same logic as @Floo0. This will tend to get slow as the number of groups grows.
First, I added the row numbers as a column to the original dataset. With this approach, CumSumB is calculated separately for each row.
library(dplyr)
dat = dat %>% mutate(row = row_number())
Then I join the dataset to itself, joining X to Y and by Date. To avoid many duplicate columns with added suffixes, I selected only some of the columns for the x dataset of the join (i.e., the first dataset of left_join).
I kept the variable row in both datasets on purpose, so I end up with a variable called row.x that indicates the original row number of each X value and a variable called row.y indicating the original row number of each Y value.
dat %>%
  left_join(select(dat, X, Date, Y, row), ., by = c("X" = "Y", "Date" = "Date"))
Once that is done, the dataset just needs to be grouped by row.x, with the sum of Qty calculated over the rows where row.y is less than or equal to row.x.
dat %>%
  left_join(select(dat, X, Date, Y, row), ., by = c("X" = "Y", "Date" = "Date")) %>%
  group_by(row.x) %>%
  summarise(CumSumB = sum(Qty[row.y <= row.x]))
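One caveat that does not arise in the example data: if some X value never appears in Y on the same Date, left_join fills Qty and row.y with NA for that row, and sum() then returns NA instead of 0. The base-R sketch below mimics such a row with made-up values (qty, row_y and row_x are hypothetical stand-ins for the joined columns) and shows that na.rm = TRUE is one way to get 0.

```r
# Hypothetical unmatched row: the join found no Y equal to this X,
# so the columns coming from the joined side are NA.
qty   <- NA_real_
row_y <- NA_integer_
row_x <- 5L

# The comparison yields NA, qty[NA] yields NA, and the sum is NA.
plain <- sum(qty[row_y <= row_x])

# Dropping NA values before summing leaves an empty sum, which is 0.
na_rm <- sum(qty[row_y <= row_x], na.rm = TRUE)
```

If that case can occur in your data, writing the summarise step as sum(Qty[row.y <= row.x], na.rm = TRUE) should give 0 for those rows.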
Last, this can be joined back to the original dataset. The result still contains a column representing the row number, which could be removed via select(-row) if needed.
dat %>%
  left_join(select(dat, X, Date, Y, row), ., by = c("X" = "Y", "Date" = "Date")) %>%
  group_by(row.x) %>%
  summarise(CumSumB = sum(Qty[row.y <= row.x])) %>%
  left_join(dat, ., by = c("row" = "row.x"))
X Y Date Qty CumSumA CumSumB.x row CumSumB.y
1 A B 1/1 1 1 0 1 0
2 A A 1/1 2 3 2 2 2
3 A E 1/1 2 5 2 3 2
4 B A 1/1 1 1 1 4 1
5 B B 1/1 3 4 4 5 4
6 B C 1/1 2 6 4 6 4
7 C D 1/1 2 2 2 7 2
8 C E 1/1 4 6 2 8 2
9 C A 1/1 1 7 2 9 2
10 A C 1/2 2 2 0 10 0
11 A D 1/2 3 5 0 11 0
12 A E 1/2 2 7 0 12 0
13 B A 1/2 5 5 0 13 0
14 B B 1/2 1 6 1 14 1
15 B C 1/2 2 8 1 15 1
16 C D 1/2 2 2 4 16 4
17 C E 1/2 1 1 4 17 4
18 C A 1/2 3 4 4 18 4