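Creating a new accumulative column in a data frame based on existing columns
A while ago I created this topic: Calculating a new column in a data frame based on existing columns. I am now looking for something similar, but with a slight difference. Once again, I have this dataset..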
library(tibble)
df <- tibble(article = rep("article one", 5),
             week = c(1, 2, 3, 4, 5),
             sales = 20,
             purchase = c(5, 0, 5, 5, 0),
             stock = c(50))
# A tibble: 5 x 5
  article      week sales purchase stock
  <chr>       <dbl> <dbl>    <dbl> <dbl>
1 article one     1    20        5    50
2 article one     2    20        0    50
3 article one     3    20        5    50
4 article one     4    20        5    50
5 article one     5    20        0    50
.. where my desired result looks like this:
# A tibble: 5 x 6
  article      week sales purchase stock stock_over_time
  <chr>       <dbl> <dbl>    <dbl> <dbl>           <dbl>
1 article one     1    20        5    50              50
2 article one     2    20        0    50              30
3 article one     3    20        5    50              15
4 article one     4    20        5    50               0
5 article one     5    20        0    50              -5
So stock_over_time is calculated as follows: whenever stock_over_time would go below 0, sales are still subtracted, but only a fraction of them (here 25% of sales).
50 - 20 + 0 = 30
30 - 20 + 5 = 15
15 - 20 + 5 = 0
0 - (20 * 1/4) + 0 = -5
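To make the rule concrete, a single step can be sketched as a small R function. The name next_stock and the ratio argument are illustrative assumptions and do not come from the original post.

```r
# Hypothetical helper: one step of the stock rule described above.
# prev:     previous value of stock_over_time
# sales, purchase: values for the current row
# ratio:    fraction of sales subtracted when the full amount would go below 0
next_stock <- function(prev, sales, purchase, ratio = 1/4) {
  full <- prev - sales + purchase
  if (full < 0) prev - sales * ratio + purchase else full
}

next_stock(50, 20, 0)  # 30
next_stock(15, 20, 5)  # 0
next_stock(0, 20, 0)   # -5
```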
We can use cumsum with lag
library(dplyr)
df %>%
  group_by(article) %>%
  mutate(stock_over_time = lag(stock + cumsum(lead(purchase) - sales),
                               default = first(stock)),
         # note: the adjustment hard-codes the previous level as 0,
         # which holds for this sample data
         stock_over_time = case_when(stock_over_time < 0
                                     ~ 0 - (sales * 1/4) + purchase,
                                     TRUE ~ stock_over_time)) %>%
  ungroup()
-output
# A tibble: 5 x 6
#  article      week sales purchase stock stock_over_time
#  <chr>       <dbl> <dbl>    <dbl> <dbl>           <dbl>
#1 article one     1    20        5    50              50
#2 article one     2    20        0    50              30
#3 article one     3    20        5    50              15
#4 article one     4    20        5    50               0
#5 article one     5    20        0    50              -5
As @JonSpring mentioned, this may be a recursive operation; in that case, we can create a function to do it
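f1 <- function(dat) {
  dat$stock_over_time <- NA_real_
  dat$stock_over_time[1] <- dat$stock[1]
  # seq_len(...)[-1] is empty for a one-row group, unlike 2:nrow(dat)
  for (i in seq_len(nrow(dat))[-1]) {
    # regular update: previous level minus sales plus purchase
    dat$stock_over_time[i] <- dat$stock_over_time[i - 1] -
      dat$sales[i] + dat$purchase[i]
    # if that would go below 0, subtract only a quarter of the sales
    if (dat$stock_over_time[i] < 0) {
      dat$stock_over_time[i] <- dat$stock_over_time[i - 1] -
        (dat$sales[i] * 1/4) + dat$purchase[i]
    }
  }
  return(dat)
}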
unsplit(lapply(split(df, df$article), f1), df$article)
# A tibble: 5 x 6
#  article      week sales purchase stock stock_over_time
#  <chr>       <dbl> <dbl>    <dbl> <dbl>           <dbl>
#1 article one     1    20        5    50              50
#2 article one     2    20        0    50              30
#3 article one     3    20        5    50              15
#4 article one     4    20        5    50               0
#5 article one     5    20        0    50              -5
Or we can use accumulate2 from purrr
library(purrr)
# x: previous stock level, y: sales, z: purchase
f1 <- function(x, y, z) {
  tmp <- x - y + z
  if (tmp < 0) {
    tmp <- x - (y * 1/4) + z
  }
  return(tmp)
}
df %>%
  group_by(article) %>%
  mutate(stock_over_time = accumulate2(sales,
           lead(purchase, default = last(purchase)), f1, .init = first(stock)) %>%
           flatten_dbl() %>%
           head(-1)) %>%
  ungroup()
# A tibble: 5 x 6
#  article      week sales purchase stock stock_over_time
#  <chr>       <dbl> <dbl>    <dbl> <dbl>           <dbl>
#1 article one     1    20        5    50              50
#2 article one     2    20        0    50              30
#3 article one     3    20        5    50              15
#4 article one     4    20        5    50               0
#5 article one     5    20        0    50              -5
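For reference, the same recursion can probably also be written in base R with Reduce(accumulate = TRUE), without any extra packages. This is only a sketch: stock_path is a made-up name, and it assumes that row i uses sales[i] and purchase[i], as the for loop version does.

```r
# Hypothetical base R alternative: fold over rows 2..n and keep every
# intermediate value (assumes row i uses sales[i] and purchase[i])
stock_path <- function(sales, purchase, stock) {
  step <- function(prev, i) {
    full <- prev - sales[i] + purchase[i]
    # subtract only a quarter of the sales if the level would drop below 0
    if (full < 0) prev - sales[i] / 4 + purchase[i] else full
  }
  Reduce(step, seq_along(sales)[-1], init = stock[1], accumulate = TRUE)
}

# Sample data from the question, as a plain data.frame
df <- data.frame(article = "article one", week = 1:5, sales = 20,
                 purchase = c(5, 0, 5, 5, 0), stock = 50)
df$stock_over_time <- with(df, stock_path(sales, purchase, stock))
df$stock_over_time  # 50 30 15 0 -5
```

With more than one article, the function would need to be applied per group, for example through split() as in the for loop version.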