Create a new cumulative column in a data frame based on existing columns

Question

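A while ago I asked about calculating a new column in a data frame based on existing columns. I'm now looking for something similar, with one difference. Again, I have this dataset..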

library(tibble)

df <- tibble(article  = rep("article one", 5),
             week     = c(1, 2, 3, 4, 5),
             sales    = 20,
             purchase = c(5, 0, 5, 5, 0),
             stock    = c(50))

# A tibble: 5 x 5
  article      week sales purchase stock
  <chr>       <dbl> <dbl>    <dbl> <dbl>
1 article one     1    20        5    50
2 article one     2    20        0    50
3 article one     3    20        5    50
4 article one     4    20        5    50
5 article one     5    20        0    50

.. where my desired result looks like this:

# A tibble: 5 x 6
  article      week sales purchase stock stock_over_time
  <chr>       <dbl> <dbl>    <dbl> <dbl>           <dbl>
1 article one     1    20        5    50              50
2 article one     2    20        0    50              30
3 article one     3    20        5    50              15
4 article one     4    20        5    50               0
5 article one     5    20        0    50              -5

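So stock_over_time is calculated as shown below: each week the sales are subtracted and the purchase is added, but whenever stock_over_time would go below 0, only a fraction of the sales is subtracted (here 25% of the sales, i.e. sales * 1/4).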

50 - 20 + 0 = 30
30 - 20 + 5 = 15
15 - 20 + 5 = 0
0 - (20 * 1/4) + 0 = -5
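The worked lines above follow a simple recurrence. Writing s_i for stock_over_time in week i, S_i for sales and P_i for purchase (notation introduced here, not in the question), the rule reads as follows. Note that the week-1 purchase is not added: s_1 is just the starting stock, as in the first row of the desired output.

```latex
s_1 = \text{stock}_1, \qquad
s_i =
\begin{cases}
s_{i-1} - S_i + P_i, & \text{if } s_{i-1} - S_i + P_i \ge 0,\\[2pt]
s_{i-1} - \tfrac{1}{4} S_i + P_i, & \text{otherwise},
\end{cases}
\qquad i \ge 2.
```

For week 5 this gives s_4 - S_5 + P_5 = 0 - 20 + 0 = -20 < 0, so s_5 = 0 - 20/4 + 0 = -5, matching the last worked line.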

Answer 1

We can use cumsum together with lag

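library(dplyr)

df %>%
  group_by(article) %>%
  mutate(
    # running total of (next week's purchase - sales), shifted down one row
    stock_over_time = lag(stock + cumsum(lead(purchase) - sales),
                          default = first(stock)),
    # negative values are recomputed as if the previous week ended at 0
    # and only 1/4 of the sales were subtracted
    stock_over_time = case_when(
      stock_over_time < 0 ~ 0 - (sales * 1/4) + purchase,
      TRUE ~ stock_over_time
    )
  ) %>%
  ungroup()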

-output

# A tibble: 5 x 6
#  article      week sales purchase stock stock_over_time
#  <chr>       <dbl> <dbl>    <dbl> <dbl>           <dbl>
#1 article one     1    20        5    50              50
#2 article one     2    20        0    50              30
#3 article one     3    20        5    50              15
#4 article one     4    20        5    50               0
#5 article one     5    20        0    50              -5
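The cumsum/lag version is not truly recursive: it builds the unconstrained running total and then replaces each negative value with 0 - sales * 1/4 + purchase, which assumes the previous week ended at exactly 0. That matches the five weeks in the question, but it can differ from the week-by-week rule once the stock stays negative for more than one week. The base-R sketch below (no dplyr; the sixth week of data is hypothetical) computes both side by side:

```r
# Base-R sketch comparing the cumsum/lag approach with the recursive rule.
# The sixth week is hypothetical, added only to show where the two diverge.
sales    <- rep(20, 6)
purchase <- c(5, 0, 5, 5, 0, 0)
stock    <- 50
n        <- length(sales)

# cumsum/lag version, mirroring the dplyr code:
# lag(stock + cumsum(lead(purchase) - sales), default = stock), then patch
# every negative value to 0 - sales/4 + purchase.
lead_purchase <- c(purchase[-1], NA)
running       <- stock + cumsum(lead_purchase - sales)
closed_form   <- c(stock, running[-n])
closed_form   <- ifelse(closed_form < 0, 0 - sales / 4 + purchase, closed_form)

# Recursive version: each week starts from the previous week's result.
recursive    <- numeric(n)
recursive[1] <- stock
for (i in seq_len(n)[-1]) {
  tmp <- recursive[i - 1] - sales[i] + purchase[i]
  recursive[i] <- if (tmp < 0) recursive[i - 1] - sales[i] / 4 + purchase[i] else tmp
}

closed_form  # 50 30 15 0 -5 -5
recursive    # 50 30 15 0 -5 -10
```

The first five values agree; in week 6 the recursive rule starts from -5 rather than 0, which is why a recursive function is the safer choice when the stock can stay negative.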

As @JonSpring mentioned, this may need to be a recursive operation, in which case we can write a function that does it

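f1 <- function(dat) {
  dat$stock_over_time <- NA_real_
  # week 1: stock_over_time is the starting stock
  dat$stock_over_time[1] <- dat$stock[1]
  # seq_len(nrow(dat))[-1] is empty for a one-row group, unlike 2:nrow(dat)
  for (i in seq_len(nrow(dat))[-1]) {
    # previous value - this week's sales + this week's purchase
    dat$stock_over_time[i] <- dat$stock_over_time[i - 1] -
      dat$sales[i] + dat$purchase[i]
    # if that goes below 0, subtract only 1/4 of the sales instead
    if (dat$stock_over_time[i] < 0) {
      dat$stock_over_time[i] <- dat$stock_over_time[i - 1] -
        (dat$sales[i] * 1/4) + dat$purchase[i]
    }
  }
  return(dat)
}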


unsplit(lapply(split(df, df$article), f1), df$article)
# A tibble: 5 x 6
#  article      week sales purchase stock stock_over_time
#  <chr>       <dbl> <dbl>    <dbl> <dbl>           <dbl>
#1 article one     1    20        5    50              50
#2 article one     2    20        0    50              30
#3 article one     3    20        5    50              15
#4 article one     4    20        5    50               0
#5 article one     5    20        0    50              -5

Or we can use accumulate2 from purrr

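library(purrr)

# x: value carried over from the previous step, y: sales,
# z: purchase (passed in via lead(purchase) below)
f1 <- function(x, y, z) {
  tmp <- x - y + z
  if (tmp < 0) {
    tmp <- x - (y * 1/4) + z
  }
  return(tmp)
}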


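df %>%
  group_by(article) %>%
  mutate(stock_over_time = accumulate2(sales,
                                       lead(purchase, default = last(purchase)),
                                       f1, .init = first(stock)) %>%
           # unlist() handles both a list and an already-simplified vector
           unlist() %>%
           # with .init there are n + 1 values; drop the last one
           head(-1)) %>%
  ungroup()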
# A tibble: 5 x 6
#  article      week sales purchase stock stock_over_time
#  <chr>       <dbl> <dbl>    <dbl> <dbl>           <dbl>
#1 article one     1    20        5    50              50
#2 article one     2    20        0    50              30
#3 article one     3    20        5    50              15
#4 article one     4    20        5    50               0
#5 article one     5    20        0    50              -5
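For comparison, the same recursion can be written in base R with Reduce(accumulate = TRUE), which plays the role of purrr's accumulate2. This is a sketch, not part of the original answer: stock_step() and running_stock() are hypothetical helper names, and the second article in df2 is made-up data. Unlike the accumulate2 call, which pairs sales[k] with purchase[k + 1], this version uses the sales and purchase of the same week, as the loop f1 does; with the constant sales in the question the results are identical.

```r
# Hypothetical helper: one weekly step of the rule from the question.
stock_step <- function(prev, sales, purchase) {
  tmp <- prev - sales + purchase
  # once the stock would drop below 0, only a quarter of sales is subtracted
  if (tmp < 0) prev - sales / 4 + purchase else tmp
}

# Hypothetical helper: running stock for one article.
# Week 1 is the starting stock; weeks 2..n each apply one step using that
# week's sales and purchase. With init supplied, Reduce(accumulate = TRUE)
# returns init plus one value per step, i.e. n values in total.
running_stock <- function(sales, purchase, stock) {
  Reduce(function(prev, i) stock_step(prev, sales[i], purchase[i]),
         seq_along(sales)[-1], init = stock[1], accumulate = TRUE)
}

# "article one" is the question's data; "article two" is made up.
df2 <- data.frame(
  article  = rep(c("article one", "article two"), each = 5),
  week     = rep(1:5, times = 2),
  sales    = 20,
  purchase = c(5, 0, 5, 5, 0,  0, 10, 0, 0, 0),
  stock    = rep(c(50, 30), each = 5)
)

# Apply per article and put the results back in the original row order.
df2$stock_over_time <- unsplit(
  lapply(split(df2, df2$article),
         function(d) running_stock(d$sales, d$purchase, d$stock)),
  df2$article
)
df2$stock_over_time  # 50 30 15 0 -5 30 20 0 -5 -10
```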

Answer 1 (score 3, accepted) 2021-03-06 19:31:22

根據現有列在數據框中創建新的累積列

問題描述

1 個解決方案

解決方案1 3 已采納 2021-03-06 19:31:22

解決方案1
3 已采納 2021-03-06 19:31:22