
Calculations between two columns in a data frame in R

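I want to create a new column, expected, based on two other columns. Within each product, expected is a running total: each row adds the value in column const and subtracts the value in column value.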

My data:

df<-data.frame(product = rep(c('A','B'),each=4), data = seq(as.Date("2020-01-01"), as.Date("2020-01-04"), by = "day"),
               value = c(10, 15, 0, 5, 20, 5, 10, 0), const = c(100, 0, 10, 0, 100, 0, 0, 10), 
               expected = c(90, 75, 85, 80, 80, 75, 65, 75))

> df
  product       data value const expected
1       A 2020-01-01    10   100       90
2       A 2020-01-02    15     0       75
3       A 2020-01-03     0    10       85
4       A 2020-01-04     5     0       80
5       B 2020-01-01    20   100       80
6       B 2020-01-02     5     0       75
7       B 2020-01-03    10     0       65
8       B 2020-01-04     0    10       75
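The rule above is a recurrence, expected[i] = expected[i-1] + const[i] - value[i], with expected taken as 0 before each product's first row. A plain-loop sketch of that recurrence follows; the helper running_expected is hypothetical, and it assumes the rows are already sorted by product and date.

```r
# The question's data (base R only; the date column is omitted here)
df <- data.frame(
  product = rep(c("A", "B"), each = 4),
  value   = c(10, 15, 0, 5, 20, 5, 10, 0),
  const   = c(100, 0, 10, 0, 100, 0, 0, 10)
)

# Hypothetical helper: running total that restarts for each product.
# Assumes rows are sorted by product, then by date.
running_expected <- function(product, const, value) {
  out <- numeric(length(value))
  for (i in seq_along(value)) {
    # First row overall, or first row of a new product: start from zero
    prev <- if (i == 1 || product[i] != product[i - 1]) 0 else out[i - 1]
    out[i] <- prev + const[i] - value[i]
  }
  out
}

df$expected_loop <- running_expected(df$product, df$const, df$value)
print(df$expected_loop)
# [1] 90 75 85 80 80 75 65 75
```

The vectorised answers avoid the explicit loop, but the loop states the rule in one line.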

Edited data:

library(dplyr)

TD <- data.frame(product = rep("A", 5), data = seq(as.Date("2020-01-01"), as.Date("2020-01-05"), by = "day"),
                 value = c(15, 1, 2, 1, 0), value2 = c(10, 0, 10, 0, 100))

TD <- TD %>% group_by(product) %>% mutate(expected1 = cumsum(value2) - cumsum(value))

TD
  product data       value value2 expected1
  <fct>   <date>     <dbl>  <dbl>     <dbl>
1 A       2020-01-01    15     10        -5
2 A       2020-01-02     1      0        -6
3 A       2020-01-03     2     10         2
4 A       2020-01-04     1      0         1
5 A       2020-01-05     0    100       101

TD_expected
 product       data value value2 expected1
1       A 2020-01-01    15     10        -5
2       A 2020-01-02     1      0        -6
3       A 2020-01-03     2     10         8
4       A 2020-01-04     1      0         7
5       A 2020-01-05     0    100       107

Note: when value2 is greater than value, we assign value2 to expected.
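The note leaves the edited rule partly open. One reading that reproduces every row of TD_expected, inferred from the numbers rather than stated by the asker, is: restart the running total at value2 - value when value2 > value and the previous total is negative, and otherwise keep accumulating. A loop sketch of that assumed rule, using a hypothetical helper reset_running, could look like this:

```r
# Edited data, with all columns of length 5
TD <- data.frame(
  product = rep("A", 5),
  data    = seq(as.Date("2020-01-01"), as.Date("2020-01-05"), by = "day"),
  value   = c(15, 1, 2, 1, 0),
  value2  = c(10, 0, 10, 0, 100)
)

# Hypothetical helper for an ASSUMED rule inferred from TD_expected:
# restart at (value2 - value) when value2 > value and the previous total
# is negative; otherwise add (value2 - value) to the previous total.
reset_running <- function(value2, value) {
  out <- numeric(length(value))
  prev <- 0
  for (i in seq_along(value)) {
    step <- value2[i] - value[i]
    out[i] <- if (value2[i] > value[i] && prev < 0) step else prev + step
    prev <- out[i]
  }
  out
}

# Apply per product; unsplit() puts the pieces back in the original row order
pieces <- lapply(split(TD, TD$product), function(d) reset_running(d$value2, d$value))
TD$expected1 <- unsplit(pieces, TD$product)
print(TD$expected1)
# expected1 is -5, -6, 8, 7, 107
```

Under this reading, row 5 keeps accumulating (7 + 100 - 0 = 107) because the previous total is not negative.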

You can use ave and cumsum:

df$expected <- ave(df$const - df$value, df$product, FUN=cumsum)
df
#  product       data value const expected
#1       A 2020-01-01    10   100       90
#2       A 2020-01-02    15     0       75
#3       A 2020-01-03     0    10       85
#4       A 2020-01-04     5     0       80
#5       B 2020-01-01    20   100       80
#6       B 2020-01-02     5     0       75
#7       B 2020-01-03    10     0       65
#8       B 2020-01-04     0    10       75
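ave() splits its first argument by the grouping variable, applies FUN to each piece, and writes the results back to the positions the rows came from. Because of this, the groups do not need to be contiguous. Here is a small illustration with made-up, interleaved data:

```r
# Toy data with two groups interleaved (hypothetical, for illustration)
x <- c(1, 10, 2, 20, 3)
g <- c("A", "B", "A", "B", "A")

# cumsum runs within each group; results land back in the original positions
res <- ave(x, g, FUN = cumsum)
print(res)
# group A: 1, 3, 6 at positions 1, 3, 5; group B: 10, 30 at positions 2, 4
```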

You can take the cumsum of const and of value by group, and then subtract them:

library(dplyr)
df %>% group_by(product) %>%  mutate(expected1 = cumsum(const) - cumsum(value))

#  product data       value const expected expected1
#  <fct>   <date>     <dbl> <dbl>    <dbl>     <dbl>
#1 A       2020-01-01    10   100       90        90
#2 A       2020-01-02    15     0       75        75
#3 A       2020-01-03     0    10       85        85
#4 A       2020-01-04     5     0       80        80
#5 B       2020-01-01    20   100       80        80
#6 B       2020-01-02     5     0       75        75
#7 B       2020-01-03    10     0       65        65
#8 B       2020-01-04     0    10       75        75
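cumsum() follows the current row order. As a result, the grouped running total is only correct when the rows are sorted by date within each product. In dplyr, that would mean adding arrange(product, data) before group_by(). The following base R sketch shows the same point on a deliberately shuffled copy of df (the shuffle order is arbitrary):

```r
df <- data.frame(
  product = rep(c("A", "B"), each = 4),
  data    = rep(seq(as.Date("2020-01-01"), as.Date("2020-01-04"), by = "day"), 2),
  value   = c(10, 15, 0, 5, 20, 5, 10, 0),
  const   = c(100, 0, 10, 0, 100, 0, 0, 10)
)

# Shuffle the rows to mimic unsorted input
shuffled <- df[c(4, 8, 1, 5, 3, 7, 2, 6), ]

# Sort by product and date first, then take the grouped running total
sorted <- shuffled[order(shuffled$product, shuffled$data), ]
sorted$expected1 <- ave(sorted$const - sorted$value, sorted$product, FUN = cumsum)
print(sorted$expected1)
```

Without the sort, the cumulative sums would follow the shuffled order and give different numbers.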

Using base R, this can be done with

df$expected1 <- with(df, ave(const, product, FUN = cumsum) - 
                         ave(value, product, FUN = cumsum))
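Another base R route, sketched here as an alternative and not as part of the answer, is to split the net change const - value by product, take the cumsum of each piece, and reassemble the pieces with unsplit(). unsplit() restores the original row order:

```r
df <- data.frame(
  product = rep(c("A", "B"), each = 4),
  value   = c(10, 15, 0, 5, 20, 5, 10, 0),
  const   = c(100, 0, 10, 0, 100, 0, 0, 10)
)

# Net change per row, cumulated separately for each product
net <- df$const - df$value
df$expected2 <- unsplit(lapply(split(net, df$product), cumsum), df$product)
print(df$expected2)
```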

With data.table:

library(data.table)
setDT(df)[, expected1 := cumsum(const) - cumsum(value), product]

Edit

For the updated data, we can create a new group and follow the same process.

TD %>% 
  group_by(product, group = cumsum(value2 > value)) %>%
  mutate(expected1 = cumsum(value2) - cumsum(value)) %>%
  ungroup() %>%
  select(-group)

# product data       value value2 expected1
#  <fct>   <date>     <dbl>  <dbl>     <dbl>
#1 A       2020-01-01    15     10        -5
#2 A       2020-01-02     1      0        -6
#3 A       2020-01-03     2     10         8
#4 A       2020-01-04     1      0         7
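To see what the grouping does, here is a base R equivalent of the dplyr code above: ave() over product and the new group id. Run on all five rows of TD, row 5 has value2 > value, so it starts its own group. Its result is therefore 100 rather than the 107 shown in TD_expected. The printed output above stops at row 4.

```r
TD <- data.frame(
  product = rep("A", 5),
  value   = c(15, 1, 2, 1, 0),
  value2  = c(10, 0, 10, 0, 100)
)

# A new group starts at every row where value2 > value
TD$group <- cumsum(TD$value2 > TD$value)

# Running total within product x group
TD$expected1 <- ave(TD$value2 - TD$value, TD$product, TD$group, FUN = cumsum)
print(TD[, c("group", "expected1")])
```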

We can also do this in the tidyverse with a single cumsum, similar to the ave option in @GKi's post:

library(dplyr)
df %>% 
   group_by(product) %>%
   mutate(expected1 = cumsum(const - value))
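This works because, within a group, cumsum(const) - cumsum(value) equals cumsum(const - value) term by term. The following is a quick check on product A's numbers. all.equal() is used instead of identical() because, with non-integer data, the two forms can differ by floating-point rounding.

```r
const <- c(100, 0, 10, 0)
value <- c(10, 15, 0, 5)

two_sums <- cumsum(const) - cumsum(value)  # two running totals, then subtract
one_sum  <- cumsum(const - value)          # subtract first, one running total
print(one_sum)
```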

Here is a base R solution that applies ave() and cumsum() to get expected:

  • For the original data df:
dfs <- split(df,df$product)
df <- Reduce(rbind,lapply(dfs, function(x) {
  within(x, expected <- ave(const-value,
                             ave(const-value,
                                 cumsum(const>value),FUN = cumsum)>0,FUN = cumsum))
}))

which gives

> df
  product       data value const expected
1       A 2020-01-01    10   100       90
2       A 2020-01-02    15     0       75
3       A 2020-01-03     0    10       85
4       A 2020-01-04     5     0       80
5       B 2020-01-01    20   100       80
6       B 2020-01-02     5     0       75
7       B 2020-01-03    10     0       65
8       B 2020-01-04     0    10       75
  • For the edited data TD, you can use:
TDs <- split(TD,TD$product)
TD <- Reduce(rbind,lapply(TDs, function(x) {
  within(x, expected <- ave(value2-value,
                             ave(value2-value,
                                 cumsum(value2>value),FUN = cumsum)>0,FUN = cumsum))
}))

which gives

> TD
  product       data value value2 expected
1       A 2020-01-01    15     10       -5
2       A 2020-01-02     1      0       -6
3       A 2020-01-03     2     10        8
4       A 2020-01-04     1      0        7
5       A 2020-01-05     0    100      107
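The nested ave() call is easier to follow with its intermediate vectors named. The sketch below unpacks it for the edited data; the names net, seg, seg_total, positive and result are illustrative only. The outer ave() pools every row whose segment total is positive, whether or not those segments are adjacent. That pooling reproduces TD_expected here, but on other data it may not match the intended rule.

```r
TD <- data.frame(
  product = rep("A", 5),
  value   = c(15, 1, 2, 1, 0),
  value2  = c(10, 0, 10, 0, 100)
)

net       <- TD$value2 - TD$value            # net change per row: -5 -1 8 -1 100
seg       <- cumsum(TD$value2 > TD$value)    # segment id, bumps where value2 > value
seg_total <- ave(net, seg, FUN = cumsum)     # running total within each segment
positive  <- seg_total > 0                   # sign of the segment-level total
result    <- ave(net, positive, FUN = cumsum) # running total over rows sharing that sign
print(result)
```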


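Disclaimer: the technical posts on this site are distributed under the CC BY-SA 4.0 license. If you reproduce them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.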
