簡體   English   中英

R 和 data.table 中的一列條件滯后值

[英]A column of conditional lagged values in R with data.table

我有一個看起來像這樣的data.table

df
     seller_id buyer_id       hs10 year mean_ln_c_hit
  1:         1        5 1200000000    5      4.407456
  2:         1       24 1500000000    2      4.173422
  3:         1        8 1500000000    3      4.532695
  4:         1        8 1500000000    5      4.830106
  5:         1       16 1500000000    5      4.830106
  6:         1       17 1800000000    3      3.920435
  7:         1       20 1800000000    5      4.357196
  8:         1       27 1800000000    5      4.357196
  9:         1        1 1900000000    1      4.835762
 10:         1        2 1900000000    1      4.835762
 11:         1        3 1900000000    1      4.835762
 12:         1        4 1900000000    1      4.835762
 13:         1        5 1900000000    1      4.835762
 14:         1        6 1900000000    1      4.835762
 15:         1        7 1900000000    1      4.835762
 16:         1        8 1900000000    1      4.835762
 17:         1        9 1900000000    1      4.835762
 18:         1       11 1900000000    1      4.835762
 19:         1       12 1900000000    1      4.835762
 20:         1       13 1900000000    1      4.835762
 21:         1       14 1900000000    1      4.835762
 22:         1       15 1900000000    1      4.835762
 23:         1       16 1900000000    1      4.835762
 24:         1       17 1900000000    1      4.835762
 25:         1       18 1900000000    1      4.835762
 26:         1       19 1900000000    1      4.835762
 27:         1       20 1900000000    1      4.835762
 28:         1       21 1900000000    1      4.835762
 29:         1       22 1900000000    1      4.835762
 30:         1       23 1900000000    1      4.835762
 31:         1       24 1900000000    1      4.835762
 32:         1       25 1900000000    1      4.835762
 33:         1       26 1900000000    1      4.835762
 34:         1       27 1900000000    1      4.835762
 35:         1       28 1900000000    1      4.835762
 36:         1       29 1900000000    1      4.835762
 37:         1       30 1900000000    1      4.835762
 38:         1        1 1900000000    2      4.409253
 39:         1        2 1900000000    2      4.409253
 40:         1        3 1900000000    2      4.409253
 41:         1        4 1900000000    2      4.409253
 42:         1        5 1900000000    2      4.409253
 43:         1        6 1900000000    2      4.409253
 44:         1        7 1900000000    2      4.409253
 45:         1        8 1900000000    2      4.409253
 46:         1        9 1900000000    2      4.409253
 47:         1       10 1900000000    2      4.409253
 48:         1       11 1900000000    2      4.409253
 49:         1       12 1900000000    2      4.409253
 50:         1       13 1900000000    2      4.409253
 51:         1       14 1900000000    2      4.409253
 52:         1       15 1900000000    2      4.409253
 53:         1       16 1900000000    2      4.409253
 54:         1       17 1900000000    2      4.409253
 55:         1       18 1900000000    2      4.409253
 56:         1       19 1900000000    2      4.409253
 57:         1       20 1900000000    2      4.409253
 58:         1       21 1900000000    2      4.409253
 59:         1       22 1900000000    2      4.409253
 60:         1       23 1900000000    2      4.409253
 61:         1       25 1900000000    2      4.409253
 62:         1       26 1900000000    2      4.409253
 63:         1       27 1900000000    2      4.409253
 64:         1       28 1900000000    2      4.409253
 65:         1       29 1900000000    2      4.409253
 66:         1       30 1900000000    2      4.409253
 67:         1        1 1900000000    3      4.514642
 68:         1        3 1900000000    3      4.514642
 69:         1        4 1900000000    3      4.514642
 70:         1        5 1900000000    3      4.514642
 71:         1        6 1900000000    3      4.514642
 72:         1        7 1900000000    3      4.514642
 73:         1        9 1900000000    3      4.514642
 74:         1       11 1900000000    3      4.514642
 75:         1       12 1900000000    3      4.514642
 76:         1       13 1900000000    3      4.514642
 77:         1       14 1900000000    3      4.514642
 78:         1       15 1900000000    3      4.514642
 79:         1       16 1900000000    3      4.514642
 80:         1       18 1900000000    3      4.514642
 81:         1       19 1900000000    3      4.514642
 82:         1       20 1900000000    3      4.514642
 83:         1       21 1900000000    3      4.514642
 84:         1       22 1900000000    3      4.514642
 85:         1       23 1900000000    3      4.514642
 86:         1       24 1900000000    3      4.514642
 87:         1       25 1900000000    3      4.514642
 88:         1       26 1900000000    3      4.514642
 89:         1       27 1900000000    3      4.514642
 90:         1       28 1900000000    3      4.514642
 91:         1       29 1900000000    3      4.514642
 92:         1       30 1900000000    3      4.514642
 93:         1        2 1900000000    5      4.698335
 94:         1        3 1900000000    5      4.698335
 95:         1        4 1900000000    5      4.698335
 96:         1        6 1900000000    5      4.698335
 97:         1        7 1900000000    5      4.698335
 98:         1        9 1900000000    5      4.698335
 99:         1       11 1900000000    5      4.698335
100:         1       12 1900000000    5      4.698335

我想使用data.table功能創建一個名為lag_mean_ln_c_hit的新列,其中包含 mean_ln_c_hit 列中值的一年lag ,以seller_idmean_ln_c_hiths10 我最好的嘗試是:

df[!is.na(year), lag_mean_ln_c_hit:= 
                        (.SD[.(seller_id = seller_id, hs10 = hs10, year = year - 1), mean_ln_c_hit, on = c("seller_id", "hs10", "year"), allow.cartesian = TRUE ]) ]

哪種正確顯示何時顯示NAs (即year = 1是第一年,因此第一year所有觀測的滯后應為 NA),但未能捕捉到正確的滯后。 對於不是NA的任何其他組合,它總是報告4.835762 (盡管一旦報告4.173422 )。

每個seller_id有多個buyer_id條目,具有相同的mean_ln_c_hit值。 因此我有重復。 運行代碼,我也收到以下Warning as well

Warning message:
In `[.data.table`(df, !is.na(year), `:=`(lag_mean_ln_c_hit, (.SD[.(seller_id = seller_id,  :
  Supplied 1640 items to be assigned to 100 items of column 'lag_mean_ln_c_hit' (1540 unused)

如果沒有allow.cartesian = TRUE ,代碼將無法運行。

關於如何解決這個問題的任何想法? 它應該很簡單,但我無法理解它。

不清楚所需的 output 是什么。 如果必須恰好在一年前,這里有一個選項:

seller <- unique(df, by=c("seller_id", "hs10", "year"))
df[, oneyearago := year - 1L]
df[, lag_mean_ln_c_hit := seller[.SD, on=.(seller_id, hs10, year=oneyearago), x.mean_ln_c_hit]]

output:

     seller_id buyer_id       hs10 year mean_ln_c_hit oneyearago lag_mean_ln_c_hit
  1:         1        5 1200000000    5      4.407456          4                NA
  2:         1       24 1500000000    2      4.173422          1                NA
  3:         1        8 1500000000    3      4.532695          2          4.173422
  4:         1        8 1500000000    5      4.830106          4                NA
  5:         1       16 1500000000    5      4.830106          4                NA
  6:         1       17 1800000000    3      3.920435          2                NA
  7:         1       20 1800000000    5      4.357196          4                NA
  8:         1       27 1800000000    5      4.357196          4                NA
  9:         1        1 1900000000    1      4.835762          0                NA
 10:         1        2 1900000000    1      4.835762          0                NA
 11:         1        3 1900000000    1      4.835762          0                NA
 12:         1        4 1900000000    1      4.835762          0                NA
 13:         1        5 1900000000    1      4.835762          0                NA
 14:         1        6 1900000000    1      4.835762          0                NA
 15:         1        7 1900000000    1      4.835762          0                NA
 16:         1        8 1900000000    1      4.835762          0                NA
 17:         1        9 1900000000    1      4.835762          0                NA
 18:         1       11 1900000000    1      4.835762          0                NA
 19:         1       12 1900000000    1      4.835762          0                NA
 20:         1       13 1900000000    1      4.835762          0                NA
 21:         1       14 1900000000    1      4.835762          0                NA
 22:         1       15 1900000000    1      4.835762          0                NA
 23:         1       16 1900000000    1      4.835762          0                NA
 24:         1       17 1900000000    1      4.835762          0                NA
 25:         1       18 1900000000    1      4.835762          0                NA
 26:         1       19 1900000000    1      4.835762          0                NA
 27:         1       20 1900000000    1      4.835762          0                NA
 28:         1       21 1900000000    1      4.835762          0                NA
 29:         1       22 1900000000    1      4.835762          0                NA
 30:         1       23 1900000000    1      4.835762          0                NA
 31:         1       24 1900000000    1      4.835762          0                NA
 32:         1       25 1900000000    1      4.835762          0                NA
 33:         1       26 1900000000    1      4.835762          0                NA
 34:         1       27 1900000000    1      4.835762          0                NA
 35:         1       28 1900000000    1      4.835762          0                NA
 36:         1       29 1900000000    1      4.835762          0                NA
 37:         1       30 1900000000    1      4.835762          0                NA
 38:         1        1 1900000000    2      4.409253          1          4.835762
 39:         1        2 1900000000    2      4.409253          1          4.835762
 40:         1        3 1900000000    2      4.409253          1          4.835762
 41:         1        4 1900000000    2      4.409253          1          4.835762
 42:         1        5 1900000000    2      4.409253          1          4.835762
 43:         1        6 1900000000    2      4.409253          1          4.835762
 44:         1        7 1900000000    2      4.409253          1          4.835762
 45:         1        8 1900000000    2      4.409253          1          4.835762
 46:         1        9 1900000000    2      4.409253          1          4.835762
 47:         1       10 1900000000    2      4.409253          1          4.835762
 48:         1       11 1900000000    2      4.409253          1          4.835762
 49:         1       12 1900000000    2      4.409253          1          4.835762
 50:         1       13 1900000000    2      4.409253          1          4.835762
 51:         1       14 1900000000    2      4.409253          1          4.835762
 52:         1       15 1900000000    2      4.409253          1          4.835762
 53:         1       16 1900000000    2      4.409253          1          4.835762
 54:         1       17 1900000000    2      4.409253          1          4.835762
 55:         1       18 1900000000    2      4.409253          1          4.835762
 56:         1       19 1900000000    2      4.409253          1          4.835762
 57:         1       20 1900000000    2      4.409253          1          4.835762
 58:         1       21 1900000000    2      4.409253          1          4.835762
 59:         1       22 1900000000    2      4.409253          1          4.835762
 60:         1       23 1900000000    2      4.409253          1          4.835762
 61:         1       25 1900000000    2      4.409253          1          4.835762
 62:         1       26 1900000000    2      4.409253          1          4.835762
 63:         1       27 1900000000    2      4.409253          1          4.835762
 64:         1       28 1900000000    2      4.409253          1          4.835762
 65:         1       29 1900000000    2      4.409253          1          4.835762
 66:         1       30 1900000000    2      4.409253          1          4.835762
 67:         1        1 1900000000    3      4.514642          2          4.409253
 68:         1        3 1900000000    3      4.514642          2          4.409253
 69:         1        4 1900000000    3      4.514642          2          4.409253
 70:         1        5 1900000000    3      4.514642          2          4.409253
 71:         1        6 1900000000    3      4.514642          2          4.409253
 72:         1        7 1900000000    3      4.514642          2          4.409253
 73:         1        9 1900000000    3      4.514642          2          4.409253
 74:         1       11 1900000000    3      4.514642          2          4.409253
 75:         1       12 1900000000    3      4.514642          2          4.409253
 76:         1       13 1900000000    3      4.514642          2          4.409253
 77:         1       14 1900000000    3      4.514642          2          4.409253
 78:         1       15 1900000000    3      4.514642          2          4.409253
 79:         1       16 1900000000    3      4.514642          2          4.409253
 80:         1       18 1900000000    3      4.514642          2          4.409253
 81:         1       19 1900000000    3      4.514642          2          4.409253
 82:         1       20 1900000000    3      4.514642          2          4.409253
 83:         1       21 1900000000    3      4.514642          2          4.409253
 84:         1       22 1900000000    3      4.514642          2          4.409253
 85:         1       23 1900000000    3      4.514642          2          4.409253
 86:         1       24 1900000000    3      4.514642          2          4.409253
 87:         1       25 1900000000    3      4.514642          2          4.409253
 88:         1       26 1900000000    3      4.514642          2          4.409253
 89:         1       27 1900000000    3      4.514642          2          4.409253
 90:         1       28 1900000000    3      4.514642          2          4.409253
 91:         1       29 1900000000    3      4.514642          2          4.409253
 92:         1       30 1900000000    3      4.514642          2          4.409253
 93:         1        2 1900000000    5      4.698335          4                NA
 94:         1        3 1900000000    5      4.698335          4                NA
 95:         1        4 1900000000    5      4.698335          4                NA
 96:         1        6 1900000000    5      4.698335          4                NA
 97:         1        7 1900000000    5      4.698335          4                NA
 98:         1        9 1900000000    5      4.698335          4                NA
 99:         1       11 1900000000    5      4.698335          4                NA
100:         1       12 1900000000    5      4.698335          4                NA
     seller_id buyer_id       hs10 year mean_ln_c_hit oneyearago lag_mean_ln_c_hit

或者,如果至少是一年前,您可以使用非 equi 連接並選擇最后一場比賽:

seller <- unique(df, by=c("seller_id", "hs10", "year"))
df[, lag_mean_ln_c_hit := seller[.SD, on=.(seller_id, hs10, year<year), x.mean_ln_c_hit, mult="last"]]

數據:

library(data.table)
df <- fread("seller_id buyer_id       hs10 year mean_ln_c_hit
1        5 1200000000    5      4.407456
1       24 1500000000    2      4.173422
1        8 1500000000    3      4.532695
1        8 1500000000    5      4.830106
1       16 1500000000    5      4.830106
1       17 1800000000    3      3.920435
1       20 1800000000    5      4.357196
1       27 1800000000    5      4.357196
1        1 1900000000    1      4.835762
1        2 1900000000    1      4.835762
1        3 1900000000    1      4.835762
1        4 1900000000    1      4.835762
1        5 1900000000    1      4.835762
1        6 1900000000    1      4.835762
1        7 1900000000    1      4.835762
1        8 1900000000    1      4.835762
1        9 1900000000    1      4.835762
1       11 1900000000    1      4.835762
1       12 1900000000    1      4.835762
1       13 1900000000    1      4.835762
1       14 1900000000    1      4.835762
1       15 1900000000    1      4.835762
1       16 1900000000    1      4.835762
1       17 1900000000    1      4.835762
1       18 1900000000    1      4.835762
1       19 1900000000    1      4.835762
1       20 1900000000    1      4.835762
1       21 1900000000    1      4.835762
1       22 1900000000    1      4.835762
1       23 1900000000    1      4.835762
1       24 1900000000    1      4.835762
1       25 1900000000    1      4.835762
1       26 1900000000    1      4.835762
1       27 1900000000    1      4.835762
1       28 1900000000    1      4.835762
1       29 1900000000    1      4.835762
1       30 1900000000    1      4.835762
1        1 1900000000    2      4.409253
1        2 1900000000    2      4.409253
1        3 1900000000    2      4.409253
1        4 1900000000    2      4.409253
1        5 1900000000    2      4.409253
1        6 1900000000    2      4.409253
1        7 1900000000    2      4.409253
1        8 1900000000    2      4.409253
1        9 1900000000    2      4.409253
1       10 1900000000    2      4.409253
1       11 1900000000    2      4.409253
1       12 1900000000    2      4.409253
1       13 1900000000    2      4.409253
1       14 1900000000    2      4.409253
1       15 1900000000    2      4.409253
1       16 1900000000    2      4.409253
1       17 1900000000    2      4.409253
1       18 1900000000    2      4.409253
1       19 1900000000    2      4.409253
1       20 1900000000    2      4.409253
1       21 1900000000    2      4.409253
1       22 1900000000    2      4.409253
1       23 1900000000    2      4.409253
1       25 1900000000    2      4.409253
1       26 1900000000    2      4.409253
1       27 1900000000    2      4.409253
1       28 1900000000    2      4.409253
1       29 1900000000    2      4.409253
1       30 1900000000    2      4.409253
1        1 1900000000    3      4.514642
1        3 1900000000    3      4.514642
1        4 1900000000    3      4.514642
1        5 1900000000    3      4.514642
1        6 1900000000    3      4.514642
1        7 1900000000    3      4.514642
1        9 1900000000    3      4.514642
1       11 1900000000    3      4.514642
1       12 1900000000    3      4.514642
1       13 1900000000    3      4.514642
1       14 1900000000    3      4.514642
1       15 1900000000    3      4.514642
1       16 1900000000    3      4.514642
1       18 1900000000    3      4.514642
1       19 1900000000    3      4.514642
1       20 1900000000    3      4.514642
1       21 1900000000    3      4.514642
1       22 1900000000    3      4.514642
1       23 1900000000    3      4.514642
1       24 1900000000    3      4.514642
1       25 1900000000    3      4.514642
1       26 1900000000    3      4.514642
1       27 1900000000    3      4.514642
1       28 1900000000    3      4.514642
1       29 1900000000    3      4.514642
1       30 1900000000    3      4.514642
1        2 1900000000    5      4.698335
1        3 1900000000    5      4.698335
1        4 1900000000    5      4.698335
1        6 1900000000    5      4.698335
1        7 1900000000    5      4.698335
1        9 1900000000    5      4.698335
1       11 1900000000    5      4.698335
1       12 1900000000    5      4.698335")

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM