
Group rows up to current row in R data.table

I have a dataset that looks like this:

library(data.table)

set.seed(10)

n_rows <- 50

data <- data.table(id = 1:n_rows,
                   timestamp = Sys.Date() + as.difftime(1:n_rows, units = "days"),
                   subject = sample(letters[1:4], n_rows, replace = TRUE),
                   response = sample(3, n_rows, replace = TRUE)
                   )

head(data, 10)

    id  timestamp subject response
 1:  1 2016-05-17       c        2
 2:  2 2016-05-18       b        3
 3:  3 2016-05-19       b        1
 4:  4 2016-05-20       c        2
 5:  5 2016-05-21       a        1
 6:  6 2016-05-22       a        2
 7:  7 2016-05-23       b        2
 8:  8 2016-05-24       b        2
 9:  9 2016-05-25       c        2
10: 10 2016-05-26       b        2

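I need to do a grouped operation that, for each subject, keeps a running count of the occurrences of each response by date.

The group by below generates the nth_test column.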

new_vars <- data[, .(id, timestamp, nth_test = 1:.N, response), by=.(subject)]

    subject id  timestamp nth_test response
 1:       c  1 2016-05-17        1        2
 2:       c  4 2016-05-20        2        2
 3:       c  9 2016-05-25        3        2
 4:       c 11 2016-05-27        4        1
 5:       c 12 2016-05-28        5        1
 6:       c 14 2016-05-30        6        2
 7:       c 22 2016-06-07        7        2
 8:       c 26 2016-06-11        8        2
 9:       c 31 2016-06-16        9        3
10:       c 36 2016-06-21       10        1
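To make the per-group counter concrete, here is a minimal sketch on a made-up toy table (the `toy` table and the `nth_test2` column are illustrative names, not part of the question's data). It numbers the rows within each subject with `seq_len(.N)` and, as a cross-check, with data.table's `rowid()`.

```r
library(data.table)

# Hypothetical toy data: five rows, two subjects, already in time order
toy <- data.table(subject = c("a", "b", "a", "a", "b"))

# .N is the number of rows in the current group, so seq_len(.N)
# numbers the rows 1, 2, ... within each subject
toy[, nth_test := seq_len(.N), by = subject]

# rowid() computes the same within-group counter without a by= clause
toy[, nth_test2 := rowid(subject)]

print(toy)
```

Both columns should agree row for row, provided the rows are already in the order in which the tests were taken.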

But I don't know how to generate the columns resp_1, resp_2 and resp_3, as shown below.

    subject id  timestamp nth_test response resp_1 resp_2 resp_3
 1:       c  1 2016-05-17        1        2      0      1      0
 2:       c  4 2016-05-20        2        2      0      2      0
 3:       c  9 2016-05-25        3        2      0      3      0
 4:       c 11 2016-05-27        4        1      1      3      0
 5:       c 12 2016-05-28        5        1      2      3      0
 6:       c 14 2016-05-30        6        2      2      4      0
 7:       c 22 2016-06-07        7        2      2      5      0
 8:       c 26 2016-06-11        8        2      2      6      0
 9:       c 31 2016-06-16        9        3      2      6      1
10:       c 36 2016-06-21       10        1      3      6      1

Cheers

We can try

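# Distinct response values, sorted
Un1 <- unique(sort(data$response))
# Per subject: add nth_test (row counter) and one running count per response value
data[, c("nth_test", paste("resp", Un1, sep="_")) := c(list(1:.N),
         lapply(Un1, function(x) cumsum(x==response))) , .(subject)]
# Show subject "c" in time order
data[order(subject, timestamp)][subject=="c"]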
#    id  timestamp subject response nth_test resp_1 resp_2 resp_3
# 1:  1 2016-05-17       c        2        1      0      1      0
# 2:  4 2016-05-20       c        2        2      0      2      0
# 3:  9 2016-05-25       c        2        3      0      3      0
# 4: 11 2016-05-27       c        1        4      1      3      0
# 5: 12 2016-05-28       c        1        5      2      3      0
# 6: 14 2016-05-30       c        2        6      2      4      0
# 7: 22 2016-06-07       c        2        7      2      5      0
# 8: 26 2016-06-11       c        2        8      2      6      0
# 9: 31 2016-06-16       c        3        9      2      6      1
#10: 36 2016-06-21       c        1       10      3      6      1
#11: 39 2016-06-24       c        1       11      4      6      1
#12: 40 2016-06-25       c        1       12      5      6      1
#13: 44 2016-06-29       c        2       13      5      7      1
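The core of this approach is that `cumsum()` over a logical vector gives a running count of the `TRUE` values. A minimal sketch on made-up data (the `toy` table is hypothetical), assuming the rows need to be put in time order within each subject before counting:

```r
library(data.table)

# Hypothetical toy data for a single subject, deliberately out of time order
toy <- data.table(
  subject   = "c",
  timestamp = as.Date("2016-05-17") + c(3, 0, 1, 4, 2),
  response  = c(3L, 2L, 2L, 1L, 1L)
)

# The running counts depend on row order, so sort by subject and time first
setorder(toy, subject, timestamp)

# response == k is a logical vector; cumsum() treats TRUE as 1 and FALSE as 0,
# so each new column is "how many times has k occurred up to this row"
vals <- sort(unique(toy$response))
toy[, paste("resp", vals, sep = "_") := lapply(vals, function(k) cumsum(response == k)),
    by = subject]

print(toy)
```

After sorting, the responses run 2, 2, 1, 3, 1, so `resp_2` reaches 2 on the second row and stays there, while `resp_1` only starts counting from the third row.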

I wanted to see what it would look like if the cummax / cumsum were done with the data.table in long format (which may be more efficient in some configurations):

> data[order(subject, timestamp)
+      ][, rCnt := 1:.N, .(subject, response)
+      ][, responseStr := sprintf('%s_%s', 'resp', response)
+      ][, dcast(.SD, id + timestamp + subject + response ~ responseStr, value.var='rCnt', fill=0)
+      ][, melt(.SD, id.vars=c('id', 'timestamp', 'subject', 'response'))
+      ][order(subject, timestamp)
+      ][, value := cummax(value), .(subject, variable)
+      ][, nth_test := 1:.N, .(subject, variable)
+      ][, dcast(.SD, id + timestamp + subject + response + nth_test ~ variable, value.var='value')
+      ][order(subject, timestamp)
+      ][subject == 'c'
+      ]
    id  timestamp subject response nth_test resp_1 resp_2 resp_3
 1:  1 2016-05-17       c        2        1      0      1      0
 2:  4 2016-05-20       c        2        2      0      2      0
 3:  9 2016-05-25       c        2        3      0      3      0
 4: 11 2016-05-27       c        1        4      1      3      0
 5: 12 2016-05-28       c        1        5      2      3      0
 6: 14 2016-05-30       c        2        6      2      4      0
 7: 22 2016-06-07       c        2        7      2      5      0
 8: 26 2016-06-11       c        2        8      2      6      0
 9: 31 2016-06-16       c        3        9      2      6      1
10: 36 2016-06-21       c        1       10      3      6      1
11: 39 2016-06-24       c        1       11      4      6      1
12: 40 2016-06-25       c        1       12      5      6      1
13: 44 2016-06-29       c        2       13      5      7      1
> 
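The chain above is dense, so here is a simplified sketch of its central idea on made-up data (the `toy` and `wide` tables are hypothetical, and the melt step is skipped for brevity, so this is not the exact pipeline above). Each row gets a counter within its (subject, response) pair, `dcast()` spreads that counter into one column per response with zeros elsewhere, and `cummax()` then carries the last seen count forward.

```r
library(data.table)

# Hypothetical toy data: one subject, rows already in time order by id
toy <- data.table(id = 1:5, subject = "c", response = c(2L, 2L, 1L, 3L, 1L))

# Counter within each (subject, response) pair: the k-th time this response occurs
toy[, rCnt := seq_len(.N), by = .(subject, response)]
toy[, responseStr := paste0("resp_", response)]

# Spread the counter into one column per response; cells for other responses get 0
wide <- dcast(toy, id + subject + response ~ responseStr,
              value.var = "rCnt", fill = 0L)

# The counter only increases within a response, so a running maximum
# fills each 0 with the most recent count seen so far
resp_cols <- c("resp_1", "resp_2", "resp_3")
wide[, (resp_cols) := lapply(.SD, cummax), by = subject, .SDcols = resp_cols]

print(wide)
```

The running maximum works here only because the per-response counter never decreases; with any other value column a `cumsum()` on indicators would be the safer choice.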
