
Unique values in data.table list in R

I am working with a rather large data set (8M observations) and I am trying to obtain the unique values/balances for a given date that were NOT present in the previous date/date range. My date range covers 156 monthly observations. Here is an example of how I am proceeding, but I'm sure there is a more efficient way.

library(data.table)
df = data.frame(ID = c("1234", "5678", "1234", "1112"))
set.seed(1234)
df$Bal = rnorm(4)
df$Date = as.Date(c("2017-12-31",rep("2018-01-31",3)))
setDT(df)
df[,.(count = uniqueN(ID)), by = Date]
tmp = split(df[,.SD,.SDcols = 1:3], by = "Date")
table(tmp[[2]][,ID] %in% tmp[[1]][,ID])
# FALSE  TRUE 
#   2     1
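The two-month `%in%` comparison above can be extended to every month by carrying forward the set of IDs already seen. The base R sketch below assumes that "existing" means the ID appeared in any earlier month, not only the immediately preceding one; the names `pieces`, `seen` and `counts` are illustrative only. It reports counts only, and counts distinct accounts per month.

```r
# Sample data from the question, with Bal as a plain numeric vector
set.seed(1234)
df <- data.frame(
  ID   = c("1234", "5678", "1234", "1112"),
  Bal  = rnorm(4),
  Date = as.Date(c("2017-12-31", rep("2018-01-31", 3))),
  stringsAsFactors = FALSE
)

# One data frame per Date; split() orders the pieces chronologically
pieces <- split(df, df$Date)

seen <- character(0)                  # IDs observed in any earlier month
out  <- vector("list", length(pieces))
for (i in seq_along(pieces)) {
  p <- pieces[[i]]
  existing <- p$ID %in% seen          # TRUE = ID already seen before this Date
  out[[i]] <- data.frame(
    Date           = p$Date[1],
    New_Accts      = length(unique(p$ID[!existing])),
    Existing_Accts = length(unique(p$ID[existing]))
  )
  seen <- union(seen, p$ID)           # carry this month's IDs forward
}
counts <- do.call(rbind, out)
counts
```

Each month is compared against the accumulated `seen` set rather than only the previous split, so the loop makes a single pass over the 156 months instead of comparing every pair of months.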

Essentially, the 2 FALSE values represent new IDs and the 1 TRUE represents an existing ID. Additionally, I would like the sum of the balance information, for example:

Sum of old balance: -0.1226248
Sum of new balance: -2.068269

In turn, my new data frame would be:

            New_Balance Old_Balance New_Accts Existing_Accts
2018-01-31    -2.068269  -0.1226248         2              1
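A possible data.table sketch that produces this table directly with one grouped aggregation. It rests on two assumptions: "existing" means the ID appeared in any earlier month, and Old_Balance is the running per-ID balance. The second is inferred from the expected numbers, since -0.1226 is the 2017-12-31 balance of account 1234 plus its 2018-01-31 balance with this seed. The column names `Existing` and `CumBal` are hypothetical helpers.

```r
library(data.table)

# Sample data from the question, built directly as a data.table
set.seed(1234)
dt <- data.table(
  ID   = c("1234", "5678", "1234", "1112"),
  Bal  = rnorm(4),
  Date = as.Date(c("2017-12-31", rep("2018-01-31", 3)))
)

setorder(dt, Date)

# Existing = the ID already appeared in some earlier month
dt[, Existing := Date > min(Date), by = ID]

# Running balance per ID (assumed meaning of Old_Balance, see above)
dt[, CumBal := cumsum(Bal), by = ID]

res <- dt[, .(
  New_Balance    = sum(CumBal[!Existing]),
  Old_Balance    = sum(CumBal[Existing]),
  New_Accts      = length(unique(ID[!Existing])),
  Existing_Accts = length(unique(ID[Existing]))
), by = Date]
res
```

The first month also appears in `res`, with Old_Balance 0 and no existing accounts. If plain monthly balances were intended instead, summing `Bal` instead of `CumBal` would give that variant.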

Any help would be greatly appreciated!

The code below reproduces the expected result for the given two-month sample dataset:

library(data.table)
# make sure data is ordered by Date
setorder(df, Date)
# mark first appearance of each ID as "New", every later appearance as "Old"
df[, Acct := "Old"][rowid(ID) == 1L, Acct := "New"][]
# cumulative balances for each ID
df[, Balance := cumsum(Bal), by = ID][]
# reshape from long to wide format: sum and count of Balance per Date and Acct
dcast(df, Date ~ Acct, fun.aggregate = list(sum, length), value.var = "Balance")
         Date Balance_sum_New Balance_sum_Old Balance_length_New Balance_length_Old
1: 2017-12-31       -1.207066       0.0000000                  1                  0
2: 2018-01-31       -2.068268      -0.1226246                  2                  1

Note that the code still needs to be verified against a more elaborate sample dataset containing more months.
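One way to do that check is sketched below: generate a hypothetical multi-month dataset, apply the `rowid()` flag from the answer, and compare the per-month counts with a naive reference that repeats the question's `%in%` check for every month. The data sizes (200 accounts, 12 month-ends, 1000 rows) and the names `fast`, `naive` and `ok` are arbitrary choices for illustration. One caveat: `rowid(ID) == 1L` marks only the first row of an ID as "New", so an ID repeated within its first month would get an "Old" row. The test data avoids that case by construction (at most one row per ID and month), so this check does not exercise it.

```r
library(data.table)

# Hypothetical test data: 200 accounts over 12 month-ends; each account
# appears in a random subset of months, at most once per month
set.seed(42)
months <- seq(as.Date("2018-02-01"), by = "month", length.out = 12) - 1
dt <- CJ(ID = sprintf("%04d", 1:200), Date = months)[sample(.N, 1000)]
dt[, Bal := rnorm(.N)]

# Approach from the answer: first row of each ID (in date order) is "New"
setorder(dt, Date)
dt[, Acct := "Old"][rowid(ID) == 1L, Acct := "New"]
fast <- dt[, .(New = sum(Acct == "New"), Old = sum(Acct == "Old")), by = Date]

# Naive reference: the question's %in% check, repeated for every month
naive <- rbindlist(lapply(months, function(d) {
  cur  <- dt[Date == d, ID]            # IDs present in month d
  prev <- dt[Date < d, unique(ID)]     # IDs present in any earlier month
  data.table(Date = d, New = sum(!(cur %in% prev)), Old = sum(cur %in% prev))
}))

# TRUE when both methods agree for every month
ok <- nrow(fast) == nrow(naive) &&
  all(fast$Date == naive$Date) &&
  all(fast$New == naive$New) &&
  all(fast$Old == naive$Old)
ok
```

The naive version rescans all earlier rows for each month, so it is only meant as a correctness reference on small data, not for the 8M-row set.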
