
R: How to flag observations within a certain timeframe in data.table?

I'm working with a large data frame similar to the one below. I'd like to flag, by ID, every observation that has another observation within the 30 days before it. I originally tried to do this with fuzzyjoin, but I can't seem to nail down where I'm going wrong with {data.table}. Any tips?

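library(tidyverse)
library(magrittr)
library(data.table)

# Sample data: 300 claims across 3 ids, dated 1999-01-01 to 1999-06-01
df <- tibble(
  date = sample(seq(as.Date('1999/01/01'), as.Date('1999/06/01'), by = "day"), 300, replace = TRUE),
  id = sample(1:3, 300, replace = TRUE),
  claim_id = 1:300)

df %<>% data.table()

# Two copies of the data, with the date and claim columns renamed for a self-join
df_index <- df
df_readmit <- df
names(df_index)[c(1, 3)] <- c("index_date", "index_id")
names(df_readmit)[c(1, 3)] <- c("readmit_date", "readmit_id")

# Attempted non-equi join: same id, readmit_date later than index_date
df_readmit[df_index, .(id, index_date, readmit_date, index_id, readmit_id),
           on = .(id, readmit_date > index_date), nomatch = 0]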

If the row order can be changed, then I suggest we just look at the diff of the dates.

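library(data.table)
setorder(df, date)  # sort by date so diff() compares consecutive observations
# Keep each id's first row plus any row more than 30 days after the previous one
df[, .SD[c(TRUE, diff(date) > 30)], by = id]
#       id       date claim_id
#    <int>     <Date>    <int>
# 1:     1 1999-01-01      231
# 2:     2 1999-01-02      284
# 3:     3 1999-01-03       78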

In this case, because roughly 100 observations per ID spread over about five months are very unlikely to leave a 30-day span untouched, no such gap occurs in the sample data, and only the first row for each ID is returned. However, the method may still work for you with your real data.
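
If a flag column is preferred over a subset, the same diff idea can be sketched as below. This is only a sketch, and it assumes the goal is to mark rows whose previous observation for the same ID falls within 30 days. The toy data and the names `toy`, `within_30`, `win`, `hits`, and `within_30_join` are made up for illustration and are not from the question. A non-equi self-join is included as a cross-check for cases where the row order should not be relied on.

```r
library(data.table)

# Hypothetical toy data (not from the question) with known gaps per id
toy <- data.table(
  id       = c(1L, 1L, 1L, 2L, 2L, 3L),
  date     = as.Date(c("1999-01-01", "1999-01-20", "1999-03-15",
                       "1999-02-01", "1999-02-10", "1999-05-05")),
  claim_id = 1:6
)

# Approach 1: sort by id and date, then compare each date with the previous
# date in the same id. The first row of each id has no predecessor, so it is
# FALSE. Note that a repeated date (gap of 0 days) would be flagged TRUE here.
setorder(toy, id, date)
toy[, within_30 := c(FALSE, as.numeric(diff(date)) <= 30), by = id]

# Approach 2: non-equi self-join. For each row, look for rows of the same id
# dated in the window [date - 30, date). The strict upper bound keeps a row
# from matching itself; it also means same-day duplicates do not count.
win  <- toy[, .(id, claim_id, start = date - 30, end = date)]
hits <- toy[win, .(claim_id = i.claim_id),
            on = .(id, date >= start, date < end), nomatch = NULL]
toy[, within_30_join := claim_id %in% hits$claim_id]

print(toy)
```

With this toy data both columns agree: claims 2 and 5 are flagged, because each follows an earlier claim for the same ID by 19 and 9 days respectively. The two approaches can differ when an ID has several observations on the same date, so which boundary rule is wanted is worth deciding up front.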


 