Count number of unique observations in the last 6 months (for every observation)

How would you solve the following problem in R / tidyverse?

Sample data:

tibble(
  # monthly dates starting on a random day (1-25) of January 2010
  date = seq(as.Date(paste0("2010-01-", sample(1:25, 1))), by = "month", length.out = 24),
  machine_ID = sample(letters[1:10], size = 24, replace = TRUE),
  machine_cat = rep(c(1, 2), 12)
)

Objective:

Add a column called last6m that counts the number of unique machine_ID values observed in the last 6 months, within the associated machine_cat.

A tidyverse solution without explicit loops is preferred (purrr is OK).

I'd appreciate it if anyone could take a quick look. Thanks in advance :-)

The following solution is based on the post suggested by MrFlick and r2evans (G. Grothendieck's answer):

library(tidyverse)
library(lubridate)
library(sqldf)

data <- tibble(
  # monthly dates starting on a random day (1-25) of January 2010
  date = seq(as.Date(paste0("2010-01-", sample(1:25, 1))), by = "month", length.out = 24),
  machine_ID = sample(letters[1:10], size = 24, replace = TRUE),
  machine_cat = rep(c(1, 2), 12)
)

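# Self-join within machine_cat: for each row a, count the distinct machine_IDs
# among rows b dated within the 180 days up to and including a.date.
# 180 days approximates 6 months; sqldf passes Date columns to SQLite as day counts.
sqldf("
  SELECT a.*, COUNT(DISTINCT b.machine_ID) AS last6m
  FROM data a
  LEFT JOIN data b
  ON a.machine_cat = b.machine_cat
  AND (b.date BETWEEN a.date - 180 AND a.date)
  GROUP BY a.rowid
") %>% arrange(machine_cat, date)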
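
Since the question asked for a tidyverse approach, the same idea can also be sketched without SQL. This is only one possible variant, not the method from the linked answer. It assumes a window of 6 calendar months (via lubridate's %m-% operator) rather than 180 days, and it uses a fixed start date and seed, both chosen here purely so the example is reproducible.

```r
library(dplyr)
library(purrr)
library(lubridate)

set.seed(1)  # assumption: fixed seed so the sample data is reproducible
data <- tibble(
  date = seq(as.Date("2010-01-15"), by = "month", length.out = 24),  # assumed start date
  machine_ID = sample(letters[1:10], size = 24, replace = TRUE),
  machine_cat = rep(c(1, 2), 12)
)

result <- data %>%
  group_by(machine_cat) %>%
  mutate(
    # For each row i of the group, count the distinct machine_IDs whose date
    # lies in [date[i] - 6 months, date[i]]. Iterating over indices rather
    # than over the dates themselves keeps the Date class intact.
    last6m = map_int(
      seq_along(date),
      function(i) {
        in_window <- date >= date[i] %m-% months(6) & date <= date[i]
        n_distinct(machine_ID[in_window])
      }
    )
  ) %>%
  ungroup() %>%
  arrange(machine_cat, date)

result
```

Because 6 calendar months always span at least 181 days, this window is slightly wider than the 180-day window in the sqldf version, so the two can give different counts for an observation made exactly 6 months earlier. Replacing date[i] %m-% months(6) with date[i] - 180 should reproduce the 180-day definition.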
