简体   繁体   English

根据间隔内的值和日期创建分数

[英]Create a score based on value and date within interval

I need help with coding a binary column indicating that one measurement is above a threshold and found in a one-year interval of a second measurement.我需要帮助编写一个二进制列,指示一次测量高于阈值,并且在第二次测量的一年间隔内发现。 The dataframe has 4 columns (ID, result type, date and value of result) and could look like this: dataframe 有 4 列(ID、结果类型、日期和结果值),可能如下所示:

   IDrow type       Date value
1    ID1  NP1 2018-06-11   4.5
2    ID1  NP2 2018-08-21    55
3    ID2  NP1 2016-04-23   4.7
4    ID3  NP1 2015-11-24   9.5
5    ID3  NP2 2015-11-24  7390
6    ID3  NP2 2016-11-24   308
7    ID4  NP1 2015-01-01   6.0
8    ID4  NP2 2018-01-01   100
9    ID5  NP1 2013-04-04   4.0
10   ID5  NP2 2013-04-04    13

As listed, there are two types of results.如所列,有两种类型的结果。 Some ID's don't have either two types, which is fine.有些 ID 没有两种类型,这很好。 I need help to code a new column indicating that NP2 occurs within +- one year of measurement of NP1.我需要帮助编写一个新列,以指示 NP2 发生在 NP1 测量的 +- 一年内。 If this is true, and the value for the NP2 is above > 25, then the new column should be 1. If the NP2 result is not available for the ID, the new column should state 0. If the date for NP2 is not found in the NP1 date interval, the column should state 0 (as for ID4), and if the date is found in the interval, but the value for NP2 is < 25, the column should also state 0 (as for ID5).如果这是真的,并且 NP2 的值大于 25,则新列应为 1。如果 NP2 结果不可用于 ID,则新列应为 state 0。如果未找到 NP2 的日期在 NP1 日期间隔中,该列应为 8827437401188 0(对于 ID4),如果在间隔中找到日期,但 NP2 的值 < 25,该列也应为 state 0(对于 ID5)。

My desired output looks like this:我想要的 output 看起来像这样:

   IDrow type       Date value positive
1    ID1  NP1 2018-06-11   4.5        1
2    ID1  NP2 2018-08-21    55        1
3    ID2  NP1 2016-04-23   4.7        0
4    ID3  NP1 2015-11-24   9.5        1
5    ID3  NP2 2015-11-24  7390        1
6    ID3  NP2 2016-11-24   308        1
7    ID4  NP1 2015-01-01   6.0        0
8    ID4  NP2 2018-01-01   100        0
9    ID5  NP1 2013-04-04   4.0        0
10   ID5  NP2 2013-04-04    13        0

I have tried dplyr with mutate and case_when, however my skills with dplyr fall short when working with dates.我已经用 mutate 和 case_when 尝试过 dplyr,但是在处理日期时我的 dplyr 技能不够用。

IDrow <- c("ID1", "ID1", "ID2", "ID3", "ID3", "ID3", "ID4", "ID4", "ID5", "ID5")
type <- c("NP1", "NP2", "NP1", "NP1", "NP2", "NP2", "NP1", "NP2", "NP1", "NP2")
Date <- c("2018-06-11","2018-08-21",
          "2016-04-23", 
          "2015-11-24", "2015-11-24", "2016-11-24",
          "2015-01-01", "2018-01-01",
          "2013-04-04", "2013-04-04")
value <- c("4.5", "55", 
           "4.7",
           "9.5", "7390", "308",
           "6.0", "100",
           "4.0", "13")
df <- as.data.frame(cbind(IDrow, type, Date, value))
df$Date <- ymd(df$Date)
df <- df %>% group_by(IDrow, Date) %>% 
  mutate(positive = case_when
         ("NP2" %in% type & Date %within% Date+-duration(365, units="days") & "NP1" %in% type ~ 1))

Thanks in advance - any help is appreciated.在此先感谢 - 任何帮助表示赞赏。

# Load the packages
library(lubridate)
library(dplyr)

# Load the data
IDrow <- c("ID1", "ID1", "ID2", "ID3", "ID3", "ID3", "ID4", "ID4", "ID5", "ID5")
type <- c("NP1", "NP2", "NP1", "NP1", "NP2", "NP2", "NP1", "NP2", "NP1", "NP2")
Date <- c(
  "2018-06-11", "2018-08-21",
  "2016-04-23",
  "2015-11-24", "2015-11-24", "2016-11-24",
  "2015-01-01", "2018-01-01",
  "2013-04-04", "2013-04-04"
)
value <- c(
  4.5, 55,
  4.7,
  9.5, 7390, 308,
  6.0, 100,
  4.0, 13
)

# Transform the date variable
df <- data.frame(IDrow, type, Date = ymd(Date), value)

# Add a new variable based on your conditions
df %>%
  group_by(IDrow) %>%
  mutate(positive = ifelse(
    length(type) > 1,
    (Date[type == "NP2"] %within% interval(Date[type == "NP1"] - years(1), Date[type == "NP1"] + years(1))) & value[type == "NP2"] > 25,
    0
  ))

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM