Nested loop and multiple if statements in R
I have a dataset a, which is as follows:
Dictionary ActMin ActMax
3145 5 10
32441 10 19
3245 25 32
416356 37 46
4H22 82 130
%ABC 1 27
I have another dataset b, which is as follows:
ID Test Obs Year
1 3145-MN 11 1994
2 3145-NY 17 1992
1 416356-FL 57 1995
1 32441-MN 13 1995
2 3145-MN 8 1993
2 3245-NY 27 1983
3 3245-FL 45 2003
2 3145-MN 6 2001
3 %ABC-NY 33 1996
4 4H22-TX 97 1984
What I am trying to do is produce an output like this:
Id Test Obs Results Year Description
1 3145-MN 11 High 1994 Tested 3145 High on 1994, 4163 High on 1995,
2 3145-NY 17 High 1992 Tested 3145 High on 1992
1 416356-FL 57 High 1995
1 32441-MN 13 Normal 1995
2 3145-MN 8 Normal 1993
2 3245-NY 27 Normal 1983
3 3245-FL 45 High 2003 Tested 3245 High on 2003
2 3145-MN 6 Normal 2001
3 %ABC-NY 33 High 1996
4 4H22-TX 97 Normal 1984
The first dataset, a, is a dictionary that stores the unique test numbers (3145, 32441, etc.) together with their minimum (ActMin) and maximum (ActMax) values.
The second dataset, b, is the actual test results dataset: it stores what was actually observed. The observed value for a specific test in b is compared to the minimum and maximum values for that test in dataset a. If the observed value in b is greater than both the minimum and the maximum in a (that is, above ActMax), the Results column should be set to High, otherwise Normal. The Description column should provide a summary of the tests that were flagged High for each ID (one summary per ID).
I need help with this complex loop, the if statements, and the aggregation of the results.
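The rule described in the question does not need an explicit loop. As a minimal sketch, assuming the test code is everything before the first "-" in Test and that High means Obs is above ActMax, the lookup can be vectorised with match(). The helper name classify_obs is hypothetical, and the data frames are retyped from the tables in the question.

```r
# Sample data retyped from the question; codes are kept as character strings
a <- data.frame(
  Dictionary = c("3145", "32441", "3245", "416356", "4H22", "%ABC"),
  ActMin = c(5, 10, 25, 37, 82, 1),
  ActMax = c(10, 19, 32, 46, 130, 27),
  stringsAsFactors = FALSE
)
b <- data.frame(
  ID   = c(1, 2, 1, 1, 2, 2, 3, 2, 3, 4),
  Test = c("3145-MN", "3145-NY", "416356-FL", "32441-MN", "3145-MN",
           "3245-NY", "3245-FL", "3145-MN", "%ABC-NY", "4H22-TX"),
  Obs  = c(11, 17, 57, 13, 8, 27, 45, 6, 33, 97),
  Year = c(1994, 1992, 1995, 1995, 1993, 1983, 2003, 2001, 1996, 1984),
  stringsAsFactors = FALSE
)

# Hypothetical helper: label each observation by looking up its code in the dictionary
classify_obs <- function(test, obs, dict) {
  code <- sub("-.*$", "", test)          # text before the first "-"
  idx  <- match(code, dict$Dictionary)   # dictionary row for each code (NA if absent)
  ifelse(obs > dict$ActMax[idx], "High", "Normal")
}

b$Results <- classify_obs(b$Test, b$Obs, a)
print(b)
```

A code that is missing from the dictionary yields NA rather than a label, which makes unmatched tests easy to spot.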
A little convoluted, but the result should be similar to what you asked for. I managed to get the result column in base R, but for Description I had to use data.table.
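# The test code is everything before the first "-"; substr(Test, 1, 4) would truncate 32441 and 416356
b$code <- sub("-.*$", "", b$Test)
b$result <- c("Normal", "High")[(b$Obs > a$ActMax[match(b$code, as.character(a$Dictionary))]) + 1]
library(data.table)
setDT(b)
# One summary per ID on its first row; only High rows are pasted, so no empty entries get joined
b[, Description := { high <- result %in% "High"; c(if (any(high)) paste("Tested", code[high], "High on", Year[high], collapse = ", ") else "", rep("", .N - 1)) }, by = ID]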
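For readers who prefer to avoid the data.table dependency, the per-ID summary can also be sketched in base R with split() and vapply(). This is an illustrative alternative, not the answerer's method. The data frame res is a hypothetical name; it holds the IDs, codes, and years from the question, with result set to the labels that comparing Obs against ActMax gives for that data.

```r
# Rows from the question, with the test code split out and the result already labelled
res <- data.frame(
  ID     = c(1, 2, 1, 1, 2, 2, 3, 2, 3, 4),
  code   = c("3145", "3145", "416356", "32441", "3145",
             "3245", "3245", "3145", "%ABC", "4H22"),
  Year   = c(1994, 1992, 1995, 1995, 1993, 1983, 2003, 2001, 1996, 1984),
  result = c("High", "High", "High", "Normal", "Normal",
             "Normal", "High", "Normal", "High", "Normal"),
  stringsAsFactors = FALSE
)

# One text fragment per High row, then one comma-separated string per ID
high <- res[res$result == "High", ]
high$text <- paste("Tested", high$code, "High on", high$Year)
summary_by_id <- vapply(split(high$text, high$ID), paste, character(1), collapse = ", ")

# Put each summary on the first row of its ID; every other row stays empty
res$Description <- ""
first_row <- !duplicated(res$ID)
desc <- unname(summary_by_id[as.character(res$ID[first_row])])
res$Description[first_row] <- ifelse(is.na(desc), "", desc)  # IDs with no High rows get ""
print(res)
```

Filtering to the High rows before pasting avoids the stray separators that appear when empty strings are collapsed together with real entries.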
With dplyr, the code may be more readable:
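library(dplyr)
df_result <-
  b %>%
  # The test code is everything before the first "-"; stripping letters would turn "4H22" into "422"
  mutate(Dictionary = sub("-.*$", "", Test)) %>%
  # Dictionary must be character on both sides so that codes such as "%ABC" can match
  inner_join(mutate(a, Dictionary = as.character(Dictionary)), by = "Dictionary") %>%
  mutate(Results = ifelse(Obs > pmax(ActMin, ActMax), yes = "High", no = "Normal"))
# One summary string per ID, built from the High rows only
df_description <-
  df_result %>%
  filter(Results == "High") %>%
  group_by(ID) %>%
  summarise(
    # Build Description first: the later arguments overwrite Results and Dictionary
    Description = paste("Tested", Dictionary, Results, "on", Year, collapse = ", "),
    Results = Results[1],
    Dictionary = min(Dictionary))
# Attach each summary to the High row that holds the smallest code for that ID
df_final <-
  df_result %>%
  left_join(df_description, by = c("ID", "Dictionary", "Results")) %>%
  select(ID, Test, Obs, Results, Year, Description)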