I am writing a script to process data and need to remove one row from each pair of rows in a data set. In the example below, I want to keep the first dilution (which will always be smaller than the second) if its value is below 20,000, but select the second dilution if the first is over 20,000, no matter what the second dilution's value is. The exact dilution values will vary from dataset to dataset, but there will never be more than two dilutions per patient, so I will always want to check the lowest dilution first against the threshold of 20,000, which stays the same. The data set also contains many columns of metadata.
Patient Dilution Value
John 2 30000
John 20 15000
George 2 13000
George 20 700
Kelly 2 49000
Kelly 20 24000
Tom 2 80000
Tom 20 30000
Diane 2 700
Diane 20 0
Desired output:
Patient Dilution Value
John 20 15000
George 2 13000
Kelly 20 24000
Tom 20 30000
Diane 2 700
If you would like to look at the rest of my code, here it is (yes, I am a noob).
###SA Summary
sadf <- merge(mydata, elisadata, "Description", all.x = TRUE)
sadf <- sadf[grep("X", sadf$Type),]
sadf <- sadf[!grepl("Blank", sadf$Name),]
sadf <- sadf[!grepl("MulV", sadf$Name),]
sadf <- sadf[,c("Isotype","Name","Description","Dilution.x","FI-Bkgd-Neg","Error","Conc..ug.ml.")]
sadf$Error <- as.character(sadf$Error)
sadf$Error[sadf$Conc..ug.ml. < 0.05] <- "LC"
sadf$Conc..ug.ml. <- ifelse(!is.na(sadf$Conc..ug.ml.) & sadf$Conc..ug.ml. < 0.05, NA, sadf$Conc..ug.ml.)
sadf$SA <- with(sadf, `FI-Bkgd-Neg` * Dilution.x / Conc..ug.ml.)
sadf$SA[sadf$SA < 0.02] <- 0.02
if (length(unique(sadf$Dilution.x)) > 1) {} ###Where I need to put the answer to the question
sadf$`FI-Bkgd-Neg` <- NULL
sadf$Error[is.na(sadf$Error)] <- 0
sadf$Conc..ug.ml.[is.na(sadf$Conc..ug.ml.)] <- 0
sadf <- reshape(sadf, idvar = c("Description","Dilution.x","Isotype","Error","Conc..ug.ml."), timevar = "Name", direction = "wide")
sadf$Error[sadf$Error == 0] <- NA
sadf$Conc..ug.ml.[sadf$Conc..ug.ml. == 0] <- NA
With dplyr, group_by patient, and then filter to the rows (within each patient group) that satisfy the condition. The condition returns the last Value if the first is over 20000, else the minimum.
library(dplyr)
df %>% group_by(Patient) %>% filter(Value == ifelse(first(Value) > 20000,
                                                    last(Value),
                                                    min(Value)))
# Source: local data frame [5 x 3]
# Groups: Patient [5]
#
# Patient Dilution Value
# (fctr) (int) (int)
# 1 John 20 15000
# 2 George 20 700
# 3 Kelly 20 24000
# 4 Tom 20 30000
# 5 Diane 20 0
Note: this approach follows the literal wording of the question, which does not produce the result data frame shown in the question. If the condition is supposed to return the first dilution when it is under 20000, all you need to do is change min to first, and you get the result data frame from the question:
df %>% group_by(Patient) %>% filter(Value == ifelse(first(Value) > 20000,
                                                    last(Value),
                                                    first(Value)))
# Source: local data frame [5 x 3]
# Groups: Patient [5]
#
# Patient Dilution Value
# (fctr) (int) (int)
# 1 John 20 15000
# 2 George 2 13000
# 3 Kelly 20 24000
# 4 Tom 20 30000
# 5 Diane 2 700
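The first()/last() approach above depends on the rows of each patient already being sorted with the lowest dilution first. As a minimal, self-contained sketch, assuming the example data from the question (the names df, threshold and result are only illustrative), the rows can be sorted by Dilution inside each group before selecting by position. Selecting by row position with slice() instead of comparing on Value also avoids returning two rows when both dilutions happen to have the same Value.

```r
library(dplyr)

# Example data copied from the question
df <- data.frame(
  Patient  = rep(c("John", "George", "Kelly", "Tom", "Diane"), each = 2),
  Dilution = rep(c(2, 20), times = 5),
  Value    = c(30000, 15000, 13000, 700, 49000, 24000, 80000, 30000, 700, 0),
  stringsAsFactors = FALSE
)

# Fixed cutoff from the question. A Value of exactly 20000 keeps the first
# dilution here, mirroring the `>` used in the answer above (an assumption,
# since the question does not say what should happen at exactly 20000).
threshold <- 20000

result <- df %>%
  group_by(Patient) %>%
  # Put the lowest dilution first within each patient
  arrange(Dilution, .by_group = TRUE) %>%
  # Keep the last row if the first Value is over the threshold, else the first row
  slice(if (first(Value) > threshold) n() else 1L) %>%
  ungroup()

print(result)
```

Because slice() keeps whole rows, any additional metadata columns in the data are carried along unchanged. Note that the result is ordered by Patient rather than by the original row order.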
We can use data.table. Convert the 'data.frame' to a 'data.table' (setDT(df1)); then, grouped by 'Patient', we use an if/else condition to select the row with the min 'Value' if that minimum is below 20000, or else the last row.
setDT(df1)[df1[, .I[if (min(Value) < 20000)
                      which.min(Value) else .N], Patient]$V1]
# Patient Dilution Value
#1: John 20 15000
#2: George 20 700
#3: Kelly 20 24000
#4: Tom 20 30000
#5: Diane 20 0
If the condition is based on the first "Value", we need to change min(Value) to first(Value) or Value[1L], and also use 1 instead of which.min:
setDT(df1)[df1[, .I[if (Value[1L] < 20000)
                      1 else .N], Patient]$V1]
# Patient Dilution Value
#1: John 20 15000
#2: George 2 13000
#3: Kelly 20 24000
#4: Tom 20 30000
#5: Diane 2 700
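As with the dplyr version, Value[1L] is only the lowest dilution if the rows are already in that order. The following is a minimal, self-contained sketch, assuming the example data from the question (the names dt and res are only illustrative), that sorts by Dilution in the i argument before grouping and uses .SD to return whole rows.

```r
library(data.table)

# Example data copied from the question
dt <- data.table(
  Patient  = rep(c("John", "George", "Kelly", "Tom", "Diane"), each = 2),
  Dilution = rep(c(2, 20), times = 5),
  Value    = c(30000, 15000, 13000, 700, 49000, 24000, 80000, 30000, 700, 0)
)

# order(Dilution) in i sorts the rows first, so within each Patient group
# row 1 is the lowest dilution and row .N is the highest.
# The `< 20000` comparison mirrors the answer above; a Value of exactly
# 20000 would therefore select the last row (an assumption about the
# boundary, which the question leaves open).
res <- dt[order(Dilution),
          .SD[if (Value[1L] < 20000) 1L else .N],
          by = Patient]

print(res)
```

On the example data this should select the 20-fold dilution for John, Kelly and Tom and the 2-fold dilution for George and Diane. .SD holds every column other than the grouping column, so extra metadata columns are kept.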