R - Remove one of a pair of rows for each pair in dataframe based on condition
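I'm writing a script to process data, and I need to remove one row from each pair of rows in the dataset. In the example below, I want to keep the first dilution (which is always lower than the second dilution) if the second dilution's value is below 20,000, but if the first dilution's value is above 20,000, I want to choose the second dilution no matter what its value is. The exact dilution values differ between datasets, but no patient ever has more than two dilutions, so I always want to check first whether the lowest dilution stays under the 20,000 threshold. The dataset also contains many columns of metadata.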
Example data:

```
Patient Dilution Value
John    2        30000
John    20       15000
George  2        13000
George  20       700
Kelly   2        49000
Kelly   20       24000
Tom     2        80000
Tom     20       30000
Diane   2        700
Diane   20       0
```

Desired output:

```
Patient Dilution Value
John    20       15000
George  2        13000
Kelly   20       24000
Tom     20       30000
Diane   2        700
```
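The answers below refer to the example data as `df` (dplyr answer) and `df1` (data.table answer) without defining them. A minimal sketch for rebuilding the table above in R follows; the helper name `make_example()` is hypothetical, and the column types (character `Patient`, integer `Dilution` and `Value`) are an assumption, since the question only shows printed values.

```r
# Hypothetical helper: rebuilds the example table from the question.
# Column types are assumed (character Patient, integer Dilution/Value).
make_example <- function() {
  data.frame(
    Patient  = rep(c("John", "George", "Kelly", "Tom", "Diane"), each = 2),
    Dilution = rep(c(2L, 20L), times = 5),
    Value    = c(30000L, 15000L, 13000L, 700L, 49000L,
                 24000L, 80000L, 30000L, 700L, 0L),
    stringsAsFactors = FALSE
  )
}

df  <- make_example()  # name used by the dplyr answer
df1 <- make_example()  # separate object for the data.table answer,
                       # because setDT() converts its argument in place
```

Building two independent objects avoids any chance of `setDT(df1)` also changing `df`.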
Here is the rest of my code, if you want to see it (yes, I'm a noob):
```r
### SA Summary
sadf <- merge(mydata, elisadata, by = "Description", all.x = TRUE)
sadf <- sadf[grep("X", sadf$Type), ]
# !grepl() instead of -grep(): -grep() drops every row when nothing matches
sadf <- sadf[!grepl("Blank", sadf$Name), ]
sadf <- sadf[!grepl("MulV", sadf$Name), ]
sadf <- sadf[, c("Isotype", "Name", "Description", "Dilution.x", "FI-Bkgd-Neg", "Error", "Conc..ug.ml.")]
sadf$Error <- as.character(sadf$Error)
# Flag concentrations below 0.05 as "LC" and blank them out
sadf$Error[sadf$Conc..ug.ml. < 0.05] <- "LC"
sadf$Conc..ug.ml. <- ifelse(!is.na(sadf$Conc..ug.ml.) & sadf$Conc..ug.ml. < 0.05, NA, sadf$Conc..ug.ml.)
sadf$SA <- sadf$`FI-Bkgd-Neg` * sadf$Dilution.x / sadf$Conc..ug.ml.
sadf$SA[sadf$SA < 0.02] <- 0.02
### Where I need to put the answer to the question
# if (unique(sadf$Dilution.x) > 1) {}
sadf$`FI-Bkgd-Neg` <- NULL
sadf$Error[is.na(sadf$Error)] <- 0
sadf$Conc..ug.ml.[is.na(sadf$Conc..ug.ml.)] <- 0
sadf <- reshape(sadf, idvar = c("Description", "Dilution.x", "Isotype", "Error", "Conc..ug.ml."), timevar = "Name", direction = "wide")
# Comparison needs ==; a single = inside [ ] is a syntax error
sadf$Error[sadf$Error == 0] <- NA
sadf$Conc..ug.ml.[sadf$Conc..ug.ml. == 0] <- NA
```
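Removing rows with `x[-grep(pattern, ...), ]` has a pitfall: when the pattern matches nothing, `grep()` returns `integer(0)`, `-integer(0)` is still `integer(0)`, and indexing with it selects no rows at all. A small illustration on a made-up two-row data frame (`x` is hypothetical):

```r
# Neither name contains "Blank", so grep() finds nothing
x <- data.frame(Name = c("Sample1", "Sample2"))

# -integer(0) is integer(0), which selects zero rows
bad  <- x[-grep("Blank", x$Name), , drop = FALSE]

# grepl() returns a logical mask, so ! keeps every non-matching row
good <- x[!grepl("Blank", x$Name), , drop = FALSE]

nrow(bad)   # 0
nrow(good)  # 2
```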
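With `dplyr`, `group_by` Patient, then `filter` to the rows (per Patient group) that meet the condition. If the `first` `Value` is above 20000, the condition returns the `last` `Value`; otherwise it returns the `min`.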
```r
library(dplyr)
df %>% group_by(Patient) %>% filter(Value == ifelse(first(Value) > 20000,
                                                    last(Value),
                                                    min(Value)))
# Source: local data frame [5 x 3]
# Groups: Patient [5]
#
#   Patient Dilution Value
#    (fctr)    (int) (int)
# 1    John       20 15000
# 2  George       20   700
# 3   Kelly       20 24000
# 4     Tom       20 30000
# 5   Diane       20     0
```
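To see what the `filter()` call compares each `Value` against, the per-patient target can be computed on its own with `summarise()`. This is an inspection sketch only: it rebuilds the example data inline (numeric columns are an assumption) and uses `if` rather than `ifelse`, which behaves the same here because the condition is a single value per group.

```r
library(dplyr)

# Example data from the question, rebuilt so the snippet runs on its own
df <- data.frame(
  Patient  = rep(c("John", "George", "Kelly", "Tom", "Diane"), each = 2),
  Dilution = rep(c(2, 20), times = 5),
  Value    = c(30000, 15000, 13000, 700, 49000, 24000, 80000, 30000, 700, 0)
)

# One target value per patient: the last Value if the first exceeds 20000,
# otherwise the smallest Value in the group
targets <- df %>%
  group_by(Patient) %>%
  summarise(target = if (first(Value) > 20000) last(Value) else min(Value))

targets
```

For George the target is 700 (the minimum), which is why the `min` version keeps George's dilution-20 row.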
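Note: this approach follows the wording of the question, which does not produce the result data.frame shown in the question. If the condition should return the first dilution when it is below 20000, simply change `min` to `first` and you get the result data frame from the question: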
```r
df %>% group_by(Patient) %>% filter(Value == ifelse(first(Value) > 20000,
                                                    last(Value),
                                                    first(Value)))
# Source: local data frame [5 x 3]
# Groups: Patient [5]
#
#   Patient Dilution Value
#    (fctr)    (int) (int)
# 1    John       20 15000
# 2  George        2 13000
# 3   Kelly       20 24000
# 4     Tom       20 30000
# 5   Diane        2   700
```
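A variant worth considering (not part of the answer above) selects the row by position with `slice()` instead of matching on `Value`. Matching on `Value` keeps both rows if a patient's two values happen to be equal, while `slice()` always returns exactly one row per patient. This sketch sorts each patient's rows by `Dilution` first so that row 1 is the lowest dilution; the example data is rebuilt inline.

```r
library(dplyr)

df <- data.frame(
  Patient  = rep(c("John", "George", "Kelly", "Tom", "Diane"), each = 2),
  Dilution = rep(c(2, 20), times = 5),
  Value    = c(30000, 15000, 13000, 700, 49000, 24000, 80000, 30000, 700, 0)
)

picked <- df %>%
  group_by(Patient) %>%
  arrange(Dilution, .by_group = TRUE) %>%       # lowest dilution first in each group
  slice(if (first(Value) > 20000) n() else 1L) %>%  # last row, or the first
  ungroup()

picked
```

The rows come back grouped by `Patient`, so their order can differ from the original data.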
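We can use `data.table`. Convert the 'data.frame' to a 'data.table' (`setDT(df1)`), group by 'Patient', and subset the rows with an `if/else` condition: if the `min` 'Value' is below 20000, take the row with the `min` 'Value' (`which.min`), otherwise take the `last` one (`.N`).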
```r
setDT(df1)[df1[ , .I[if(min(Value) < 20000)
                 which.min(Value) else .N], Patient]$V1]
#    Patient Dilution Value
#1:    John       20 15000
#2:  George       20   700
#3:   Kelly       20 24000
#4:     Tom       20 30000
#5:   Diane       20     0
```
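The inner expression of the data.table answer returns row numbers, not rows: inside each `Patient` group, `.I` holds that group's row numbers in the full table, so `.I[k]` is the global row number of the group's k-th row. Running the inner step on its own shows what ends up in `V1`. The example data is rebuilt here as a `data.table` (column types are an assumption), and `by = Patient` is written out instead of passed positionally.

```r
library(data.table)

df1 <- data.table(
  Patient  = rep(c("John", "George", "Kelly", "Tom", "Diane"), each = 2),
  Dilution = rep(c(2L, 20L), times = 5),
  Value    = c(30000L, 15000L, 13000L, 700L, 49000L,
               24000L, 80000L, 30000L, 700L, 0L)
)

# One row per Patient; V1 is the selected global row number
idx <- df1[, .I[if (min(Value) < 20000) which.min(Value) else .N], by = Patient]
idx

# Outer step of the answer: subset df1 by those row numbers
df1[idx$V1]
```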
If the condition is based on the `first` 'Value', change `min(Value)` to `first(Value)` or `Value[1L]`, and use `1` instead of `which.min`:
```r
setDT(df1)[df1[ , .I[if(Value[1L] < 20000)
                 1 else .N], Patient]$V1]
#    Patient Dilution Value
#1:    John       20 15000
#2:  George        2 13000
#3:   Kelly       20 24000
#4:     Tom       20 30000
#5:   Diane        2   700
```
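Since the script in the question is written in base R, the same rule (keep the lowest dilution unless its `Value` is above 20000) can also be applied without any packages. A minimal sketch, assuming at most two rows per patient; the helper `pick_one()` is hypothetical, and it sorts each group by `Dilution` so the lowest dilution comes first. Note that `split()` returns the patients in alphabetical order.

```r
df <- data.frame(
  Patient  = rep(c("John", "George", "Kelly", "Tom", "Diane"), each = 2),
  Dilution = rep(c(2, 20), times = 5),
  Value    = c(30000, 15000, 13000, 700, 49000, 24000, 80000, 30000, 700, 0)
)

# Hypothetical helper: choose one row from one patient's rows
pick_one <- function(g) {
  g <- g[order(g$Dilution), ]  # lowest dilution first
  # Keep the first row unless its Value exceeds 20000, then keep the last row
  g[if (g$Value[1] > 20000) nrow(g) else 1, ]
}

result <- do.call(rbind, lapply(split(df, df$Patient), pick_one))
rownames(result) <- NULL
result
```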
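Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.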