Partially match a data.frame and subset the whole data.frame
I have some data that looks like this:
List_name Condition1 Condition2 Situation1 Situation2
List1     0.01       0.12       66         123
List2     0.23       0.22       45         -34
List3     0.32       0.23       13         -12
List4     0.03       0.56       -3         45
List5     0.56       0.05       12         100
List6     0.90       0.09       22         32
I would like to filter each "Condition" column of the data.frame with a cutoff of 0.5, keeping only the values below it. After filtering, each subset should carry the corresponding values from the matching "Situation" column. Filter and subset work pairwise: "Condition1" with "Situation1", "Condition2" with "Situation2", and so on.
The desired output:
List_name Condition1 Situation1    List_name Condition2 Situation2
List1     0.01       66            List1     0.12       123
List2     0.23       45            List2     0.22       -34
List3     0.32       13            List3     0.23       -12
List4     0.03       -3            List5     0.05       100
                                   List6     0.09       32
I'm pretty sure a similar question has been posted before, but I searched and couldn't find it.
Similar to @Arun's excellent solution, but based on column names, so it makes no assumption about column order.
# Condition columns, selected by name
cols.conds <- grep('^Condition[0-9]+$', colnames(dat), value = TRUE)
lapply(cols.conds, function(x) {
  col.list <- colnames(dat)[1]                  # the List_name column
  col.situ <- sub('Condition', 'Situation', x)  # the matching Situation column
  dat[which(dat[[x]] < 0.5), c(col.list, x, col.situ)]
})
I assume dat is:
dat <- read.table(text = 'List_name Condition1 Condition2 Situation1 Situation2
List1 0.01 0.12 66 123
List2 0.23 0.22 45 -34
List3 0.32 0.23 13 -12
List4 0.03 0.56 -3 45
List5 0.56 0.05 12 100
List6 0.90 0.09 22 32', header = TRUE)
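As a quick check of the name-based pairing, here is a minimal sketch. It rebuilds the question's data with the columns deliberately interleaved (that ordering, the use of data.frame(), and the res name are my assumptions, for illustration only) and shows that each Condition column still picks up its own Situation column.

```r
# Question's data, columns interleaved on purpose (illustrative assumption)
dat <- data.frame(
  List_name  = paste0("List", 1:6),
  Situation2 = c(123, -34, -12, 45, 100, 32),
  Condition1 = c(0.01, 0.23, 0.32, 0.03, 0.56, 0.90),
  Situation1 = c(66, 45, 13, -3, 12, 22),
  Condition2 = c(0.12, 0.22, 0.23, 0.56, 0.05, 0.09)
)

# Pick the Condition columns by name, not by position
cols.conds <- grep("^Condition[0-9]+$", colnames(dat), value = TRUE)

res <- lapply(cols.conds, function(x) {
  col.situ <- sub("Condition", "Situation", x)  # name of the paired column
  dat[which(dat[[x]] < 0.5), c("List_name", x, col.situ)]
})
names(res) <- cols.conds  # label each subset by its Condition column

res$Condition2
```

Naming the list elements is optional, but it makes each subset easy to retrieve afterwards.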
You can use the notion that boolean checks are vectorized:
x <- c(0.1, 0.3, 0.5, 0.2)
x < 0.5
# [1] TRUE TRUE FALSE TRUE
And some grep results:
grep('Condition', names(DF1))
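For reference, a small sketch of what that grep call returns. The nms vector below simply stands in for names(DF1), assuming the column names from the question.

```r
# Stand-in for names(DF1), using the question's column names (assumption)
nms <- c("List_name", "Condition1", "Condition2", "Situation1", "Situation2")

grep("Condition", nms)                # positions of the matches
# [1] 2 3
grep("Condition", nms, value = TRUE)  # the matching names themselves
# [1] "Condition1" "Condition2"
```

Positions are handy for indexing columns directly, while value = TRUE is useful when you want to work with the names.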
To do this subsetting, you can use apply to generate your boolean vector:
keepers <- apply(DF1[, grep('Condition', names(DF1))], 1, function(x) any(x < 0.5))
And subset:
DF1[keepers,]
Notice that this doesn't necessarily return the data structure you showed in your question. But you can alter the anonymous function accordingly, using all or a different threshold value.
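To make the difference between any and all concrete, here is a minimal sketch. It rebuilds the question's data as DF1 (constructing it with data.frame() is my assumption, as are the names cond.cols, keep.any and keep.all) and compares the two row filters.

```r
# Data from the question; building it with data.frame() is an assumption
DF1 <- data.frame(
  List_name  = paste0("List", 1:6),
  Condition1 = c(0.01, 0.23, 0.32, 0.03, 0.56, 0.90),
  Condition2 = c(0.12, 0.22, 0.23, 0.56, 0.05, 0.09),
  Situation1 = c(66, 45, 13, -3, 12, 22),
  Situation2 = c(123, -34, -12, 45, 100, 32)
)

cond.cols <- grep("Condition", names(DF1))

# any(): keep a row if at least one Condition is below the cutoff
keep.any <- apply(DF1[, cond.cols], 1, function(x) any(x < 0.5))

# all(): keep a row only if every Condition is below the cutoff
keep.all <- apply(DF1[, cond.cols], 1, function(x) all(x < 0.5))

DF1[keep.any, "List_name"]  # all six rows pass
DF1[keep.all, "List_name"]  # only List1, List2, List3 pass
```

With this data every row has at least one Condition below 0.5, so any keeps everything, which is why a row-wise filter alone cannot reproduce the pairwise output asked for in the question.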
In light of the edits, I would approach this differently. I would use melt from the reshape2 package:
library(reshape2)
# Long form of the Condition columns
dat.c <- melt(DF1,
              id.vars = 'List_name',
              measure.vars = grep('Condition', names(DF1), value = TRUE),
              variable.name = 'condition',
              value.name = 'cond.val')
dat.c$idx <- gsub('Condition', '', dat.c$condition)
# Long form of the Situation columns
dat.s <- melt(DF1,
              id.vars = 'List_name',
              measure.vars = grep('Situation', names(DF1), value = TRUE),
              variable.name = 'situation',
              value.name = 'situ.val')
dat.s$idx <- gsub('Situation', '', dat.s$situation)
# Pair each condition with its situation (merges on List_name and idx)
dat <- merge(dat.c, dat.s)
out <- dat[dat$cond.val < 0.5,]
List_name idx condition cond.val situation situ.val
1 List1 1 Condition1 0.01 Situation1 66
2 List1 2 Condition2 0.12 Situation2 123
3 List2 1 Condition1 0.23 Situation1 45
4 List2 2 Condition2 0.22 Situation2 -34
5 List3 1 Condition1 0.32 Situation1 13
6 List3 2 Condition2 0.23 Situation2 -12
7 List4 1 Condition1 0.03 Situation1 -3
10 List5 2 Condition2 0.05 Situation2 100
12 List6 2 Condition2 0.09 Situation2 32
You can then use dcast to put the data back in the initial format if you want, but I find data in this "long" form much easier to work with. This form is also convenient because it avoids the NA values you otherwise need in rows where one condition is met and the others are not.
out.c <- dcast(out, List_name ~ condition, value.var='cond.val')
out.s <- dcast(out, List_name ~ situation, value.var='situ.val')
merge(out.c, out.s)
List_name Condition1 Condition2 Situation1 Situation2
1 List1 0.01 0.12 66 123
2 List2 0.23 0.22 45 -34
3 List3 0.32 0.23 13 -12
4 List4 0.03 NA -3 NA
5 List5 NA 0.05 NA 100
6 List6 NA 0.09 NA 32
I think what you're asking for is attainable, but the pieces can't be bound together in the way you've shown, because they have unequal numbers of rows. So, you'll get a list.
Here, I assume that your data.frame always has the form List_name, followed by Condition1, ..., ConditionN and then Situation1, ..., SituationN.
Then, this can be obtained by getting the ids first and then filtering using lapply:
ids <- grep("Condition", names(df))
lapply(ids, function(x) df[which(df[[x]] < 0.5), c(1,x,x+length(ids))])
# [[1]]
# List_name Condition1 Situation1
# 1 List1 0.01 66
# 2 List2 0.23 45
# 3 List3 0.32 13
# 4 List4 0.03 -3
#
# [[2]]
# List_name Condition2 Situation2
# 1 List1 0.12 123
# 2 List2 0.22 -34
# 3 List3 0.23 -12
# 5 List5 0.05 100
# 6 List6 0.09 32