
Partially match a data.frame and subset the whole data.frame

I have some data that looks like this:

  List_name Condition1 Condition2 Situation1 Situation2
  List1           0.01       0.12         66        123
  List2           0.23       0.22         45        -34
  List3           0.32       0.23         13        -12
  List4           0.03       0.56         -3         45
  List5           0.56       0.05         12        100
  List6           0.90       0.09         22         32

I would like to filter each "Condition" column of the data.frame using a cutoff of 0.5, keeping the rows whose value is below it. Each filtered subset should carry along the corresponding value from the matching "Situation" column. Filtering and subsetting work pairwise: "Condition1" with "Situation1", "Condition2" with "Situation2", and so on.

The desired output:

  List_name Condition1 Situation1   List_name Condition2 Situation2
  List1           0.01         66   List1           0.12        123
  List2           0.23         45   List2           0.22        -34
  List3           0.32         13   List3           0.23        -12
  List4           0.03         -3   List5           0.05        100
                                    List6           0.09         32

I'm pretty sure a similar question has been posted before, but I searched and couldn't find it.

Similar to the excellent solution by @Arun, but based on column names and without assuming a particular column order.

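# names of all columns matching Condition<N>
cols.conds <- grep('Condition[0-9]+', colnames(dat), value = TRUE)
lapply(cols.conds, function(x) {
   col.list <- colnames(dat)[1]                   # identifier column (List_name)
   col.situ <- sub('Condition', 'Situation', x)   # matching Situation<N> column
   # keep the rows below the cutoff, with only the three relevant columns
   dat[which(dat[[x]] < 0.5), c(col.list, x, col.situ)]
})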

I assume dat is:

dat <- read.table(text = 'List_name     Condition1   Condition2  Situation1   Situation2
  List1          0.01         0.12         66           123
  List2          0.23         0.22         45           -34
  List3          0.32         0.23         13           -12
  List4          0.03         0.56         -3            45
  List5          0.56         0.05         12           100
  List6          0.90         0.09         22            32', header = TRUE)

You can use the fact that logical comparisons are vectorized:

x <- c(0.1, 0.3, 0.5, 0.2)
x < 0.5
# [1]  TRUE  TRUE FALSE  TRUE

And use grep to locate the Condition columns:

grep('Condition', names(DF1))

To do this subsetting, you can use apply to generate your logical vector:

keepers <- apply(DF1[, grep('Condition', names(DF1))], 1, function(x) any(x < 0.5))

And subset:

DF1[keepers,]

Notice that this doesn't necessarily return the data structure you showed in your question. But you can alter the anonymous function accordingly, using all instead of any or a different threshold value.
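As a minimal sketch of that adjustment, assuming DF1 holds the question's data (re-created here so the snippet runs on its own), swapping any for all keeps only the rows where every Condition column is below the cutoff. The names cond.cols, keep.any and keep.all are illustrative and not part of the original answer.

```r
# Assumption: DF1 is the data from the question, re-created here.
DF1 <- read.table(text = "List_name Condition1 Condition2 Situation1 Situation2
List1 0.01 0.12 66 123
List2 0.23 0.22 45 -34
List3 0.32 0.23 13 -12
List4 0.03 0.56 -3 45
List5 0.56 0.05 12 100
List6 0.90 0.09 22 32", header = TRUE)

# Positions of the Condition columns
cond.cols <- grep("Condition", names(DF1))

# Row-wise check: at least one condition is below the cutoff
keep.any <- apply(DF1[, cond.cols, drop = FALSE], 1, function(x) any(x < 0.5))

# Row-wise check: every condition is below the cutoff
keep.all <- apply(DF1[, cond.cols, drop = FALSE], 1, function(x) all(x < 0.5))

DF1[keep.all, ]
```

With this data every row passes the any check, while only List1 to List3 pass the all check.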


In light of the edits, I would approach this differently. I would use melt from the reshape2 package:

library(reshape2)
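# long form of the Condition columns: one row per (List_name, condition)
dat.c <- melt(DF1, 
              id.vars='List_name', 
              measure.vars=grep('Condition', names(DF1), value=TRUE),
              variable.name='condition',
              value.name='cond.val')
dat.c$idx <- gsub('Condition', '', dat.c$condition)   # numeric suffix, e.g. "1"
# long form of the Situation columns, built the same way
dat.s <- melt(DF1, 
              id.vars='List_name', 
              measure.vars=grep('Situation', names(DF1), value=TRUE),
              variable.name='situation',
              value.name='situ.val')
dat.s$idx <- gsub('Situation', '', dat.s$situation)
# pair each condition with its situation via the shared List_name and idx
dat <- merge(dat.c, dat.s)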

out <- dat[dat$cond.val < 0.5,]

   List_name idx  condition cond.val  situation situ.val
1      List1   1 Condition1     0.01 Situation1       66
2      List1   2 Condition2     0.12 Situation2      123
3      List2   1 Condition1     0.23 Situation1       45
4      List2   2 Condition2     0.22 Situation2      -34
5      List3   1 Condition1     0.32 Situation1       13
6      List3   2 Condition2     0.23 Situation2      -12
7      List4   1 Condition1     0.03 Situation1       -3
10     List5   2 Condition2     0.05 Situation2      100
12     List6   2 Condition2     0.09 Situation2       32

You can then use dcast to put the data back in the initial format if you want, but I find data in this "long" form much easier to work with. The long form is also convenient because it avoids the NA values that the wide format needs for rows where one condition is met and the others are not.

out.c <- dcast(out, List_name ~ condition, value.var='cond.val')
out.s <- dcast(out, List_name ~ situation, value.var='situ.val')
merge(out.c, out.s)

  List_name Condition1 Condition2 Situation1 Situation2
1     List1       0.01       0.12         66        123
2     List2       0.23       0.22         45        -34
3     List3       0.32       0.23         13        -12
4     List4       0.03         NA         -3         NA
5     List5         NA       0.05         NA        100
6     List6         NA       0.09         NA         32

I think what you're asking for is attainable, but the subsets can't be bound together in the way you've shown, as they have unequal numbers of rows. So you'll get a list.

Here, I assume that your data.frame always has the form List_name, followed by Condition1, ..., ConditionN, and then Situation1, ..., SituationN.

Then, this can be obtained by getting the ids first and then filtering using lapply:

ids <- grep("Condition", names(df))
lapply(ids, function(x) df[which(df[[x]] < 0.5), c(1,x,x+length(ids))])

# [[1]]
#   List_name Condition1 Situation1
# 1     List1       0.01         66
# 2     List2       0.23         45
# 3     List3       0.32         13
# 4     List4       0.03         -3
# 
# [[2]]
#   List_name Condition2 Situation2
# 1     List1       0.12        123
# 2     List2       0.22        -34
# 3     List3       0.23        -12
# 5     List5       0.05        100
# 6     List6       0.09         32
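
The positional lookup x + length(ids) only works when the assumed column layout holds. The following is a small sketch, not part of the original answer, that checks the layout before filtering and labels each subset by its Condition column. It assumes df holds the question's data (re-created here), and res is an illustrative name.

```r
# Assumption: df is the data from the question, re-created here.
df <- read.table(text = "List_name Condition1 Condition2 Situation1 Situation2
List1 0.01 0.12 66 123
List2 0.23 0.22 45 -34
List3 0.32 0.23 13 -12
List4 0.03 0.56 -3 45
List5 0.56 0.05 12 100
List6 0.90 0.09 22 32", header = TRUE)

ids <- grep("Condition", names(df))

# Guard for the layout assumption: the column length(ids) positions after
# each ConditionN must be the SituationN with the same suffix.
stopifnot(identical(sub("Condition", "Situation", names(df)[ids]),
                    names(df)[ids + length(ids)]))

res <- lapply(ids, function(x) df[which(df[[x]] < 0.5), c(1, x, x + length(ids))])
names(res) <- names(df)[ids]   # label each subset by its Condition column

res$Condition2
```

Naming the list elements makes it possible to pull out one subset by name instead of by position.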
