
Group columns in a dataframe and subset using a condition

I have a data frame like this:

ID <- c("ID001","ID001","ID001","ID001","ID001","ID001","ID001",
        "ID002","ID002","ID002","ID002","ID002")
Type <- c("A","A","A","A","A","A","A",
          "B","B","B","B","B")
Measurement <- c("Length","Summary","Breadth","Length","Summary","Breadth","Summary",
                 "Length","Summary","Breadth","Breadth","Summary")
PassFail <- c("PASS","PASS","PASS","FAIL_PTS","FAIL","FAIL_AVG_HI","FAIL",
              "PASS","FAIL_PTS","FAIL","FAIL_AVG_LOW","FAIL")
ToolID <- c("SWP","SWP","SWP","ISP","ISP","IKS","IKS",
            "PSX","PSX","PSX","PZY","PZY")

df <- data.frame(ID,Type,Measurement,PassFail,ToolID)
df

      ID Type Measurement     PassFail ToolID
   ID001    A      Length         PASS    SWP
   ID001    A     Summary         PASS    SWP
   ID001    A     Breadth         PASS    SWP
   ID001    A      Length     FAIL_PTS    ISP
   ID001    A     Summary         FAIL    ISP
   ID001    A     Breadth  FAIL_AVG_HI    IKS
   ID001    A     Summary         FAIL    IKS
   ID002    B      Length         PASS    PSX
   ID002    B     Summary     FAIL_PTS    PSX
   ID002    B     Breadth         FAIL    PSX
   ID002    B     Breadth FAIL_AVG_LOW    PZY
   ID002    B     Summary         FAIL    PZY

I am trying to subset this data frame with the following condition: when PassFail is 'FAIL_AVG_HI' or 'FAIL_AVG_LOW', I would like to remove all rows in that group, where a group is defined by (ID, Type, ToolID).

My desired output would look like this:

     ID Type Measurement PassFail ToolID
  ID001    A      Length     PASS    SWP
  ID001    A     Summary     PASS    SWP
  ID001    A     Breadth     PASS    SWP
  ID001    A      Length FAIL_PTS    ISP
  ID001    A     Summary     FAIL    ISP
  ID002    B      Length     PASS    PSX
  ID002    B     Summary FAIL_PTS    PSX
  ID002    B     Breadth     FAIL    PSX

I am getting the grouping wrong when removing the rows. I can delete the individual rows that have the above PassFail values, but how do I group them and delete every row that belongs to the same group?

This is how I remove the individual rows:

# Both comparisons must hold, so they are combined with &
# (with | the condition is TRUE for every row and nothing is removed)
df <- subset(df, PassFail != 'FAIL_AVG_HI' & PassFail != 'FAIL_AVG_LOW')

You can use group_by %>% filter:

library(dplyr)
df %>% 
      group_by(ID, Type, ToolID) %>% 
      filter(!any(PassFail %in% c('FAIL_AVG_HI', 'FAIL_AVG_LOW')))

#Source: local data frame [8 x 5]
#Groups: ID, Type, ToolID [3]

#      ID   Type Measurement PassFail ToolID
#  <fctr> <fctr>      <fctr>   <fctr> <fctr>
#1  ID001      A      Length     PASS    SWP
#2  ID001      A     Summary     PASS    SWP
#3  ID001      A     Breadth     PASS    SWP
#4  ID001      A      Length FAIL_PTS    ISP
#5  ID001      A     Summary     FAIL    ISP
#6  ID002      B      Length     PASS    PSX
#7  ID002      B     Summary FAIL_PTS    PSX
#8  ID002      B     Breadth     FAIL    PSX
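
The reason this drops whole groups is that any() collapses each (ID, Type, ToolID) group to a single TRUE/FALSE, which then applies to every row of that group. As a rough illustration only, the per-group flags can be inspected in base R; the names bad, is_bad and group_flags below are made up for this sketch and are not part of the answer above:

```r
# Rebuild the example data from the question
df <- data.frame(
  ID = c(rep("ID001", 7), rep("ID002", 5)),
  Type = c(rep("A", 7), rep("B", 5)),
  Measurement = c("Length", "Summary", "Breadth", "Length", "Summary", "Breadth", "Summary",
                  "Length", "Summary", "Breadth", "Breadth", "Summary"),
  PassFail = c("PASS", "PASS", "PASS", "FAIL_PTS", "FAIL", "FAIL_AVG_HI", "FAIL",
               "PASS", "FAIL_PTS", "FAIL", "FAIL_AVG_LOW", "FAIL"),
  ToolID = c("SWP", "SWP", "SWP", "ISP", "ISP", "IKS", "IKS",
             "PSX", "PSX", "PSX", "PZY", "PZY"),
  stringsAsFactors = FALSE
)

# Values that should disqualify a whole group (hypothetical helper name)
bad <- c("FAIL_AVG_HI", "FAIL_AVG_LOW")

# Row-level flag: TRUE where the row itself carries a disqualifying value
df$is_bad <- df$PassFail %in% bad

# One row per (ID, Type, ToolID) group; is_bad is TRUE if any row in the group is flagged
group_flags <- aggregate(is_bad ~ ID + Type + ToolID, data = df, FUN = any)
print(group_flags)
```

The groups flagged TRUE here are the ones the grouped filter removes in full, including their rows whose own PassFail is just 'FAIL'.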

We can use data.table:

library(data.table)
setDT(df)[, if(!any(PassFail %in% c('FAIL_AVG_HI', 'FAIL_AVG_LOW'))) 
                  .SD, .(ID, Type, ToolID)]
#       ID Type ToolID Measurement PassFail
#1: ID001    A    SWP      Length     PASS
#2: ID001    A    SWP     Summary     PASS
#3: ID001    A    SWP     Breadth     PASS
#4: ID001    A    ISP      Length FAIL_PTS
#5: ID001    A    ISP     Summary     FAIL  
#6: ID002    B    PSX      Length     PASS
#7: ID002    B    PSX     Summary FAIL_PTS
#8: ID002    B    PSX     Breadth     FAIL
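
If adding a package is not an option, the same idea can probably be expressed in base R with ave(), which applies a function within groups and returns a vector as long as the input. This is a sketch of an alternative, not the method from either answer; bad, bad_in_group and result are names chosen here for illustration:

```r
# Rebuild the example data from the question
df <- data.frame(
  ID = c(rep("ID001", 7), rep("ID002", 5)),
  Type = c(rep("A", 7), rep("B", 5)),
  Measurement = c("Length", "Summary", "Breadth", "Length", "Summary", "Breadth", "Summary",
                  "Length", "Summary", "Breadth", "Breadth", "Summary"),
  PassFail = c("PASS", "PASS", "PASS", "FAIL_PTS", "FAIL", "FAIL_AVG_HI", "FAIL",
               "PASS", "FAIL_PTS", "FAIL", "FAIL_AVG_LOW", "FAIL"),
  ToolID = c("SWP", "SWP", "SWP", "ISP", "ISP", "IKS", "IKS",
             "PSX", "PSX", "PSX", "PZY", "PZY"),
  stringsAsFactors = FALSE
)

# Values that should disqualify a whole group (hypothetical helper name)
bad <- c("FAIL_AVG_HI", "FAIL_AVG_LOW")

# For every row, TRUE if any row in the same (ID, Type, ToolID) group has a bad value;
# as.logical() guards against ave() returning a non-logical vector
bad_in_group <- as.logical(
  ave(df$PassFail %in% bad, df$ID, df$Type, df$ToolID, FUN = any)
)

# Keep only rows whose group contains no bad value
result <- df[!bad_in_group, ]
print(result)
```

Unlike setDT(), which converts df to a data.table by reference, this leaves df as an ordinary data frame and keeps the original column order.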

