
R - Select IDs that meet multiple criteria from single column

I have a list of IDs, each having multiple events. The data looks like an event log, i.e. one row per event per ID. For example:

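library(reshape2)  # provides melt()

n.ID <- 4
n.events <- 5
set.seed(1234)
# Sample n.events sorted letters per ID, then melt the resulting matrix to long format
df <- setNames(melt(replicate(n.ID, sort(sample(letters[1:10], n.events))))[2:3], c("ID", "Event"))
df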

    > df
   ID Event
1   1     b
2   1     e
3   1     f
4   1     h
5   1     i
6   2     a
7   2     b
8   2     d
9   2     e
10  2     g
11  3     b
12  3     c
13  3     e
14  3     g
15  3     j
16  4     b
17  4     c
18  4     g
19  4     i
20  4     j

I want to select the IDs that meet a list of criteria, combined with either AND or OR.

For example:

  1. those IDs that have events "b" AND "c" AND "g" --> results in IDs 3 & 4
  2. those IDs that have events "a" OR "h" --> results in IDs 1 & 2

The criteria vectors can be any length.

EDIT:

I am aware of %in% and "|"; however, %in% on its own returns every row that matches any of the events:

keep.if <- c("b", "c", "g") # This list can be of any length
subset(df, Event %in% keep.if)
   ID Event
1   1     b
7   2     b
10  2     g
11  3     b
12  3     c
14  3     g
16  4     b
17  4     c
18  4     g

I only want the IDs that have 3 rows in this result. I could run table() on the result and select the IDs where Freq == length(keep.if), but I guess there should be an easier, less messy method.
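The counting idea described above can be sketched in base R as follows. This is only an illustration: the data frame is typed in from the printed example rather than sampled, and the names hits, counts and and_ids are made up for the sketch. Duplicate ID/Event rows are dropped first so that a repeated event cannot inflate the count.

```r
# Data copied from the printed example (assumption: typed in by hand, not re-sampled)
df <- data.frame(
  ID = rep(1:4, each = 5),
  Event = c("b", "e", "f", "h", "i",
            "a", "b", "d", "e", "g",
            "b", "c", "e", "g", "j",
            "b", "c", "g", "i", "j")
)
keep.if <- c("b", "c", "g")

# Rows whose event is one of the criteria, with duplicate ID/Event pairs removed
hits <- unique(df[df$Event %in% keep.if, c("ID", "Event")])

# Number of distinct matching events per ID
counts <- table(hits$ID)

# AND: an ID qualifies only if it matched every distinct criterion
and_ids <- as.integer(names(counts)[counts == length(unique(keep.if))])
and_ids
```

With the example data, IDs 1 and 2 match only one and two of the three events, so they are dropped.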

I guess for the OR version I can just take:

unique(subset(df, Event %in% keep.if)$ID)
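For comparison, the AND and OR cases can be written symmetrically by testing each ID's events with all() or any(). The following is a minimal base R sketch, not the method from the thread; select_ids is a hypothetical helper name, and the data frame is again typed in from the printed example.

```r
# Data copied from the printed example (assumption: typed in by hand)
df <- data.frame(
  ID = rep(1:4, each = 5),
  Event = c("b", "e", "f", "h", "i",
            "a", "b", "d", "e", "g",
            "b", "c", "e", "g", "j",
            "b", "c", "g", "i", "j")
)

# Hypothetical helper: IDs whose events contain all (AND) or any (OR) of `events`
select_ids <- function(data, events, mode = c("all", "any")) {
  mode <- match.arg(mode)
  test <- if (mode == "all") {
    function(ev) all(events %in% ev)
  } else {
    function(ev) any(events %in% ev)
  }
  by_id <- split(data$Event, data$ID)       # one vector of events per ID
  keep <- vapply(by_id, test, logical(1))   # one TRUE/FALSE per ID
  names(keep)[keep]                         # IDs come back as character strings
}

select_ids(df, c("b", "c", "g"), "all")
select_ids(df, c("a", "h"), "any")
```

The criteria vector can be of any length, and the IDs are returned as character strings because they are taken from the names produced by split().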

I would create a table and then use tidyr::spread to turn it into a contingency-table-style object. Then I would use data.table for easier subsetting and logical operations:

library(tidyr)

df.table<-as.data.frame(table(df)) %>% spread(Event, Freq)
df.table

ID a b c d e f g h i j
1 0 1 0 0 1 1 0 1 1 0
2 1 1 0 1 1 0 1 0 0 0
3 0 1 1 0 1 0 1 0 0 1
4 0 1 1 0 0 0 1 0 1 1

library(data.table)
##easier to subset with

df.table<-data.table(df.table)
df.table[b & c & g]

ID a b c d e f g h i j
3 0 1 1 0 1 0 1 0 0 1
4 0 1 1 0 0 0 1 0 1 1

df.table[a | h]

ID a b c d e f g h i j
1 0 1 0 0 1 1 0 1 1 0
2 1 1 0 1 1 0 1 0 0 0

Those are the 2 examples you gave in the question. You should be able to do just about any logical operation you want on the event columns. Also, if you only want to know which IDs satisfy your logic (and not their entire contingency table), then:

df.table[b & c & g]$ID
[1] 3 4
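The expressions above name the event columns explicitly, whereas the question asks for criteria vectors of any length. One possible way to drive the same contingency-table idea from a character vector is sketched below in base R. The helper name ids_with and the hand-typed data frame are assumptions for illustration, and the sketch assumes every event in the criteria vector occurs somewhere in the data (otherwise the column lookup fails).

```r
# Data copied from the printed example (assumption: typed in by hand)
df <- data.frame(
  ID = rep(1:4, each = 5),
  Event = c("b", "e", "f", "h", "i",
            "a", "b", "d", "e", "g",
            "b", "c", "e", "g", "j",
            "b", "c", "g", "i", "j")
)

# Logical ID x Event matrix: TRUE where the ID has the event at least once
has <- unclass(table(df$ID, df$Event)) > 0

# Hypothetical helper: IDs that have all (AND) or any (OR) of `events`
ids_with <- function(has, events, mode = c("all", "any")) {
  mode <- match.arg(mode)
  events <- unique(events)
  n_hit <- rowSums(has[, events, drop = FALSE])  # matched criteria per ID
  keep <- if (mode == "all") n_hit == length(events) else n_hit > 0
  rownames(has)[keep]
}

ids_with(has, c("b", "c", "g"), "all")
ids_with(has, c("a", "h"), "any")
```

Counting matches per row keeps AND and OR in one function: AND requires every criterion to be matched, OR requires at least one.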


 