R: By group, check whether for each unique value of one variable there is at least one observation where that variable's value equals the value of another variable

I think I am heading in the right direction with this code, but I am not quite there yet.

I tried to find something useful on Google and SE, but I did not seem to be able to phrase the question in a way that gets me the answer I am looking for.

I could write a for-loop for this, comparing each unique value of a row by row within each id, but I strive for a deeper understanding of R and therefore want to avoid loops.

library(data.table)

id <- c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5)
a  <- c(1,1,1,2,2,2,3,3,4,4,4,5,5,5,6)
b  <- c(1,2,3,3,3,4,3,4,5,4,4,5,6,7,8)
dt <- data.table(id, a, b)

dt
# For each id: is each unique value of a found anywhere in b?
tmp <- dt[, unique(a) %in% b, by = id]
tmp
# ids with at least one unique value of a not found in b
tmp$id[tmp$V1 == FALSE]

In my example, IDs 2, 3 and 5 should be the result, the decision rule being: "By id, check for each unique value of a whether there is at least one observation where the value of b equals the value of a." In other words, for every unique value of a within an id, some single row must have both a and b equal to that value.

However, my code outputs only IDs 2 and 5, not 3. This is because, for ID 3, the value 4 in a is matched with the 4 in b of the previous observation.
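A minimal sketch that puts the two tests side by side for ID 3 only may make the difference concrete: `unique(a) %in% b` looks for each value anywhere in the group's b column, while `any(a == i & b == i)` requires one single row in which both columns equal i. The column names `value`, `in_b` and `same_row` are made up for this illustration.

```r
library(data.table)

id <- c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5)
a  <- c(1,1,1,2,2,2,3,3,4,4,4,5,5,5,6)
b  <- c(1,2,3,3,3,4,3,4,5,4,4,5,6,7,8)
dt <- data.table(id, a, b)

# Group 3 has a = (3, 3, 4) and b = (3, 4, 5)
chk <- dt[id == 3, .(
  value    = unique(a),
  # TRUE if the value appears anywhere in b, regardless of row
  in_b     = unique(a) %in% b,
  # TRUE only if some single row has both a and b equal to the value
  same_row = sapply(unique(a), function(i) any(a == i & b == i))
)]
chk
# in_b is TRUE for both 3 and 4; same_row is TRUE for 3 but FALSE for 4
```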

The result should either list the IDs for which the condition is not met, or add a dummy variable to the original table indicating whether the condition is met for each ID.

How about

dt[, all(sapply(unique(a), function(i) any(a == i & b == i))), by = id]

#   id    V1
#1:  1  TRUE
#2:  2 FALSE
#3:  3 FALSE
#4:  4  TRUE
#5:  5 FALSE

If you want to add a dummy variable to the original table instead, you can modify it like this:

dt[, check:=all(sapply(unique(a), function(i) any(a == i & b == i))), by = id]
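For reference, here is a self-contained sketch of this answer that produces both outputs the question asks for: the ids that fail the rule and the `check` column added by reference. The intermediate names `res`, `ok` and `failing` are only illustrative.

```r
library(data.table)

dt <- data.table(
  id = c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5),
  a  = c(1,1,1,2,2,2,3,3,4,4,4,5,5,5,6),
  b  = c(1,2,3,3,3,4,3,4,5,4,4,5,6,7,8)
)

# One row per id: TRUE if every unique value of a has a row where a and b both equal it
res <- dt[, .(ok = all(sapply(unique(a), function(i) any(a == i & b == i)))), by = id]

# ids for which the condition is not met
failing <- res[ok == FALSE, id]
failing   # 2 3 5

# Dummy variable on the original table, added by reference with :=
dt[, check := all(sapply(unique(a), function(i) any(a == i & b == i))), by = id]
print(dt)
```

`ok == FALSE` is used in `i` rather than `!ok` to keep the expression an ordinary logical filter evaluated inside the table.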

I wondered whether I could find a more data.table-esque solution for this old question using the enhanced join capabilities introduced in data.table version 1.9.6 (on CRAN 19 Sep 2015). With that version, data.table gained the ability to join without having to set keys, by using the on argument.
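As a minimal illustration of the on argument (the toy tables `X` and `Y` are invented for this sketch): `X[Y, on = "k"]` keeps every row of `Y`, looks up matching rows of `X` without any `setkey()` call, and fills `NA` where `X` has no match. Both variants rely on this behaviour.

```r
library(data.table)

X <- data.table(k = c(1, 2, 3), v = c("a", "b", "c"))
Y <- data.table(k = c(2, 4))

# No setkey() needed: the join column is given with `on`.
# Every row of Y is kept (a right join from X's point of view);
# k = 4 has no partner in X, so its v is NA.
J <- X[Y, on = "k"]
J
```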

Variant 1

dt[a == b][dt[, unique(a), by = id], on = .(id, a == V1)][is.na(b), unique(id)]
 [1] 2 3 5 

First, the rows of dt where a and b are equal are selected. Only these rows are then right joined with the unique values of a for each id. The result of the join is:

dt[a == b][dt[, unique(a), by = id], on = .(id, a == V1)]
   id a  b
1:  1 1  1
2:  2 2 NA
3:  3 3  3
4:  3 4 NA
5:  4 4  4
6:  4 4  4
7:  4 5  5
8:  5 5 NA
9:  5 6 NA

The NA values in column b indicate that no match was found. Any id with at least one NA value does not meet the OP's condition.
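The Variant 1 pipeline can also be split into named steps, which makes each piece easy to inspect; the names `same`, `u`, `joined` and `bad` are only for illustration. The last step turns the failing ids into the dummy column the question also asked for.

```r
library(data.table)

dt <- data.table(
  id = c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5),
  a  = c(1,1,1,2,2,2,3,3,4,4,4,5,5,5,6),
  b  = c(1,2,3,3,3,4,3,4,5,4,4,5,6,7,8)
)

same   <- dt[a == b]                      # rows where a and b agree
u      <- dt[, unique(a), by = id]        # unique values of a per id (column V1)
joined <- same[u, on = .(id, a == V1)]    # keep every (id, V1) pair; b is NA if unmatched
bad    <- joined[is.na(b), unique(id)]    # ids with at least one unmatched value
bad    # 2 3 5

# Dummy variable: TRUE where the condition holds for the whole id
dt[, check := !(id %in% bad)]
print(dt)
```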

Variant 2

dt[dt[, unique(a), by = id], on = .(id, a == V1, b == V1), unique(id[is.na(x.a)])]
 [1] 2 3 5 

This variant right joins dt (unfiltered!) with the unique values of a for each id, but the join conditions require matches in id as well as matches in both a and b. (This resembles the a == i & b == i expression in konvas' accepted answer.) Finally, the ids that have at least one NA value in x.a, i.e. a missing match in the join result, are returned.
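The same condition could also be phrased as an anti-join, which avoids inspecting NA columns altogether. This is a variation sketched here, not part of either answer: `u[!same, on = .(id, V1 == a)]` keeps those (id, value) pairs of `u` for which no row of `dt` has both a and b equal to that value. The names `u`, `same` and `unmatched` are illustrative.

```r
library(data.table)

dt <- data.table(
  id = c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5),
  a  = c(1,1,1,2,2,2,3,3,4,4,4,5,5,5,6),
  b  = c(1,2,3,3,3,4,3,4,5,4,4,5,6,7,8)
)

u    <- dt[, unique(a), by = id]   # candidate (id, V1) pairs
same <- dt[a == b]                 # rows where a and b are equal

# Anti-join: pairs from u with no row in `same` sharing id and having a == V1
unmatched <- u[!same, on = .(id, V1 == a)]
unmatched
unique(unmatched$id)   # 2 3 5
```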
