R data.table: filtering on group size

Question

I am trying to find all the records in my data.table for which there is more than one row with value v in field f.

For instance, we can use this data:

library(data.table)
dt <- data.table(f1 = c(1, 2, 3, 4, 5), f2 = c(1, 1, 2, 3, 3))

If looking for that property in field f2, we'd get the following (note the absence of the (3,2) tuple):

   f1 f2
1:  1  1
2:  2  1
3:  4  3
4:  5  3

My first guess was dt[.N>2,list(.N),by=f2], but that actually keeps entries with .N==1.

dt[.N>2,list(.N),by=f2]
   f2 N
1:  1 2
2:  2 1
3:  3 2

The other easy guess, dt[duplicated(dt$f2)], doesn't do the trick either, as it leaves the first row of each set of 'duplicates' out of the results.

dt[duplicated(dt$f2)]
   f1 f2
1:  2  1
2:  5  3

So how can I get this done?

Edited to add example

Answer 1

The question is not clear. Based on the title, it looks like we want to extract all groups with number of rows (.N) greater than 1.

DT[, if(.N>1) .SD, by=f]
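As a minimal sketch, here is the same idiom applied to the question's data, assuming the grouping field is f2 and that "more than one row" means .N > 1. The name res is only illustrative. The last line hints at why the first guess in the question kept every group: in i, .N is the row count of the whole table rather than of each group.

```r
library(data.table)

# Example data from the question
dt <- data.table(f1 = c(1, 2, 3, 4, 5), f2 = c(1, 1, 2, 3, 3))

# Keep the rows of every f2 group that has more than one row.
# When the if() condition is FALSE, j returns NULL and the group is dropped.
res <- dt[, if (.N > 1) .SD, by = f2]
print(res)

# In i, .N is evaluated once for the whole table (5 rows here),
# so the condition is a single TRUE and no row is filtered out.
print(nrow(dt[.N > 2]))
```

Note that the grouping column comes first in the result, so the columns appear as f2, f1.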

But the phrase "value v in field f" makes it confusing.

Answer 2

If I understand what you're after correctly, you'll need to do some compound queries:

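library(data.table)
DT <- data.table(v1 = 1:10, f = c(rep(1:3, 3), 4))
# Add the group size as column N, keep rows in groups with more than two rows
# (use N > 1 for "more than one row"), then drop the helper column
DT[, N := .N, by = f][N > 2][, N := NULL][]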
#    v1 f
# 1:  1 1
# 2:  2 2
# 3:  3 3
# 4:  4 1
# 5:  5 2
# 6:  6 3
# 7:  7 1
# 8:  8 2
# 9:  9 3
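If adding and removing a helper column is undesirable, two alternatives can be sketched on the same example data. The first uses .I (row numbers) per group; the second uses base R's duplicated() in both directions, which also addresses the one-sided duplicated() attempt in the question. The names keep, res_idx and res_dup are illustrative, and the threshold .N > 1 is an assumption taken from the question's wording.

```r
library(data.table)

DT <- data.table(v1 = 1:10, f = c(rep(1:3, 3), 4))

# .I holds row numbers within DT; collect those that belong to
# groups with more than one row (unnamed j results land in column V1)
keep <- DT[, .I[.N > 1], by = f]$V1
res_idx <- DT[sort(keep)]  # sort() restores the original row order

# A value is repeated if it also occurs earlier or later in the column
res_dup <- DT[duplicated(f) | duplicated(f, fromLast = TRUE)]

print(res_idx)
print(res_dup)
```

Neither variant modifies DT by reference, so no N column is left behind on the original table.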


Answer 1: 9, accepted, 2015-12-23 02:59:36

Answer 2: 3, 2015-12-23 02:59:11
