Group a column by the values of a second column based on a condition from a third column in data.table
I have a large data.table in which I want to group one column by the values of another column, picking the row that meets a criterion on a third column. I can do this with a loop, but I wonder whether it can be done in data.table.
The table looks like this:
    Group Col1 Col2
 1:     A    1  0.0
 2:     A    2  0.1
 3:     A    3  0.2
 4:     A    4  0.5
 5:     A    5  0.9
 6:     B    6  0.0
 7:     B    7  0.2
 8:     B    8  0.4
 9:     B    9  0.9
10:     B   10  1.0
For each Group, I need the value of Col1 from the row where Col2 is closest to 0.5. Col2 is a cumulative value that ranges from 0 to 1. The expected result is:
   Group Col1
1:     A    4
2:     B    8
Can this be done in data.table? I have struggled with this, so any input or guidance would be greatly appreciated.
Here is the data.table above:
library(data.table)
DAT <- data.table(Group = c(rep("A", 5), rep("B", 5)), Col1 = 1:10, Col2 = c(0, .1, .2, .5, .9, 0, .2, .4, .9, 1))
After grouping by 'Group', take the absolute difference between 'Col2' and 0.5, get the index of the minimum value (which.min), and use that index to subset 'Col1':
# which.min() returns the first index of the minimum, so ties go to the earlier row
DAT[, .(Col1 = Col1[which.min(abs(Col2 - 0.5))]), by = Group]
#    Group Col1
# 1:     A    4
# 2:     B    8
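To make the per-group logic of the which.min answer explicit, here is a minimal sketch in base R only (no data.table), assuming the question's data as a plain data.frame. It swaps data.table's `by =` grouping for split() and lapply(); the names `target` and `res` are illustrative and not part of the original answer.

```r
# Base-R sketch (no data.table): the question's data as a data.frame
DAT <- data.frame(
  Group = c(rep("A", 5), rep("B", 5)),
  Col1  = 1:10,
  Col2  = c(0, .1, .2, .5, .9, 0, .2, .4, .9, 1)
)

target <- 0.5  # hypothetical name for the value Col2 should be closest to

# For each Group, take Col1 from the row where |Col2 - target| is smallest.
# which.min() returns the first minimum, so ties go to the earlier row.
res <- do.call(rbind, lapply(split(DAT, DAT$Group), function(d) {
  data.frame(Group = d$Group[1],
             Col1  = d$Col1[which.min(abs(d$Col2 - target))])
}))
rownames(res) <- NULL  # drop the "A"/"B" row names added by rbind

print(res)
```

With the question's data this gives Col1 = 4 for A and Col1 = 8 for B, matching the expected result. For a large table the data.table one-liner is the more idiomatic choice; this version only spells out what happens inside each group.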
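Since the requirement is "for each Group get the value in the row of Col1 where Col2 is the closest to 0.5", use a rolling join. The lookup table .(unique(Group), .5) pairs every group with the target 0.5; joining on Group and Col2 with roll = "nearest" matches Group exactly and rolls Col2 to the nearest value. In the result, Col2 shows the lookup value 0.5 (join columns are taken from the lookup table), while Col1 comes from the nearest row of DAT: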
DAT[.(unique(Group), .5), on=.(Group, Col2), roll="nearest"]
#    Group Col1 Col2
# 1:     A    4  0.5
# 2:     B    8  0.5
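Statement: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.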