
data.table assign a vector by group

Let's say we have the following data.table:

library(data.table)
dt = data.table(a=letters[1:20], b = c(rep(1,3),rep(2,7),rep(3,5),rep(4,5)))

that is:

    a b
 1: a 1
 2: b 1
 3: c 1
 4: d 2
 5: e 2
 6: f 2
 7: g 2
 8: h 2
 9: i 2
10: j 2
11: k 3
12: l 3
13: m 3
14: n 3
15: o 3
16: p 4
17: q 4
18: r 4
19: s 4
20: t 4

and that I want to assign each row a rank from 0 to 1, computed within its group of column b. I'm doing:

dt[,len:=.N,by=b][,rank:=c(0:(len-1))/(len-1),by=b][,len:=NULL]

where len exists only to calculate the rank and is removed afterwards. I obtain:

    a b      rank
 1: a 1 0.0000000
 2: b 1 0.5000000
 3: c 1 1.0000000
 4: d 2 0.0000000
 5: e 2 0.1666667
 6: f 2 0.3333333
 7: g 2 0.5000000
 8: h 2 0.6666667
 9: i 2 0.8333333
10: j 2 1.0000000
11: k 3 0.0000000
12: l 3 0.2500000
13: m 3 0.5000000
14: n 3 0.7500000
15: o 3 1.0000000
16: p 4 0.0000000
17: q 4 0.2500000
18: r 4 0.5000000
19: s 4 0.7500000
20: t 4 1.0000000

which is exactly what I want. The problem is that I also get these warnings:

   Warning messages:
 1: In base::":"(from, to) :
  numerical expression has 3 elements: only the first used
 2: In base::":"(from, to) :
  numerical expression has 7 elements: only the first used
 3: In base::":"(from, to) :
  numerical expression has 5 elements: only the first used
 4: In base::":"(from, to) :
  numerical expression has 5 elements: only the first used

I would like to disregard them, which is fine when the data is small and I can check the result by eye. But since my data.table has thousands of rows, I would like to be sure that these warnings are actually harmless.

What do you think? Or, equivalently, is my method of assigning a 'vector' by group in a data.table allowed? Are there alternatives?

Thank you.

You are getting the warning from this portion of the code: 0:(len-1). The second argument to :, len-1, is a vector of length .N, but : expects a vector of length 1, so it uses only the first element and warns about the rest. You can recreate the warning with (1:2):(2:3) or with seq_len(2):seq_len(2).
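Within each group, len := .N fills len with .N repeated .N times, so the first element of len-1 is the value you meant anyway. For this particular code the warnings are therefore harmless, although the result silently depends on len being constant within every group. A minimal base-R sketch of that reasoning, assuming a single 5-row group (like b == 3); the variable names are illustrative only:

```r
# Inside one 5-row group, len holds .N repeated for every row of the group
len <- rep(5L, 5L)

# ':' keeps only the first element of each argument and warns about the rest
msg <- tryCatch(0:(len - 1), warning = function(w) conditionMessage(w))
print(msg)

# Since every element of len is identical, the first element is the right one:
# with the warning muffled, the question's expression matches a scalar computation.
# The division by (len - 1) is element-wise over equal-length vectors, which is fine.
r_vector <- withCallingHandlers(
  c(0:(len - 1)) / (len - 1),
  warning = function(w) invokeRestart("muffleWarning")
)
r_scalar <- (seq_len(5) - 1) / (5 - 1)
stopifnot(identical(r_vector, r_scalar))
```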

The following calculates what you want in one line, without the warning:

dt[, rank := (seq_len(.N) - 1) / (.N - 1), by=b]
dt
    a b      rank
 1: a 1 0.0000000
 2: b 1 0.5000000
 3: c 1 1.0000000
 4: d 2 0.0000000
 5: e 2 0.1666667
 6: f 2 0.3333333
 7: g 2 0.5000000
 8: h 2 0.6666667
 9: i 2 0.8333333
10: j 2 1.0000000
11: k 3 0.0000000
12: l 3 0.2500000
13: m 3 0.5000000
14: n 3 0.7500000
15: o 3 1.0000000
16: p 4 0.0000000
17: q 4 0.2500000
18: r 4 0.5000000
19: s 4 0.7500000
20: t 4 1.0000000
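A sketch for checking this on your own data before trusting either version, assuming the question's dt: it runs the question's chained code (warnings suppressed), the answer's version, and an alternative that is not from the original answer, seq(0, 1, length.out = .N), which spaces .N points evenly over [0, 1] within each group. The column names rank_q and rank_seq and the table one are hypothetical.

```r
library(data.table)

dt <- data.table(a = letters[1:20],
                 b = c(rep(1, 3), rep(2, 7), rep(3, 5), rep(4, 5)))

# Question's chained version; warnings suppressed only for this comparison
invisible(suppressWarnings(
  dt[, len := .N, by = b][, rank_q := c(0:(len - 1)) / (len - 1), by = b][, len := NULL]
))

# Answer's version
dt[, rank := (seq_len(.N) - 1) / (.N - 1), by = b]

# Alternative (not from the answer): evenly spaced points from 0 to 1 per group
dt[, rank_seq := seq(0, 1, length.out = .N), by = b]

# The question's and answer's versions perform the same divisions
stopifnot(identical(dt$rank_q, dt$rank))
# seq() computes the interior points differently, so compare with a tolerance
stopifnot(isTRUE(all.equal(dt$rank, dt$rank_seq)))

# A single-row group is where the two formulas differ
one <- data.table(b = 1)
one[, r_answer := (seq_len(.N) - 1) / (.N - 1), by = b]  # 0 / 0 gives NaN
one[, r_seq := seq(0, 1, length.out = .N), by = b]       # returns 0
```

The question's version also yields NaN for a one-row group (0:0 divided by 0), so choosing between the arithmetic formula and seq() comes down to what rank you want for singletons.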
