R data.table sorting by group with "other" at bottom of each group
I can't quite get the syntax right for this. I have a data.table that I would like to sort first by a grouping column g1 (an ordered factor), then in descending order by another column n. The only catch is that rows labeled "other" in a third column g2 should appear at the bottom of each group, regardless of their value of n.
Example:
library(data.table)
dt <- data.table(g1 = factor(rep(c('Australia', 'Mexico', 'Canada'), 3), levels = c('Australia', 'Canada', 'Mexico')),
g2 = rep(c('stuff', 'things', 'other'), each = 3),
n = c(1000, 2000, 3000, 5000, 100, 3500, 10000, 10000, 0))
This is the expected output: within each g1, rows are in descending order of n, except that rows where g2 == 'other' are always at the bottom:
g1 g2 n
1: Australia things 5000
2: Australia stuff 1000
3: Australia other 10000
4: Canada things 3500
5: Canada stuff 3000
6: Canada other 0
7: Mexico stuff 2000
8: Mexico things 100
9: Mexico other 10000
Take advantage of order() inside dt[...] (data.table optimizes this call internally) and its - prefix for descending order:
dt[order(g1, g2 == "other", -n), ]
# g1 g2 n
# <fctr> <char> <num>
# 1: Australia things 5000
# 2: Australia stuff 1000
# 3: Australia other 10000
# 4: Canada things 3500
# 5: Canada stuff 3000
# 6: Canada other 0
# 7: Mexico stuff 2000
# 8: Mexico things 100
# 9: Mexico other 10000
We add g2 == "other" as a sort key because you said that "other" should always be last. If, for example, "stuff" were "abc" instead, we can see the difference in behavior:
dt[ g2 == "stuff", g2 := "abc" ]
dt[order(g1, -n), ]
# g1 g2 n
# <fctr> <char> <num>
# 1: Australia other 10000
# 2: Australia things 5000
# 3: Australia abc 1000
# 4: Canada things 3500
# 5: Canada abc 3000
# 6: Canada other 0
# 7: Mexico other 10000
# 8: Mexico abc 2000
# 9: Mexico things 100
dt[order(g1, g2 == "other", -g2), ]
# g1 g2 n
# <fctr> <char> <num>
# 1: Australia things 5000
# 2: Australia abc 1000
# 3: Australia other 10000
# 4: Canada things 3500
# 5: Canada abc 3000
# 6: Canada other 0
# 7: Mexico things 100
# 8: Mexico abc 2000
# 9: Mexico other 10000
One disadvantage of this is that setorder doesn't work directly, because it accepts only column names, not expressions:
setorder(dt, g1, g2 == "other", -n)
# Error in setorderv(x, cols, order, na.last) :
# some columns are not in the data.table: ==,other
so we instead need to reorder and reassign the result back to dt.
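A minimal sketch of both routes, using the example data from the question. The first reorders and assigns the result (in practice you would write dt <- dt[order(...)]). The second is one possible workaround if sorting by reference matters: store the condition in a temporary column, pass that column to setorder, then drop it. The names sorted and is_other are made up for this illustration.

```r
library(data.table)

dt <- data.table(
  g1 = factor(rep(c("Australia", "Mexico", "Canada"), 3),
              levels = c("Australia", "Canada", "Mexico")),
  g2 = rep(c("stuff", "things", "other"), each = 3),
  n  = c(1000, 2000, 3000, 5000, 100, 3500, 10000, 10000, 0)
)

# Option 1: reorder and assign the result (this creates a sorted copy).
# To overwrite the original, assign to dt instead of sorted.
sorted <- dt[order(g1, g2 == "other", -n)]

# Option 2: sort dt itself by reference.
# setorder() only accepts column names, so materialize the condition
# as a helper column first (is_other is a hypothetical name).
dt[, is_other := g2 == "other"]
setorder(dt, g1, is_other, -n)
dt[, is_other := NULL]  # remove the helper column again
```

Both routes should leave the rows in the same order; the second avoids copying the table at the cost of briefly adding a column.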
BTW: this works because g2 == "other" evaluates to a logical vector, and when sorting, logical values are treated as 0 (FALSE) and 1 (TRUE), so rows where the condition is false appear before rows where it is true.
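The logical-as-0/1 behavior can be checked in isolation with base R alone. The vectors flags and labels below are made up for this illustration and are not part of the question's data.

```r
# Logical values sort as FALSE (0) before TRUE (1).
flags <- c(TRUE, FALSE, TRUE, FALSE)
as_int <- as.integer(flags)   # 1 0 1 0
idx <- order(flags)           # positions of FALSE first, then TRUE

# Applied to labels: everything that is not "other" comes first,
# and ties keep their original relative order.
labels <- c("other", "stuff", "things", "other")
reordered <- labels[order(labels == "other")]
print(idx)        # 2 4 1 3
print(reordered)  # "stuff" "things" "other" "other"
```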