
R data.table sorting by group with "other" at bottom of each group

I can't quite get the syntax right for this. I have a data.table that I would like to sort first by a grouping column g1 (a factor with a specific level order), then in descending order by another column n. The only catch is that rows labeled "other" in a third column g2 should appear at the bottom of each group, regardless of their value of n.

Example:

library(data.table)

dt <- data.table(g1 = factor(rep(c('Australia', 'Mexico', 'Canada'), 3), levels = c('Australia', 'Canada', 'Mexico')),
                 g2 = rep(c('stuff', 'things', 'other'), each = 3),
                 n = c(1000, 2000, 3000, 5000, 100, 3500, 10000, 10000, 0))

This is the expected output: within each g1, rows are in descending order of n, except that rows where g2 == 'other' are always at the bottom:

         g1     g2     n
1: Australia things  5000
2: Australia  stuff  1000
3: Australia  other 10000
4:    Canada things  3500
5:    Canada  stuff  3000
6:    Canada  other     0
7:    Mexico  stuff  2000
8:    Mexico things   100
9:    Mexico  other 10000

Use order() inside dt[...], which accepts expressions as sort keys and a leading - for descending order:

dt[order(g1, g2 == "other", -n), ]
#           g1     g2     n
#       <fctr> <char> <num>
# 1: Australia things  5000
# 2: Australia  stuff  1000
# 3: Australia  other 10000
# 4:    Canada things  3500
# 5:    Canada  stuff  3000
# 6:    Canada  other     0
# 7:    Mexico  stuff  2000
# 8:    Mexico things   100
# 9:    Mexico  other 10000

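The g2 == "other" term is there because you said that "other" should always be last; it is what pins those rows to the bottom of each group. To see the difference in behavior, suppose "stuff" were "abc" instead, and compare an ordering without that term against one with it (note that the second call sorts the remaining rows by g2 descending rather than by n):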

dt[ g2 == "stuff", g2 := "abc" ]
dt[order(g1, -n), ]
#           g1     g2     n
#       <fctr> <char> <num>
# 1: Australia  other 10000
# 2: Australia things  5000
# 3: Australia    abc  1000
# 4:    Canada things  3500
# 5:    Canada    abc  3000
# 6:    Canada  other     0
# 7:    Mexico  other 10000
# 8:    Mexico    abc  2000
# 9:    Mexico things   100

dt[order(g1, g2 == "other", -g2), ]
#           g1     g2     n
#       <fctr> <char> <num>
# 1: Australia things  5000
# 2: Australia    abc  1000
# 3: Australia  other 10000
# 4:    Canada things  3500
# 5:    Canada    abc  3000
# 6:    Canada  other     0
# 7:    Mexico things   100
# 8:    Mexico    abc  2000
# 9:    Mexico  other 10000
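
For comparison with the question's requirement, here is a minimal sketch that keeps -n as the last sort key on the renamed data. It rebuilds the example table so it can run on its own; the name res is only illustrative.

```r
library(data.table)

# Same example data as in the question, with "stuff" renamed to "abc".
dt <- data.table(
  g1 = factor(rep(c("Australia", "Mexico", "Canada"), 3),
              levels = c("Australia", "Canada", "Mexico")),
  g2 = rep(c("abc", "things", "other"), each = 3),
  n  = c(1000, 2000, 3000, 5000, 100, 3500, 10000, 10000, 0)
)

# Sort keys, in priority order:
#   1. g1            (factor level order)
#   2. g2 == "other" (FALSE before TRUE, so "other" goes last)
#   3. -n            (descending n among the remaining rows)
res <- dt[order(g1, g2 == "other", -n)]
print(res)
```

With these keys the non-"other" rows follow n within each group, so the result no longer depends on how the g2 labels happen to sort alphabetically.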

One disadvantage of this approach is that setorder() does not accept an expression as a sort key, so it doesn't work directly:

setorder(dt, g1, g2 == "other", -n)
# Error in setorderv(x, cols, order, na.last) : 
#   some columns are not in the data.table: ==,other

So instead we need to reorder and assign the result back to dt.
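
A minimal sketch of two ways to do that, assuming the example data from the question. The names raw, sorted_copy, by_ref and the helper column is_other are hypothetical, chosen only for this illustration. The second variant is an alternative that was not part of the original answer: it materializes the logical key as a real column so that setorder() can sort by reference.

```r
library(data.table)

raw <- data.table(
  g1 = factor(rep(c("Australia", "Mexico", "Canada"), 3),
              levels = c("Australia", "Canada", "Mexico")),
  g2 = rep(c("stuff", "things", "other"), each = 3),
  n  = c(1000, 2000, 3000, 5000, 100, 3500, 10000, 10000, 0)
)

# Option 1: reorder into a new table and assign it
# (in practice: dt <- dt[order(g1, g2 == "other", -n)]).
sorted_copy <- raw[order(g1, g2 == "other", -n)]

# Option 2: sort by reference via a temporary helper column.
by_ref <- copy(raw)                  # copy only so `raw` stays untouched here
by_ref[, is_other := g2 == "other"]  # hypothetical helper column
setorder(by_ref, g1, is_other, -n)   # plain column names are accepted
by_ref[, is_other := NULL]           # drop the helper again

print(sorted_copy)
print(by_ref)
```

Option 1 is the shortest but creates a reordered copy of the table; option 2 avoids that copy at the cost of briefly adding and removing a column.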

BTW: this works because g2 == "other" evaluates to a logical vector, and when sorting, logical values are treated as 0 (FALSE) and 1 (TRUE). Rows where the condition is false therefore appear before rows where it is true.
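
A small base-R sketch of that behavior; the vector flags is made up for the illustration.

```r
flags <- c(TRUE, FALSE, TRUE, FALSE)

# Logical values coerce to integers: FALSE is 0, TRUE is 1.
as_int <- as.integer(flags)

# Sorting therefore places FALSE entries ahead of TRUE entries.
sorted_flags <- sort(flags)

# order() returns the positions of the FALSE entries first,
# then the TRUE entries, each in their original relative order.
idx <- order(flags)

print(as_int)
print(sorted_flags)
print(idx)
```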
