data.table如何在键列上排序NA值？

Question

Using data.table, say I'm setting the key using two columns, and one of the columns has missing values. 使用data.table，假设我使用两列设置密钥，其中一列缺少值。 Data table seems to sort the NA values to the first values. 数据表似乎将NA值排序为第一个值。

require(data.table)
set.seed(919)

# Create sample data
dt <- data.table(
  key1 = rep(1:10, each = 10),
  key2 = rep_len(letters, 100)
  )

# Set some key2 values to missing
dt[sample(1:100, 10), "key2"] <- NA

# Set key (sort)
setkeyv(dt, c("key1", "key2"))
dt
# 1:    1   NA
# 2:    1    a
# 3:    1    b
# 4:    1    c
# 5:    1    d
# 6:    1    f
# 7:    1    g
# 8:    1    h
# 9:    1    i
# 10:    1    j
# 11:    2   NA
# 12:    2   NA
# 13:    2    k
# 14:    2    m
# 15:    2    n
# 16:    2    o
# 17:    2    p
# 18:    2    q
# 19:    2    r
# 20:    2    s
# 21:    3    a
# 22:    3    b
# 23:    3    c
# 24:    3    d
# 25:    3    u
# 26:    3    v
# 27:    3    w
# 28:    3    x
# 29:    3    y
# 30:    3    z
# 31:    4    e
# 32:    4    f
# 33:    4    g
# 34:    4    h
# 35:    4    i
# 36:    4    j
# 37:    4    k
# 38:    4    l
# 39:    4    m
# 40:    4    n
# 41:    5   NA
# 42:    5   NA
# 43:    5    o
# 44:    5    q
# 45:    5    r
# 46:    5    s
# 47:    5    u
# 48:    5    v
# 49:    5    w
# 50:    5    x
# 51:    6   NA
# 52:    6    a
# 53:    6    b
# 54:    6    c
# 55:    6    d
# 56:    6    e
# 57:    6    g
# 58:    6    h
# 59:    6    y
# 60:    6    z
# 61:    7    i
# 62:    7    j
# 63:    7    k
# 64:    7    l
# 65:    7    m
# 66:    7    n
# 67:    7    o
# 68:    7    p
# 69:    7    q
# 70:    7    r
# 71:    8   NA
# 72:    8   NA
# 73:    8    a
# 74:    8    b
# 75:    8    t
# 76:    8    u
# 77:    8    w
# 78:    8    x
# 79:    8    y
# 80:    8    z
# 81:    9   NA
# 82:    9    c
# 83:    9    d
# 84:    9    e
# 85:    9    f
# 86:    9    h
# 87:    9    i
# 88:    9    j
# 89:    9    k
# 90:    9    l
# 91:   10   NA
# 92:   10    m
# 93:   10    n
# 94:   10    o
# 95:   10    p
# 96:   10    r
# 97:   10    s
# 98:   10    t
# 99:   10    u
# 100:   10    v
# key1 key2

Does this always happen, or will I run into problems if I always assume this is true? 这总是会发生，或者如果我一直认为这是真的，我会遇到问题吗？

Answer 1

For setkey() , data.table behaves like base R sort(x, na.last=FALSE) , as the sort order (always increasing) is essential for binary search based joins/subsets. 对于setkey() ， data.table的行为类似于基本R sort(x, na.last=FALSE) ，因为排序顺序（总是增加）对于基于二进制搜索的连接/子集是必不可少的。 Rationale for NA s appearing first is that: NA首先出现的理由是：

"NAs are internally large negative number[s]" github.com/Rdatatable/data.table/issues/434 “NAs是内部大负数[s]” github.com/Rdatatable/data.table/issues/434

Miscellaneous comments: If you are just looking to reorder your data, you should consider setorder() , which is capable of sorting in any order and positioning NA s in the beginning or end. 杂项评论：如果您只是想重新排序数据，您应该考虑setorder() ，它能够按任意顺序排序并在开头或结尾定位NA 。

By the way, the standard syntax there is dt[sample(1:100, 10), key2 := NA] and you should watch out for mistaking the two-character string "NA" for NA (not a problem in your example). 顺便说一句，标准语法有dt[sample(1:100, 10), key2 := NA]你应该注意误将两个字符"NA"的NA （不是在你的例子有问题）。

data.table如何在键列上排序NA值？

问题描述

1 个解决方案

解决方案1
4 已采纳 2015-07-18 00:19:09

data.table如何在键列上排序NA值？

问题描述

1 个解决方案

解决方案1 4 已采纳 2015-07-18 00:19:09

解决方案1
4 已采纳 2015-07-18 00:19:09