Select multiple ranges of columns using column names in data.table

Let's say I have a data table:

library(data.table)

# 5 x 10 table filled column-wise with 1..50, columns named a..j
dt <- data.table(matrix(1:50, nrow = 5))
setnames(dt, letters[1:10])

> dt
   a  b  c  d  e  f  g  h  i  j
1: 1  6 11 16 21 26 31 36 41 46
2: 2  7 12 17 22 27 32 37 42 47
3: 3  8 13 18 23 28 33 38 43 48
4: 4  9 14 19 24 29 34 39 44 49
5: 5 10 15 20 25 30 35 40 45 50

I want to select several discontinuous ranges of columns, such as a, c:d, f:h and j. This can be done easily with dplyr's select():

dt %>% select(a, c:d, f:h, j)

I am looking for a data.table way of achieving the same.

Right now, I can either select columns individually in any order, as in dt[ , .(a, c)], or give a single range of column names of the form startcol:endcol:

dt[ , c:f]

However, I can't combine these two methods to select several column ranges in one shot in .SDcols, as I did with dplyr::select.

We can put the range in .SDcols and then append the other column by concatenating it with .SD:

dt[, c(list(a = a), .SD), .SDcols = c:d]

If there are multiple ranges, we can find the start and end position of each range with match, expand each pair into a sequence of positions, and then look up the corresponding column names:

i1 <- match(c("c", "f"), names(dt))
j1 <- match(c("d", "h"), names(dt))
nm1 <- c("a", names(dt)[unlist(Map(`:`, i1, j1))], "j")
dt[, ..nm1]
#   a  c  d  f  g  h  j
#1: 1 11 16 26 31 36 46
#2: 2 12 17 27 32 37 47
#3: 3 13 18 28 33 38 48
#4: 4 14 19 29 34 39 49
#5: 5 15 20 30 35 40 50
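
The match-based lookup above can also be wrapped in a small helper so that the ranges are written the same way as in dplyr::select. The following is only a sketch: cols_by_range is a hypothetical name, not a data.table function, and it assumes the trick used by base R's subset(), where the expression is evaluated against a named list of column positions so that c:d becomes 3:4.

```r
library(data.table)

dt <- data.table(matrix(1:50, nrow = 5))
setnames(dt, letters[1:10])

# Hypothetical helper (not part of data.table): evaluate an expression such as
# c(a, c:d, f:h, j) against a named list of column positions and return the
# matching column names.
cols_by_range <- function(x, expr) {
  pos <- as.list(seq_along(x))
  names(pos) <- names(x)
  # Inside `pos`, a is 1, c is 3, d is 4, ... so `c:d` evaluates to 3:4.
  # The call to c() still resolves to base::c because function lookup
  # skips non-function bindings.
  idx <- eval(substitute(expr), pos, parent.frame())
  names(x)[idx]
}

keep <- cols_by_range(dt, c(a, c:d, f:h, j))
res <- dt[, ..keep]
```

Here keep is a plain character vector, so it could equally be passed to .SDcols, for example dt[, .SD, .SDcols = keep].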

Also, dplyr's select() can be used inside the data.table call (with dplyr loaded):

dt[, select(.SD, a, c:d, f:h, j)]
#   a  c  d  f  g  h  j
#1: 1 11 16 26 31 36 46
#2: 2 12 17 27 32 37 47
#3: 3 13 18 28 33 38 48
#4: 4 14 19 29 34 39 49
#5: 5 15 20 30 35 40 50

Here is a workaround with cbind and two or more selections:

cbind(dt[, .(a)], dt[, c:d])
#    a  c  d
# 1: 1 11 16
# 2: 2 12 17
# 3: 3 13 18
# 4: 4 14 19
# 5: 5 15 20
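
Extending the same cbind workaround to all four pieces from the question (a, c:d, f:h and j) might look like the sketch below. It relies only on the two selection forms the question already uses, and it makes one copy per piece, so it is probably best suited to small tables.

```r
library(data.table)

dt <- data.table(matrix(1:50, nrow = 5))
setnames(dt, letters[1:10])

# One selection per piece, bound together column-wise in the desired order
res <- cbind(dt[, .(a)], dt[, c:d], dt[, f:h], dt[, .(j)])
```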
