Delete columns from a data.table in R based on a condition

Question

How do I delete columns from a data.table in R based on their values?

Suppose I have a one-row data.table:

library(data.table)

dt = data.table("col1" = "a", "col2" = "b", "col3" = "c",
                "col4" = 'd', "col5" = "e", "col6" = 9, "col7" = 0, "col8" = 7,
                "col9" = 0, "col10" = 99)

The first 5 columns are categorical and columns 6-10 are numeric. Each numeric column repeats the same number in all rows.

I have two questions:

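How do I delete the columns that contain 0? Which columns these are depends on the input: sometimes col7 may be 0, sometimes col8, and so on.
After deleting the columns with 0 values, how do I concatenate the remaining numbers into one column? In this case, the new column would contain the number 9799.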

Is there a way to do this without deleting the 0-valued columns?

For the first part, I tried:

cols_chosen = c("col6", "col7","col8","col9","col10")

condition = c(FALSE, dt[, lapply(.SD, function(x) sum(x)< 1), .SDcols = cols_chosen])

dt[, which(condition) := NULL]

Although condition gets the correct values (a list of 5 logical values), the last command fails with the error:

Error in which(condition) : argument to 'which' is not logical

I adapted these statements from an earlier answer about deleting data frame columns in R based on a condition.
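The error comes from `c(FALSE, dt[, lapply(...)])`: combining a logical value with a data.table (which is a list) yields a list, and `which()` only accepts a logical vector. Even after `unlist()`, the leading `FALSE` shifts every position by one, and positions within `cols_chosen` are not column positions in `dt` (col6 is the sixth column of `dt`), so the wrong columns would be removed. A minimal sketch that works with column names instead, assuming the `dt` and `cols_chosen` from the question and that a column should be dropped only when every value in it is 0; `is_zero` and `zero_cols` are illustrative names:

```r
library(data.table)

dt <- data.table(col1 = "a", col2 = "b", col3 = "c", col4 = "d", col5 = "e",
                 col6 = 9, col7 = 0, col8 = 7, col9 = 0, col10 = 99)
cols_chosen <- c("col6", "col7", "col8", "col9", "col10")

# vapply() returns a named logical vector (not a list): one TRUE/FALSE per chosen column
is_zero <- vapply(dt[, ..cols_chosen], function(x) all(x == 0), logical(1))

# Turn the logical vector into column names, so no positional offset can creep in
zero_cols <- names(is_zero)[is_zero]

# Wrapping the LHS in () tells data.table that zero_cols is a variable holding names
dt[, (zero_cols) := NULL]
names(dt)
```

Using names rather than positions also keeps the code correct if the categorical columns are reordered or their number changes.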

Answer 1

library(data.table)

dt = data.table("col1" = "a", "col2" = "b", "col3" = "c",
                "col4" = 'd', "col5" = "e", "col6" = 9, "col7" = 0, "col8" = 7,
                "col9" = 0, "col10" = 99)

not0 = function(x) is.numeric(x) && !anyNA(x) && all(x!=0)
dt[, .(
    ## your categorical columns
    col1, col2, col3, col4, col5,
    ## new column pasted from non-0 numeric columns
    new = as.numeric(paste0(unlist(.SD), collapse=""))
  ),
  ## this filters columns to be provided in .SD column subset
  .SDcols = not0,
  ## we group by each row so it will handle input of multiple rows
  by = .(row=seq_len(nrow(dt)))
  ][, row:=NULL ## this removes extra grouping column
    ][] ## this prints
#   col1 col2 col3 col4 col5  new
#1:    a    b    c    d    e 9799
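Because `.SDcols` is given the predicate `not0` rather than a list of names, it can help to check which columns the predicate actually selects before building the new column. A small sketch, applying the same predicate to each column of the question's `dt` (`selected` is an illustrative name):

```r
library(data.table)

dt <- data.table(col1 = "a", col2 = "b", col3 = "c", col4 = "d", col5 = "e",
                 col6 = 9, col7 = 0, col8 = 7, col9 = 0, col10 = 99)

# Same predicate as in the answer: numeric, no NAs, and no zeros
not0 <- function(x) is.numeric(x) && !anyNA(x) && all(x != 0)

# Apply it column by column to see which columns .SDcols = not0 will put in .SD
selected <- names(dt)[vapply(dt, not0, logical(1))]
selected  # selects "col6", "col8" and "col10"
```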

Alternatively, if you want to update the existing table in place:

is0 = function(x) is.numeric(x) && !anyNA(x) && all(x==0)
## remove numeric columns whose values are all 0
dt[, which(sapply(dt, is0)) := NULL]

## add new column
dt[, new := as.numeric(
    paste0(unlist(.SD), collapse="")
  ), .SDcols=is.numeric, by=.(row=seq_len(nrow(dt)))
  ][]
#   col1 col2 col3 col4 col5 col6 col8 col10  new
#1:    a    b    c    d    e    9    7    99 9799
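Both versions group `by` row so that `paste0(..., collapse = "")` runs once per row. Since `paste0()` is itself vectorised over its arguments, another option is to hand the selected columns to it with `do.call()`, which concatenates every row in one call without a temporary grouping column. A sketch under the assumption that a numeric column is kept only when none of its values are 0 or NA (the same rule as `not0`); the second row is made-up data, only there to show that each row is concatenated separately:

```r
library(data.table)

# Two rows; the second row is illustrative data, not from the question
dt <- data.table(col1 = c("a", "f"), col2 = "b", col3 = "c", col4 = "d", col5 = "e",
                 col6 = c(9, 1), col7 = 0, col8 = c(7, 2), col9 = 0, col10 = c(99, 3))

not0 <- function(x) is.numeric(x) && !anyNA(x) && all(x != 0)
keep <- names(dt)[vapply(dt, not0, logical(1))]

# do.call(paste0, .SD) is paste0(col6, col8, col10): element-wise, one string per row
dt[, new := as.numeric(do.call(paste0, .SD)), .SDcols = keep]
dt[]
```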

Answer 2

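To delete the columns that contain 0 (assuming, as you say, that the numbers repeat across rows), it is enough to check the first row for elements equal to 0 and keep the columns whose value is not 0: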

dt <- data.frame("col1" = "a", "col2" = "b", "col3" = "c", 
"col4" = 'd', "col5" = "e", "col6" = 9, "col7" = 0, "col8" = 7,
"col9" = 0, "col10" = 99)
dt <- dt[,dt[1,] != 0]

This leaves dt as:

  col1 col2 col3 col4 col5 col6 col8 col10
1    a    b    c    d    e    9    7    99
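Checking only `dt[1,]` relies on every row holding the same numbers. If that assumption might not hold, a variant is to drop a numeric column only when all of its values are 0, never touching the character columns. A base R sketch (`all_zero` is an illustrative name):

```r
dt <- data.frame("col1" = "a", "col2" = "b", "col3" = "c",
                 "col4" = "d", "col5" = "e", "col6" = 9, "col7" = 0, "col8" = 7,
                 "col9" = 0, "col10" = 99)

# TRUE only for numeric columns in which every value is 0
all_zero <- vapply(dt, function(x) is.numeric(x) && all(x == 0), logical(1))

# drop = FALSE keeps a data.frame even if only one column survives
dt <- dt[, !all_zero, drop = FALSE]
names(dt)
```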

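To concatenate the remaining numeric columns into a new column (assuming they are all integers), you can use lapply on the first row to get a logical vector marking the numeric columns. You can then convert those values to strings and paste them together into the new column.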

numTag <- unlist(lapply(X = dt[1,], FUN = is.numeric))
dt$new_col <- rep(as.numeric(paste(as.character(dt[1,numTag]), collapse = '', sep = '')), nrow(dt))

Now dt looks like:

  col1 col2 col3 col4 col5 col6 col8 col10 new_col
1    a    b    c    d    e    9    7    99    9799
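`rep(..., nrow(dt))` copies the first row's result to every row, which again assumes repeated numbers. In addition, `as.character()` can switch to scientific notation for large doubles (`as.character(1e5)` gives `"1e+05"`), which would corrupt the pasted digits. A base R sketch that concatenates each row separately and formats the numbers without scientific notation; the second row and the value 100000 are made-up data, and `as_digits` is an illustrative name:

```r
# Illustrative data: the second row is not from the question
dt <- data.frame(col1 = c("a", "f"), col6 = c(9, 100000), col8 = c(7, 2), col10 = c(99, 3))

num_cols <- vapply(dt, is.numeric, logical(1))

# scientific = FALSE gives "100000" instead of "1e+05";
# trim = TRUE drops the padding format() would otherwise add to align widths
as_digits <- lapply(dt[num_cols], format, scientific = FALSE, trim = TRUE)

# paste0 is vectorised, so do.call() builds one string per row
dt$new_col <- as.numeric(do.call(paste0, as_digits))
dt
```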

To do this without deleting the zero-valued columns, the only twist needed is to filter the zeros out of the initial logical vector:

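## start again from the original data.frame, since the zero columns were dropped above
dt <- data.frame("col1" = "a", "col2" = "b", "col3" = "c",
"col4" = 'd', "col5" = "e", "col6" = 9, "col7" = 0, "col8" = 7,
"col9" = 0, "col10" = 99)
numTag <- unlist(lapply(X = dt[1,], FUN = is.numeric))
numTag <- numTag & (dt[1,] != 0)

dt$new_col <- rep(as.numeric(paste(as.character(dt[1,numTag]), collapse = '', sep = '')), nrow(dt))
dt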

  col1 col2 col3 col4 col5 col6 col7 col8 col9 col10 new_col
1    a    b    c    d    e    9    0    7    0    99    9799
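The zero filter above is built from the first row only. If rows may differ, one possibility is to skip zeros per row at paste time, leaving every column in place. A base R sketch using `apply()` over the numeric columns; the second row is illustrative data:

```r
# Illustrative data: the second row is not from the question
dt <- data.frame(col1 = c("a", "f"), col6 = c(9, 0), col7 = c(0, 5),
                 col8 = c(7, 2), col9 = 0, col10 = c(99, 3))

num_cols <- vapply(dt, is.numeric, logical(1))

# apply() converts the numeric columns to a matrix and visits it row by row;
# within each row, zeros are dropped before the remaining values are pasted
dt$new_col <- as.numeric(apply(dt[num_cols], 1, function(r) paste0(r[r != 0], collapse = "")))
dt
```

A row that contains only zeros would paste to `""`, which `as.numeric()` turns into `NA`.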

Answer 3

library(data.table)
library(dplyr)
library(tidyr)

dt = data.table("col1" = "a", "col2" = "b", "col3" = "c", 
                "col4" = 'd', "col5" = "e", "col6" = 9, "col7" = 0, "col8" = 7,
                "col9" = 0, "col10" = 99)


## which columns contain only zeros?
zero_vars <-  dt %>% 
  dplyr::select_if(~max(.x) == 0) %>% 
  colnames()


## which columns are numeric and not all zero?
numeric_vars <- dt %>% 
  dplyr::select(-all_of(zero_vars)) %>% 
  dplyr::select_if(is.numeric) %>% 
  colnames()
                  

## create new table
collapsed_dt <- 
  dt %>% 
  dplyr::select(all_of(numeric_vars)) %>%   ## select only non-zero numeric vars
  mutate_all(as.character) %>% 
  unite( col = "collapsed_var", sep = "") ## unite them to new var 'collapsed_var'


## re-join the collapsed var to the original table
dt %>% 
  dplyr::select_if(is.character) %>% ## only character variables
  cbind(collapsed_dt) ## bind the collapsed_dt
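Two details in this pipeline may be worth adjusting: `max(.x) == 0` would also flag a column such as `c(-1, 0)`, whose values are not all 0, and `unite()` produces a character column, whereas the question asks for the number 9799. A sketch of one alternative, assuming dplyr 1.0 or later (for `where()`) and tidyr; `kept` and `result` are illustrative names:

```r
library(dplyr)
library(tidyr)

dt <- data.frame(col1 = "a", col2 = "b", col3 = "c", col4 = "d", col5 = "e",
                 col6 = 9, col7 = 0, col8 = 7, col9 = 0, col10 = 99)

# all(x == 0) flags only columns whose every value is 0 (max() == 0 would also flag c(-1, 0))
kept <- dt %>%
  select(where(function(x) !(is.numeric(x) && all(x == 0))))

numeric_vars <- names(kept)[vapply(kept, is.numeric, logical(1))]

# unite() returns a character column, so convert it back to a number afterwards
result <- kept %>%
  unite(col = "collapsed_var", all_of(numeric_vars), sep = "", remove = FALSE) %>%
  mutate(collapsed_var = as.numeric(collapsed_var))
result
```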


3 answers

Answer 1: 2 votes, accepted, 2020-11-23 16:01:38

Answer 2: 1 vote, 2020-11-23 15:29:51

Answer 3: 1 vote, 2020-11-23 16:06:23
