Removing columns from a data.table in R based on conditions
Suppose I have a data.table with a single row:
dt = data.table("col1" = "a", "col2" = "b", "col3" = "c",
"col4" = 'd', "col5" = "e", "col6" = 9, "col7" = 0, "col8" = 7,
"col9" = 0, "col10" = 99)
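The first 5 columns are categorical and columns 6-10 are numeric. In the numeric columns, the same number is repeated in every row.
I have two questions:
How do I delete the columns that contain 0? Which column holds the 0 depends on the input, i.e. sometimes col7 may be 0, sometimes col8 may be 0, and so on.
After removing the columns with 0 values, how do I concatenate the remaining numbers into one column? In this case, the new column would contain the number 9799.
Is there a way to do this without removing the 0-valued columns?
For the first part, I tried: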
cols_chosen = c("col6", "col7","col8","col9","col10")
condition = c(FALSE, dt[, lapply(.SD, function(x) sum(x)< 1), .SDcols = cols_chosen])
dt[, which(condition) := NULL]
Although I get the correct values for the condition (a list of 5 logical values), the last command fails with the error:
Error in which(condition) : argument to 'which' is not logical
I adapted the statements above from an earlier answer about removing data frame columns based on a condition in R.
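A likely cause of the error is that `c(FALSE, <data.table>)` builds a list rather than a logical vector, and `which()` only accepts logical vectors. The following is a minimal sketch of one possible fix, assuming the goal is to drop every column in `cols_chosen` whose values are all 0. It tests `all(x == 0)` instead of the original `sum(x) < 1`, and the names `is_zero` and `drop_cols` are illustrative, not from the original post.

```r
library(data.table)

dt <- data.table(col1 = "a", col2 = "b", col3 = "c", col4 = "d", col5 = "e",
                 col6 = 9, col7 = 0, col8 = 7, col9 = 0, col10 = 99)
cols_chosen <- c("col6", "col7", "col8", "col9", "col10")

# vapply returns a named logical vector (one element per column),
# which is the input type which() expects
is_zero <- vapply(cols_chosen, function(nm) all(dt[[nm]] == 0), logical(1))

# Names of the columns to drop (hypothetical helper variable)
drop_cols <- names(which(is_zero))

# Delete by reference; the guard skips := when there is nothing to drop
if (length(drop_cols) > 0) dt[, (drop_cols) := NULL]

names(dt)
```

Selecting the columns by name rather than by position avoids the need to pad the condition with a leading `FALSE`.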
dt = data.table("col1" = "a", "col2" = "b", "col3" = "c",
"col4" = 'd', "col5" = "e", "col6" = 9, "col7" = 0, "col8" = 7,
"col9" = 0, "col10" = 99)
not0 = function(x) is.numeric(x) && !anyNA(x) && all(x!=0)
dt[, .(
## your categorical columns
col1, col2, col3, col4, col5,
## new column pasted from non-0 numeric columns
new = as.numeric(paste0(unlist(.SD), collapse=""))
),
## this filters columns to be provided in .SD column subset
.SDcols = not0,
## we group by each row so it will handle input of multiple rows
by = .(row=seq_len(nrow(dt)))
][, row:=NULL ## this removes extra grouping column
][] ## this prints
# col1 col2 col3 col4 col5 new
#1: a b c d e 9799
Or, if you want to update the existing table in place:
is0 = function(x) is.numeric(x) && !anyNA(x) && all(x==0)
## remove numeric columns whose values are all 0
dt[, which(sapply(dt, is0)) := NULL]
## add new column
dt[, new := as.numeric(
paste0(unlist(.SD), collapse="")
), .SDcols=is.numeric, by=.(row=seq_len(nrow(dt)))
][]
# col1 col2 col3 col4 col5 col6 col8 col10 new
#1: a b c d e 9 7 99 9799
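As a variation on the approach above, the sketch below builds the new column without deleting the 0-valued columns and without grouping by row. It assumes a hypothetical two-row table and uses `do.call(paste0, .SD)` to paste the selected columns together element-wise. The variable `num_cols` and the sample values are illustrative assumptions, not part of the original answer.

```r
library(data.table)

# Hypothetical two-row input; col7 is all 0 and stays in the table
dt <- data.table(col1 = c("a", "b"),
                 col6 = c(9, 1), col7 = c(0, 0),
                 col8 = c(7, 2), col10 = c(99, 3))

# Numeric columns with no NA and no 0 values (same test as not0 above)
num_cols <- names(dt)[vapply(
  dt, function(x) is.numeric(x) && !anyNA(x) && all(x != 0), logical(1)
)]

# paste0 is vectorised, so each row is concatenated in a single call
dt[, new := as.numeric(do.call(paste0, .SD)), .SDcols = num_cols]

dt$new
```

Because the column names are computed before `new` is added, the new column never ends up in its own input.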
dt <- data.frame("col1" = "a", "col2" = "b", "col3" = "c",
"col4" = 'd', "col5" = "e", "col6" = 9, "col7" = 0, "col8" = 7,
"col9" = 0, "col10" = 99)
dt <- dt[,dt[1,] != 0]
This leaves us with dt as:
col1 col2 col3 col4 col5 col6 col8 col10
1 a b c d e 9 7 99
numTag <- unlist(lapply(X = dt[1,], FUN = is.numeric))
dt$new_col <- rep(as.numeric(paste(as.character(dt[1,numTag]), collapse = '', sep = '')), nrow(dt))
Now dt looks like:
col1 col2 col3 col4 col5 col6 col8 col10 new_col
1 a b c d e 9 7 99 9799
numTag <- unlist(lapply(X = dt[1,], FUN = is.numeric))
numTag <- numTag & (dt[1,] != 0)
dt$new_col <- rep(as.numeric(paste(as.character(dt[1,numTag]), collapse = '', sep = '')), nrow(dt))
dt
col1 col2 col3 col4 col5 col6 col7 col8 col9 col10 new_col
1 a b c d e 9 0 7 0 99 9799
library(data.table)
library(dplyr)
library(tidyr)
dt = data.table("col1" = "a", "col2" = "b", "col3" = "c",
"col4" = 'd', "col5" = "e", "col6" = 9, "col7" = 0, "col8" = 7,
"col9" = 0, "col10" = 99)
## which columns contain only zeros?
zero_vars <- dt %>%
dplyr::select_if(~max(.x) == 0) %>%
colnames()
## which columns are non-zero numeric vars?
numeric_vars <- dt %>%
dplyr::select(-all_of(zero_vars)) %>%
dplyr::select_if(is.numeric) %>%
colnames()
## create new table
collapsed_dt <-
dt %>%
dplyr::select(all_of(numeric_vars)) %>% ## select only non-zero numeric vars
mutate_all(as.character) %>%
unite( col = "collapsed_var", sep = "") ## unite them to new var 'collapsed_var'
## re-join the collapsed var to the original table
dt %>%
dplyr::select_if(is.character) %>% ## only character variables
cbind(collapsed_dt) ## bind the collapsed_dt
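The same pipeline can probably be written more compactly with the `where()` selection helper, assuming dplyr 1.0.0 or later. The sketch below is one possible condensed version, not the original answerer's code. It uses a plain data.frame, and `result` is an illustrative name.

```r
library(dplyr)
library(tidyr)

df <- data.frame(col1 = "a", col2 = "b", col3 = "c", col4 = "d", col5 = "e",
                 col6 = 9, col7 = 0, col8 = 7, col9 = 0, col10 = 99,
                 stringsAsFactors = FALSE)

result <- df %>%
  ## keep character columns plus numeric columns that contain no zeros
  select(where(is.character), where(~ is.numeric(.x) && all(.x != 0))) %>%
  ## paste the remaining numeric columns into one character column
  unite("collapsed_var", where(is.numeric), sep = "")

result$collapsed_var
```

Note that `unite()` yields a character column. Wrap it in `as.numeric()` if a numeric value is needed.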