This may seem like a duplicate question but maybe I am missing something here.
I have been trying to remove just the columns where the sum of absolute values add to zero from a data.table.
I searched and found many solutions on this site that claim to work, and in fact, when I copy/paste exact code, it does work. However, for some reason, I can not duplicate it with my data.table.
The result of almost anything I do turns my data.table into a list. I even tried to convert my data.table to data.frame to try these solutions with no luck.
from here :
SelectVar[, colSums(SelectVar != 0) > 0]
Does not work.
SelectVar[, !apply(SelectVar == 0, 2, all)]
Does not work either.
remove_zero_cols <- function(df) {
rem_vec <- NULL
for(i in 1:ncol(df)){
this_sum <- summary(df[,i])
zero_test <- length(which(this_sum == 0))
if(zero_test == 6) {
rem_vec[i] <- names(df)[i]
}
}
features_to_remove <- rem_vec[!is.na(rem_vec)]
rem_ind <- which(names(df) %in% features_to_remove)
df <- df[,-rem_ind]
return(df)
}
This function also does not work.
I checked the class of each parameter and they are all either numeric or integer types. I also checked for any NA's and found none.
Any suggestions?
Add with = FALSE
to the first solution you referenced to if you are working on a data.table.
# Create example data frame
SelectVar <- read.table(text = " a b c d e f g h i j k l ll m n o p q r
1 Dxa8 Dxa8 0 Dxa8 Dxa8 0 Dxa8 Dxa8 0 0 0 0 0 0 0 0 0 Dxc8 0
2 Dxb8 Dxc8 0 Dxe8 Dxi8 0 tneg tpos 0 0 0 0 0 0 0 0 0 Dxi8 0",
header = TRUE, stringsAsFactors = FALSE)
# Convert to a data.table
library(data.table)
setDT(SelectVar)
SelectVar[, colSums(SelectVar != 0) > 0, with = FALSE]
# a b d e g h q
# 1: Dxa8 Dxa8 Dxa8 Dxa8 Dxa8 Dxa8 Dxc8
# 2: Dxb8 Dxc8 Dxe8 Dxi8 tneg tpos Dxi8
Here's a tidyverse solution. You could convert your data.table into a tibble and then go from there.
library(tidyverse)
df <- tibble(a = 1:5, b = -1:3, c = 0)
selection_criteria <- (colSums(abs(df)) != 0)
df[selection_criteria]
The OP has requested to remove just the columns where the sum of absolute values add to zero . Later on, he has clarified that he wants to drop data.table columns that ONLY contain 0's all the way down each row .
This can be achieved by using the any()
function
library(data.table)
#create sample data
n_rows <- 10L
n_cols <- 5L
DT <- data.table(id = 1:n)
dat_cols <- sprintf("dat%i", seq.int(n_cols))
for (j in seq.int(n_cols)) set(DT, NULL, dat_cols[j], 0L)
set.seed(1L)
DT[sample.int(n_rows, 0.1 * n_rows), (sample.int(n_cols, 0.5 * n_cols)) := 1L]
DT
id dat1 dat2 dat3 dat4 dat5 1: 1 0 0 0 0 0 2: 2 0 1 1 0 0 3: 3 0 0 0 0 0 4: 4 0 0 0 0 0 5: 5 0 0 0 0 0 6: 6 0 0 0 0 0 7: 7 0 0 0 0 0 8: 8 0 0 0 0 0 9: 9 0 0 0 0 0 10: 10 0 0 0 0 0
# find columns which are all zero using any()
dat_cols <- sprintf("dat%i", seq.int(n_cols))
zero_cols <- setDT(DT)[, lapply(.SD, function(x) !any(x)),
.SDcols = dat_cols]
# remove columns in place
DT[, (names(which(unlist(zero_cols)))) := NULL][]
id dat2 dat3 1: 1 0 0 2: 2 1 1 3: 3 0 0 4: 4 0 0 5: 5 0 0 6: 6 0 0 7: 7 0 0 8: 8 0 0 9: 9 0 0 10: 10 0 0
Before:
library(tidy verse)
DT = as_tibble(list(x=c(1,0), y=c(0,0)))
DT
A tibble: 2 x 2
x y
<dbl> <dbl>
1 1 0
2 0 0
Use:
DT1 = DT %>% select_if(any)
DT1
After:
tibble: 2 x 1
x
<dbl>
1 1
2 0
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.