Subset tibble based on column sums, while retaining character columns

Question

I have a feeling this is a pretty basic issue, but I haven't been able to find a solution.

I have a tibble where each row is a sample: the first column is a character variable containing the sample ID, and all subsequent columns are numeric variables.

For example:

library(tibble)
id <- c("a", "b", "c", "d", "e")
x1 <- rep(1, 5)
x2 <- seq(1, 5, 1)
x3 <- rep(2, 5)
x4 <- seq(0.1, 0.5, 0.1)
tb <- tibble(id, x1, x2, x3, x4)

I want to subset this to include only the columns with a sum greater than 5, and the id column. With the old dataframe structure, I know the following worked:

df <- as.data.frame(tb)
df2 <- cbind(df$id, df[,colSums(df[,2:5])>5)
colnames(df2)[1] <- "id"

However, when I try to subset this way with a tibble, I get the error message:

Error: Length of logical index vector must be 1 or 5, got: 4

Does anyone know how to accomplish this task without converting to the old data frame format? Preferably without creating an intermediate tibble with the id variable missing, because separating my ids from my data is just asking for trouble down the road.

Thanks!

Answer 1

# install.packages(c("tidyverse"), dependencies = TRUE)
library(tibble)
df <- tibble(id = letters[1:5], x1 = 1, x2 = 1:5, x3 = 2, x4 = seq(.1, .5, length.out = 5))
### two additional examples of how to generate the tibble data
### exploiting that its arguments are evaluated lazily and sequentially
# df <- tibble(id = letters[1:5], x1 = 1, x2 = 1:5, x3 = x1 + 1, x4 = x2/10)
# df <- tibble(x2 = 1:5, id = letters[x2], x3 = 2, x1 = x3-1, x4 = x2/10) %>%
#              select(id, num_range("x", 1:4))

Base R solution, cf. HubertL's comment above. The logical index must have one entry per column, so prepend TRUE for the id column:

### HubertL's base solution
df[c(TRUE, colSums(df[2:5]) > 5)]
#> # A tibble: 5 x 3
#>      id    x2    x3
#>   <chr> <int> <dbl>
#> 1     a     1     2
#> 2     b     2     2
#> 3     c     3     2
#> 4     d     4     2
#> 5     e     5     2
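
If the character columns are not always in the first position, the same idea can be written without hard-coding column positions. The following is only a sketch: the names `keep` and `res_base` are made up for illustration, and it assumes every non-numeric column should be retained.

```r
library(tibble)

df <- tibble(id = letters[1:5], x1 = 1, x2 = 1:5, x3 = 2,
             x4 = seq(0.1, 0.5, length.out = 5))

# One TRUE/FALSE per column: keep non-numeric columns unconditionally,
# keep numeric columns only if their sum exceeds 5.
keep <- vapply(df, function(col) !is.numeric(col) || sum(col) > 5,
               logical(1), USE.NAMES = FALSE)

# The index has the same length as the number of columns,
# so the tibble subsets without a length error.
res_base <- df[keep]
names(res_base)
#> [1] "id" "x2" "x3"
```

Because `keep` has one element per column, it avoids the length mismatch reported in the question.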

dplyr solution, cf. David Klotz's comment:

### Klotz's dplyr solution
library(dplyr)
df %>% select_if(function(x) is.character(x) || sum(x) > 5)
#> # A tibble: 5 x 3
#>      id    x2    x3
#>   <chr> <int> <dbl>
#> 1     a     1     2
#> 2     b     2     2
#> 3     c     3     2
#> 4     d     4     2
#> 5     e     5     2
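
In more recent dplyr releases, `select_if()` is superseded in favour of `select()` combined with `where()`. A hedged equivalent, assuming dplyr 1.0.0 or later is installed (the name `res_where` is just for illustration):

```r
library(tibble)
library(dplyr)

df <- tibble(id = letters[1:5], x1 = 1, x2 = 1:5, x3 = 2,
             x4 = seq(0.1, 0.5, length.out = 5))

# Keep id explicitly, plus every numeric column whose sum exceeds 5.
# `&&` short-circuits, so sum() is never called on a character column.
res_where <- df %>%
  select(id, where(~ is.numeric(.x) && sum(.x) > 5))

names(res_where)
#> [1] "id" "x2" "x3"
```

Naming `id` explicitly keeps the sample IDs attached to the data, which is what the question asks for.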

solution1
0 2017-10-18 00:00:33
