

Subset tibble based on column sums, while retaining character columns

I have a feeling this is a pretty stupid issue, but I haven't been able to find a solution.

I have a tibble where each row is a sample: the first column is a character variable containing the sample ID, and all subsequent columns are numeric variables.

For example:

library(tibble)

id <- c("a", "b", "c", "d", "e")
x1 <- rep(1, 5)
x2 <- seq(1, 5, 1)
x3 <- rep(2, 5)
x4 <- seq(0.1, 0.5, 0.1)
tb <- tibble(id, x1, x2, x3, x4)

I want to subset this to include only the id column and the columns whose sum is greater than 5. With the old data.frame structure, the following runs without an error:

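df <- as.data.frame(tb)
# colSums() over columns 2:5 gives a logical vector of length 4, but df has 5 columns;
# a data.frame silently recycles the short index instead of raising an error
df2 <- cbind(df$id, df[, colSums(df[, 2:5]) > 5])
colnames(df2)[1] <- "id"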

However, when I try to subset a tibble this way, I get the error message:

Error: Length of logical index vector must be 1 or 5, got: 4
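The error comes from a length mismatch: `colSums(df[,2:5]) > 5` returns one flag per numeric column (four), while the table has five columns once `id` is counted. A minimal base-R sketch of the mismatch, assuming a plain data.frame copy of the example data so it runs without tibble installed:

```r
# Plain data.frame version of the example data; stringsAsFactors is set
# explicitly so `id` stays character on R versions before 4.0
df <- data.frame(id = letters[1:5], x1 = 1, x2 = 1:5, x3 = 2,
                 x4 = seq(0.1, 0.5, 0.1), stringsAsFactors = FALSE)

keep <- colSums(df[2:5]) > 5  # one flag per numeric column: x1, x2, x3, x4
length(keep)                  # 4
ncol(df)                      # 5, because id is a column too

# A short logical index is recycled to FALSE TRUE TRUE FALSE FALSE,
# so this picks x1 and x2 instead of x2 and x3
names(df)[keep]

# Prepending TRUE for id lines the index up with all five columns
idx <- c(TRUE, keep)
names(df)[idx]                # "id" "x2" "x3"
```

A tibble rejects the length-4 index outright, which is the error above; a plain data.frame applies the same recycling as the vector case and quietly returns the wrong columns.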

Does anyone know how to accomplish this task without converting to the old data.frame format? Preferably without creating an intermediate tibble with the id variable missing, because separating my ids from my data is just asking for trouble down the road.

Thanks!

# install.packages(c("tidyverse"), dependencies = TRUE)
library(tibble)
df <- tibble(id = letters[1:5], x1 = 1, x2 = 1:5, x3 = 2, x4 = seq(.1, .5, length.out = 5))
### two additional examples of how to generate the tibble data,
### exploiting that its arguments are evaluated lazily and sequentially
# df <- tibble(id = letters[1:5], x1 = 1, x2 = 1:5, x3 = x1 + 1, x4 = x2/10)
# df <- tibble(x2 = 1:5, id = letters[x2], x3 = 2, x1 = x3-1, x4 = x2/10) %>%  # needs dplyr
#              select(id, num_range("x", 1:4))

Base R solution, cf. HubertL's comment above:

### HubertL's base solution
# prepend TRUE for the id column so the logical index has one entry per column (length 5)
df[c(TRUE, colSums(df[2:5]) > 5)]
#> # A tibble: 5 x 3
#>      id    x2    x3
#>   <chr> <int> <dbl>
#> 1     a     1     2
#> 2     b     2     2
#> 3     c     3     2
#> 4     d     4     2
#> 5     e     5     2
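HubertL's one-liner hard-codes the positions `2:5`. If the numeric columns might move, one option is to compute a flag for every column and treat non-numeric columns as always kept. A sketch with base R's `vapply()`, again assuming a plain data.frame so it runs without extra packages; since the index now has one entry per column, the same `df[keep]` call should also work on a tibble:

```r
df <- data.frame(id = letters[1:5], x1 = 1, x2 = 1:5, x3 = 2,
                 x4 = seq(0.1, 0.5, 0.1), stringsAsFactors = FALSE)

# One logical flag per column: non-numeric columns (here `id`) are always kept,
# numeric columns are kept only when their sum exceeds 5
keep <- vapply(df, function(col) !is.numeric(col) || sum(col) > 5, logical(1))

df[keep]  # columns id, x2, x3; id never leaves the table
```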

dplyr solution, cf. David Klotz's comment:

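### Klotz's dplyr solution
library(dplyr)
# `||` short-circuits, so sum() is never called on the character id column
df %>% select_if(function(x) is.character(x) || sum(x) > 5)
# select_if() is superseded as of dplyr 1.0.0; the equivalent there is
# df %>% select(where(function(x) is.character(x) || sum(x) > 5))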
#> # A tibble: 5 x 3
#>      id    x2    x3
#>   <chr> <int> <dbl>
#> 1     a     1     2
#> 2     b     2     2
#> 3     c     3     2
#> 4     d     4     2
#> 5     e     5     2
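Without dplyr, base R's `Filter()` can apply the same predicate column by column, because a data.frame (or tibble) is a list of columns; it keeps the columns for which the predicate returns TRUE. A sketch on a plain data.frame, assumed here so it runs with base R alone:

```r
df <- data.frame(id = letters[1:5], x1 = 1, x2 = 1:5, x3 = 2,
                 x4 = seq(0.1, 0.5, 0.1), stringsAsFactors = FALSE)

# Same predicate as the select_if() call above, applied to each column
res <- Filter(function(x) is.character(x) || sum(x) > 5, df)
names(res)  # "id" "x2" "x3"
```

`Filter()` subsets with `x[which(ind)]`, so the result keeps the class of the input and `id` stays attached to the data.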
