R: Keeping column name after removing all but one column
I have a list of dataframes and I want to remove all columns whose colSums are 4 or less, so that I only keep columns with colSums of 5 or greater. My test sample is below.
I use this code to remove the columns:
NEWTEST = NULL
for (a in 1:length(TEST)) {
  NEWTEST = colSums(TEST[[a]])
  index = which(NEWTEST > 4)
  TEST[[a]] = TEST[[a]][, index]
}
# Change everything back into dataframes
for (a in 1:length(TEST)) {
  TEST[[a]] = as.data.frame(TEST[[a]])
}
The problem now is that when only one column is left, as in df2 and df3, the column name disappears. That column name is very important to me and I need to keep it (here I just chose Vn, but in reality it is a descriptive column name that differs in each dataframe).
Any idea how I can simply keep that name?
TEST = structure(list(df1 = structure(list(V1 = c(15L, 18L, 18L, 12L,
14L, 16L, 10L, 14L, 29L, 16L, 20L, 20L, 13L, 3L, 14L), V2 = c(0L,
1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 2L, 1L, 0L, 0L), V3 = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), V4 = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), V5 = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), V6 = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("V1",
"V2", "V3", "V4", "V5", "V6"), row.names = c(NA, 15L), class = "data.frame"),
df2 = structure(list(V1 = c(4L, 3L, 1L, 2L, 3L, 3L, 3L, 3L,
4L, 6L, 3L, 4L, 2L, 7L, 3L), V2 = c(0L, 0L, 0L, 0L, 0L, 0L,
1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), V3 = c(0L, 1L, 0L, 0L,
0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L), V4 = c(0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), V5 = c(0L,
0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
V6 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L)), .Names = c("V1", "V2", "V3", "V4", "V5",
"V6"), row.names = c(1L, 6L, 7L, 9L, 20L, 23L, 24L, 27L,
28L, 29L, 32L, 33L, 34L, 37L, 38L), class = "data.frame"),
df3 = structure(list(V1 = c(7L, 10L, 5L, 3L, 4L, 6L, 6L,
6L, 6L, 7L, 10L, 7L, 3L, 4L, 4L), V2 = c(0L, 0L, 0L, 0L,
0L, 1L, 2L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), V3 = c(0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L), V4 = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
V5 = c(0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L), V6 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("V1", "V2", "V3",
"V4", "V5", "V6"), row.names = c(1L, 19L, 20L, 21L, 22L,
23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 38L, 39L), class = "data.frame")), .Names = c("df1",
"df2", "df3"))
We can use lapply to loop over the list, create the condition with colSums, and use that as the column index. Note that by default, if we use an index or column name without a comma, it is taken as a column index/column name in a data.frame:
lapply(TEST, function(x) x[colSums(x) >= 5])
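To see why the comma-free form keeps the name, here is a minimal sketch on a small made-up data frame. The object toy and its column names keep_me and drop_me are hypothetical and only serve as illustration; they are not part of the original data.

```r
# Hypothetical two-column data frame; only keep_me has a column sum >= 5
toy <- data.frame(keep_me = c(3L, 4L), drop_me = c(0L, 1L))

# Logical vector with one entry per column
keep <- colSums(toy) >= 5

# Indexing without a comma treats the data frame like a list of columns,
# so the result stays a data.frame even when one column remains
res <- toy[keep]

is.data.frame(res)
names(res)
```

Because the result is still a data.frame, no as.data.frame() conversion is needed afterwards.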
Or with tidyverse:
library(purrr)
library(dplyr)
map(TEST, ~ .x %>%
select(where(~ sum(.) >= 5)))
The behavior OP experienced comes from the default drop = TRUE when subsetting a data.frame with a comma: when a single column is selected, the result drops its dimensions and is returned as a vector (for a matrix, the same happens with a single row). In this case, we can either subset with the column index without a comma or, if we do use a comma, make sure to specify drop = FALSE:
lapply(TEST, function(x) x[, colSums(x) >= 5, drop = FALSE])
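The difference between the two comma forms can be checked on a small example. This is only a sketch; the data frame toy and its column names are made up for illustration and do not come from the question.

```r
# Hypothetical two-column data frame; only keep_me has a column sum >= 5
toy <- data.frame(keep_me = c(3L, 4L), drop_me = c(0L, 1L))
keep <- colSums(toy) >= 5

# With a comma and the default drop, a single remaining column
# comes back as a plain vector without its column name
as_vector <- toy[, keep]

# With drop = FALSE the result stays a data.frame and keeps the name
as_frame <- toy[, keep, drop = FALSE]

is.data.frame(as_vector)
names(as_vector)
is.data.frame(as_frame)
names(as_frame)
```

Converting as_vector back with as.data.frame() cannot restore the original name, because the vector no longer carries it; that is the loss seen for df2 and df3 in the question.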