Use apply functions within %>%
Below I create a function that deletes a given column if it contains only one unique value. Can I somehow use lapply within %>% to avoid calling the function three times? Or even call the function for all columns?
library(dplyr)  # provides tibble() and %>%
df <- tibble(col1 = sample(1:6), col2 = sample(1:6), col3 = 3, col4 = 4)
# Drop `mycolumn` from `mydataframe` if it holds a single unique value
condDelCol <- function(mycolumn, mydataframe) {
  if (length(unique(mydataframe[[mycolumn]])) == 1) {
    mydataframe[[mycolumn]] <- NULL
  }
  mydataframe
}
df %>%
  condDelCol("col2", .) %>%
  condDelCol("col3", .) %>%
  condDelCol("col4", .)
With dplyr, one option is select_if, which keeps only the columns for which the predicate is TRUE:
library(dplyr)
df %>%
select_if(~ n_distinct(.) > 1)
# A tibble: 6 x 2
# col1 col2
# <int> <int>
#1 1 6
#2 6 1
#3 5 5
#4 3 4
#5 4 2
#6 2 3
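In dplyr 1.0.0 and later, select_if is superseded by select() combined with where(). The following is a minimal sketch of the same idea in that style; the set.seed() call and the `result` name are assumptions added here for reproducibility and are not part of the original answer.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the example is reproducible
df <- tibble(col1 = sample(1:6), col2 = sample(1:6), col3 = 3, col4 = 4)

# Keep only the columns that hold more than one distinct value
result <- df %>%
  select(where(~ n_distinct(.x) > 1))

names(result)
# "col1" "col2"
```

Because the predicate is evaluated on every column, there is no need to name the columns one by one.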
Another way is base R: loop over the columns with sapply to count the unique values in each, compare the counts to 1 to get a logical vector, extract the names of the columns that have only a single unique value, and assign ( <- ) NULL to them.
i1 <- sapply(df, function(x) length(unique(x)))
df[names(which(i1 == 1))] <- NULL
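Or with Filter. This relies on the columns being numeric: var returns 0 for a constant column, which Filter treats as FALSE, so that column is dropped.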
Filter(var, df)
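If some columns are not numeric, the variance trick no longer applies. A possible variant, sketched here with a hypothetical data frame `df_mixed` that is not part of the original answer, passes an explicit predicate to Filter instead:

```r
# Hypothetical example data mixing numeric and character columns
df_mixed <- data.frame(
  a = 1:3,
  b = c("x", "x", "x"),
  c = c(2, 2, 2),
  d = c("p", "q", "r")
)

# Keep the columns that contain more than one unique value
kept <- Filter(function(x) length(unique(x)) > 1, df_mixed)

names(kept)
# "a" "d"
```

The predicate returns a proper logical for every column, so it does not depend on how a number is coerced to TRUE or FALSE.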
You could use this one as well. It drops the columns whose standard deviation is 0.
df[, sapply(df, sd) != 0]
# A tibble: 6 x 2
col1 col2
<int> <int>
1 1 3
2 5 6
3 6 1
4 2 2
5 3 4
6 4 5
Or, if you want to use the pipe operator:
df %>%
select(which(sapply(df, sd) != 0))
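To stay closer to the original question, which asks about reusing condDelCol for several columns inside a pipe, one possible approach is to fold the function over a vector of column names with Reduce. This is only a sketch: the `cols` and `result` names are assumptions, and condDelCol is repeated from the question so the block runs on its own.

```r
library(dplyr)

df <- tibble(col1 = sample(1:6), col2 = sample(1:6), col3 = 3, col4 = 4)

# Function from the question: drop `mycolumn` if it has one unique value
condDelCol <- function(mycolumn, mydataframe) {
  if (length(unique(mydataframe[[mycolumn]])) == 1) {
    mydataframe[[mycolumn]] <- NULL
  }
  mydataframe
}

cols <- c("col2", "col3", "col4")  # columns to check (assumed names)

# Reduce threads the data frame through condDelCol once per column name;
# the piped data frame (.) is used as the initial value
result <- df %>%
  Reduce(function(d, col) condDelCol(col, d), cols, .)

names(result)
# "col1" "col2"
```

Replacing `cols` with `names(df)` would run the check on every column.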