
Use apply functions within %>%

Below I create a function that deletes a specific column if it contains only one unique value. Can I somehow use lapply within %>% to avoid calling the function three times? Or even call the function for all columns?

library(dplyr)  # provides tibble() and %>%

df <- tibble(col1 = sample(1:6), col2 = sample(1:6), col3 = 3, col4 = 4)

# Drop `mycolumn` from `mydataframe` if it holds only one unique value
condDelCol <- function(mycolumn, mydataframe) {
  if (length(unique(mydataframe[[mycolumn]])) == 1) {
    mydataframe[[mycolumn]] <- NULL
  }
  mydataframe
}

df %>%
  condDelCol("col2", .) %>%
  condDelCol("col3", .) %>%
  condDelCol("col4", .)
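To address the lapply part of the question while reusing condDelCol unchanged, one option is to fold it over all column names with base R's Reduce(), so that each call receives the data frame returned by the previous one. This is a minimal sketch, assuming the df and condDelCol defined above (repeated here so the block runs on its own); the name result is only for illustration:

```r
library(dplyr)

df <- tibble(col1 = sample(1:6), col2 = sample(1:6), col3 = 3, col4 = 4)

condDelCol <- function(mycolumn, mydataframe) {
  if (length(unique(mydataframe[[mycolumn]])) == 1) {
    mydataframe[[mycolumn]] <- NULL
  }
  mydataframe
}

# Reduce() threads the data frame through condDelCol once per column name,
# i.e. condDelCol("col4", condDelCol("col3", condDelCol("col2", ...))).
# The braces stop %>% from inserting `.` as the first argument of Reduce().
result <- df %>%
  { Reduce(function(d, col) condDelCol(col, d), names(.), init = .) }

names(result)  # → "col1" "col2"
```

A plain lapply() would not chain the calls this way: each call would see the original data frame rather than the output of the previous call, which is why a fold such as Reduce() fits here.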

With dplyr, one option is select_if:

library(dplyr) 
df %>%
   select_if(~ n_distinct(.) > 1)
# A tibble: 6 x 2
#   col1  col2
#  <int> <int>
#1     1     6
#2     6     1
#3     5     5
#4     3     4
#5     4     2
#6     2     3
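Since dplyr 1.0.0, select_if() and the other scoped variants (_if, _at, _all) are superseded; the same filter can be written with select() and the where() selection helper. A sketch assuming dplyr >= 1.0.0; result is an illustrative name:

```r
library(dplyr)  # where() needs dplyr >= 1.0.0

df <- tibble(col1 = sample(1:6), col2 = sample(1:6), col3 = 3, col4 = 4)

# where() applies the predicate to every column; select() keeps those returning TRUE
result <- df %>%
  select(where(~ n_distinct(.x) > 1))

names(result)  # → "col1" "col2"
```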

Another way, in base R, is to loop over the columns with sapply, count the unique values in each, extract the names of the columns that have only a single unique value, and assign (<-) NULL to them:

# Count the unique values in each column
i1 <- sapply(df, function(x) length(unique(x)))
# Remove the columns that hold a single unique value
df[names(which(i1 == 1))] <- NULL
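The same base R idea can be wrapped in a small function so that it fits into a %>% chain instead of modifying df in place. drop_constant_cols is a hypothetical name, not an existing function; vapply() is used here instead of sapply() because it always returns a logical vector with one entry per column (or fails loudly):

```r
library(dplyr)

df <- tibble(col1 = sample(1:6), col2 = sample(1:6), col3 = 3, col4 = 4)

# Hypothetical helper: keep only the columns with more than one unique value
drop_constant_cols <- function(data) {
  keep <- vapply(data, function(x) length(unique(x)) > 1, logical(1))
  data[keep]  # list-style subsetting keeps the data frame / tibble class
}

result <- df %>% drop_constant_cols()
names(result)  # → "col1" "col2"
```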

Or with Filter:

Filter(var, df)  # keeps columns with non-zero variance (numeric columns only)
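Filter(var, df) works because var() returns 0 for a constant numeric column and Filter() treats 0 as FALSE, but var() is meant for numeric data, so it is not suited to character or factor columns. A sketch of a type-agnostic predicate, using a hypothetical data frame df2 that mixes numeric and character columns:

```r
# Hypothetical illustration data with a character column
df2 <- data.frame(
  id  = 1:4,
  grp = c("a", "a", "a", "a"),   # constant character column
  val = c(2.5, 2.5, 2.5, 2.5),   # constant numeric column
  lab = c("x", "y", "x", "z"),
  stringsAsFactors = FALSE
)

# Filter() keeps the columns for which the predicate returns TRUE
result <- Filter(function(x) length(unique(x)) > 1, df2)
names(result)  # → "id" "lab"
```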

You could use this one as well. It drops the columns whose standard deviation is 0:

df[, sapply(df, sd) != 0]

# A tibble: 6 x 2
#   col1  col2
#  <int> <int>
#1     1     3
#2     5     6
#3     6     1
#4     2     2
#5     3     4
#6     4     5

Or, if you want to use the pipe operator:

df %>%
  # `.` refers to the data coming through the pipe, not the global df
  select(which(sapply(., sd) != 0))
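Both the sd() and var() approaches can be tripped up by missing values: sd() returns NA for a column containing NA unless na.rm = TRUE is set. A sketch that ignores missing values and drops any column whose standard deviation is zero or cannot be computed (for example, only one non-missing value); df3 is hypothetical illustration data:

```r
# Hypothetical illustration data with missing values
df3 <- data.frame(
  a = c(1, 2, NA, 4),   # varies
  b = c(3, NA, 3, 3),   # constant apart from NA
  c = c(NA, NA, NA, 5)  # only one non-missing value
)

# sd(..., na.rm = TRUE) ignores NA; isTRUE() turns an NA result into FALSE
keep <- vapply(df3, function(x) isTRUE(sd(x, na.rm = TRUE) > 0), logical(1))
result <- df3[keep]
names(result)  # → "a"
```

This treats a column such as b, which holds one value plus NAs, as constant, whereas length(unique(x)) would count NA as a second distinct value and keep it. Which behaviour is right depends on the data.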

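Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.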

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM