
How to apply a custom function to each column of my dataframe

I'm trying to get the hang of tidyverse and dplyr approaches, and I want to apply the following function to each column of a data frame / data.table:

library(multimode)
# Silverman's critical-bandwidth mode test; returns the full test result, not a single number
funx <- function(x) {
    multimode::modetest(x, method = 'SI')
}

I then tried to use something like summarise_all to start with, but I immediately get an error:

library(dplyr)

mtcars %>%
    summarise_all(funx)

Error: Column mpg must be length 1 (a summary value), not 8

What I hope to end up with is a new data frame that shows the names of the tested columns in column 1 and the p-value of the mode test in column 2.
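For comparison, the desired two-column result can also be built without dplyr. This is a minimal sketch, not the approach used in this thread: `column_pvalues` is a hypothetical helper, and it uses `stats::shapiro.test` as a stand-in test so it runs without multimode installed. Passing `function(x) multimode::modetest(x, method = 'SI')` as `test_fun` should give the mode-test p-values instead.

```r
# Hypothetical helper: run a test on every column and keep only the p-values.
# test_fun must return an object with a $p.value element (e.g. an htest).
column_pvalues <- function(df, test_fun = stats::shapiro.test) {
  # vapply enforces exactly one numeric value per column
  p <- vapply(df, function(col) test_fun(col)$p.value, numeric(1))
  data.frame(colName = names(p), pvalue = unname(p),
             stringsAsFactors = FALSE)
}

res <- column_pvalues(mtcars[, 1:2])
res
```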

Since yesterday (23-05-2019), after updating packages, my solution no longer works: the following code now prints '.' dots instead of the column names. I submitted a post on the GitHub page to ask about the cause of this change: github

library(multimode)
funx <- function(x) {
    print(substitute(x))  # meant to print the column name; now prints `.`
    multires <- multimode::modetest(x, method = 'SI')
    multires$p.value      # return only the p-value (a single number)
}

mtcars %>% 
    select(1:2) %>%
    summarise_all(list(~ funx(.)))
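Because `funx` is called as `funx(.)`, `substitute(x)` only sees the expression that was passed in (here the `.` pronoun), so it is not a reliable way to recover the column name. A minimal sketch of an alternative, assuming dplyr >= 1.0 (where `across()` and `cur_column()` are available and supersede the `_all` verbs): pass the name in explicitly. `funx_named` is a hypothetical variant of `funx`, and `stats::shapiro.test` stands in for `multimode::modetest` so the example runs on its own.

```r
library(dplyr)  # assumes dplyr >= 1.0 for across() and cur_column()

# Hypothetical variant of funx that receives the column name explicitly
# instead of trying to recover it with substitute().
funx_named <- function(x, name) {
  message("testing column: ", name)
  # stand-in test; swap in multimode::modetest(x, method = "SI")
  stats::shapiro.test(x)$p.value
}

out <- mtcars %>%
  select(1:2) %>%
  summarise(across(everything(), ~ funx_named(.x, cur_column())))
out
```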

UPDATE: Ironically, after getting feedback on the GitHub post, with the new version we can now do this:

mtcars %>%
    select(1:2) %>%
    summarise_all(funx)

As you can see, this is exactly the syntax I started with when posting this question, so I would say the dplyr team has done good work in making the syntax more 'natural'.

summarise can only return a single value per column. According to ?summarise:

Create one or more scalar variables summarizing the variables of an existing tbl. Tbls with groups created by group_by() will result in one row in the output for each group. Tbls with no groups will result in one row.

So, if the output has a length greater than 1, wrap it in a list (and unnest it later if needed):

library(dplyr)
out <- mtcars %>%
    summarise_all(list(~ list(funx(.))))
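The list-column approach keeps the whole test result for every column, and the needed component can be pulled out afterwards. A minimal sketch of that follow-up step, with `stats::shapiro.test` standing in for `funx` so it runs without multimode (the extraction pattern assumes the test result has a `$p.value` element, as both stand-in and modetest results do in this thread):

```r
library(dplyr)

# Keep the full test object for each column in a list-column
# (stats::shapiro.test stands in for funx / multimode::modetest here).
out <- mtcars %>%
  select(1:2) %>%
  summarise_all(list(~ list(stats::shapiro.test(.))))

# Each cell is a length-1 list holding an htest object;
# extract whichever component is needed afterwards.
pvals <- vapply(out, function(cell) cell[[1]]$p.value, numeric(1))
pvals
```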

If we only need a single value, e.g. the p.value, there is no need to wrap it in a list:

out1 <- mtcars %>% 
          select(1:2) %>%
          summarise_all(list(~ funx(.)$p.value))
out1
#    mpg   cyl
#1 0.718 0.244

It can be converted to a two-column dataset with gather:

library(tidyr)
gather(out1, colName, pvalue) %>%
      arrange(pvalue)
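`gather()` has been superseded by `pivot_longer()` since tidyr 1.0.0. A minimal sketch of the same reshaping with the newer function, assuming tidyr >= 1.0.0; the one-row `out1` is rebuilt by hand from the values printed above so the example runs without multimode.

```r
library(dplyr)
library(tidyr)  # pivot_longer() needs tidyr >= 1.0.0

# One-row result in the shape of out1 above (values copied from that output)
out1 <- data.frame(mpg = 0.718, cyl = 0.244)

long <- out1 %>%
  pivot_longer(everything(), names_to = "colName", values_to = "pvalue") %>%
  arrange(pvalue)
long
```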

Checking the output of modetest on a single column:

funx(mtcars[[1]])

#   Silverman (1981) critical bandwidth test

#data:  x
#Critical bandwidth = 2.5413, p-value = 0.716
#alternative hypothesis: true number of modes is greater than 1

The output is not a single value but a full test summary, so it is better to store it in a list; alternatively, we can extract a specific component (the p-value) and return that from summarise.
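To see which components are available for extraction, the returned object can be inspected directly. A minimal sketch using `stats::shapiro.test` as a stand-in: its result is an `htest` object, and the modetest printout above appears to follow the same htest layout, though that is an assumption here rather than something confirmed in this thread.

```r
# Inspect what a single test returns (stats::shapiro.test used as a stand-in)
res <- stats::shapiro.test(mtcars$mpg)
class(res)    # the object's class
names(res)    # components that can be extracted with $
res$p.value   # a single number, safe to return from summarise()
```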
