I'm trying to get the hang of the tidyverse and dplyr approaches, and want to apply the following function to each column of a data frame / data.table:
library(multimode)
funx <- function(x) {multimode::modetest(x, method = 'SI') }
I then tried something like summarise_all to start with:
library(dplyr)
mtcars %>%
  summarise_all(funx)
but I immediately get an error:
Error: Column `mpg` must be length 1 (a summary value), not 8
What I hope to end up with is a new data frame with the tested column names in column 1 and the p-value of the mode test in column 2.
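A minimal base-R sketch of that target shape, assuming every column is numeric and that the test function returns an htest-style list with a `p.value` element (as `modetest` does, judging by how its printed output and `$p.value` are used in this thread). `column_pvalues` is a hypothetical helper name, not part of dplyr or multimode.

```r
# Hypothetical helper: run a test on every column of a data frame and
# collect the p-values as a two-column data frame (colName, pvalue).
column_pvalues <- function(df, test_fun) {
  # vapply guarantees exactly one numeric p-value per column
  pvals <- vapply(df, function(col) test_fun(col)$p.value, numeric(1))
  data.frame(colName = names(pvals), pvalue = unname(pvals),
             stringsAsFactors = FALSE)
}

# Usage with the mode test from the question, only if multimode is installed
if (requireNamespace("multimode", quietly = TRUE)) {
  print(column_pvalues(mtcars[1:2],
                       function(x) multimode::modetest(x, method = "SI")))
}
```

Any other function returning an htest object (for example `stats::shapiro.test`) can be passed as `test_fun` in the same way.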
Since yesterday (23-05-2019), after updating packages, my solution no longer works: the following code now prints '.' (dots) instead of the column names. I submitted a post to the GitHub page to ask about the cause of this change: github
library(multimode)
library(dplyr)
funx <- function(x) {
  print(substitute(x))  # now prints `.` instead of the column name
  multires <- multimode::modetest(x, method = 'SI')
  multires$p.value      # return only the p-value
}
mtcars %>%
  select(1:2) %>%
  summarise_all(list(~ funx(.)))
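The dots most likely appear because `substitute(x)` only returns the expression the caller wrote, and inside the lambda `~ funx(.)` that expression is literally `.`. A sketch that avoids relying on `substitute()` by passing each column's name explicitly; `test_named_column` and `named_pvalues` are hypothetical helpers, and base R's `Map()` is used instead of dplyr.

```r
# Hypothetical helpers (not from dplyr or multimode): pass each column's
# name explicitly instead of recovering it with substitute(), which only
# sees the expression the caller used (e.g. `.` inside a formula lambda).
test_named_column <- function(x, name, test_fun) {
  message("Testing column: ", name)
  test_fun(x)$p.value
}

named_pvalues <- function(df, test_fun) {
  # Map() walks the columns and their names in parallel
  res <- Map(function(col, name) test_named_column(col, name, test_fun),
             df, names(df))
  unlist(res)  # named numeric vector: one p-value per column
}

# Usage with the question's mode test, only if multimode is installed
if (requireNamespace("multimode", quietly = TRUE)) {
  print(named_pvalues(mtcars[1:2],
                      function(x) multimode::modetest(x, method = "SI")))
}
```

In dplyr 1.0.0 and later, `cur_column()` returns the name of the column currently being processed inside `across()`, which is another way to avoid `substitute()`.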
UPDATE: Ironically, after getting feedback on the GitHub post, with the new version we can now do this:
mtcars %>%
select(1:2) %>%
summarise_all(funx)
As you can see, this is exactly the same syntax I started with when posting this question. So, good work by the dplyr team in making the syntax more 'natural', I would say.
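`summarise_all()` and the other scoped verbs were later superseded by `across()` in dplyr 1.0.0. A sketch of the same one-row, per-column p-value summary with `across()`, assuming dplyr 1.0.0 or later; `stats::shapiro.test` stands in for `modetest` only so the sketch runs without multimode.

```r
library(dplyr)

# Stand-in for funx(): any function returning a single p-value per column.
# Swap in function(x) multimode::modetest(x, method = "SI")$p.value to
# reproduce the question.
pval_fun <- function(x) stats::shapiro.test(x)$p.value

# across() applies pval_fun to every selected column; the result is one row
# with one p-value per column, the same shape summarise_all(funx) returns.
pvals_wide <- mtcars %>%
  select(1:2) %>%
  summarise(across(everything(), pval_fun))
pvals_wide
```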
summarise can only return a single value (length 1) for each summary. According to ?summarise:
Create one or more scalar variables summarizing the variables of an existing tbl. Tbls with groups created by group_by() will result in one row in the output for each group. Tbls with no groups will result in one row.
So if the output has length greater than 1, wrap it in a list and unnest it:
library(dplyr)
out <- mtcars %>%
summarise_all(list(~ list(funx(.))))
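A sketch of the list-column route with a follow-up extraction step, assuming dplyr 1.0.0 or later (for `across()`). `stats::shapiro.test` is a stand-in for `modetest` because it also returns a multi-component htest object; the p-values are then pulled out with base R's `vapply()` rather than `unnest()`.

```r
library(dplyr)

# Stand-in test returning a full htest object (several components, not a scalar)
test_fun <- function(x) stats::shapiro.test(x)

# Wrapping each result in list() makes every summary length 1,
# so summarise() accepts it and stores the object in a list-column.
out <- mtcars %>%
  select(1:2) %>%
  summarise(across(everything(), ~ list(test_fun(.x))))

# Each column of `out` holds a one-element list; pull the p-value out of each
p_values <- vapply(out, function(col) col[[1]]$p.value, numeric(1))
p_values
```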
If we are extracting a single value, e.g. p.value, then there is no need to wrap it in a list:
out1 <- mtcars %>%
select(1:2) %>%
summarise_all(list(~ funx(.)$p.value))
out1
# mpg cyl
#1 0.718 0.244
It can be converted to a two-column dataset with gather:
library(tidyr)
gather(out1, colName, pvalue) %>%
arrange(pvalue)
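`gather()` has since been superseded by `pivot_longer()` in tidyr 1.0.0. Without tidyr, base R's `utils::stack()` gives the same long shape. A sketch, assuming a one-row data frame of numeric p-values like `out1`; the values below are copied from the printed output above purely for illustration.

```r
# One-row, wide table of p-values shaped like `out1` above
# (values copied from the printed output, for illustration only)
out1 <- data.frame(mpg = 0.718, cyl = 0.244)

# stack() turns the columns into rows: `values` holds the p-values and
# `ind` (a factor) holds the original column names
long <- stack(out1)
long <- data.frame(colName = as.character(long$ind), pvalue = long$values,
                   stringsAsFactors = FALSE)

# Same ordering as arrange(pvalue): smallest p-value first
long <- long[order(long$pvalue), ]
rownames(long) <- NULL
long
```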
Checking the output of modetest on a single column:
funx(mtcars[[1]])
# Silverman (1981) critical bandwidth test
#data: x
#Critical bandwidth = 2.5413, p-value = 0.716
#alternative hypothesis: true number of modes is greater than 1
shows that it is not a single value but a summary model output, so it is better to store it in a list. However, we can extract specific components (e.g. the p-value) and output them in summarise.
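A sketch of "store the full test objects in a list, then extract components", in base R. `tests_to_table` is a hypothetical helper; it assumes the test function returns an htest-style list with `statistic` and `p.value` elements (for `modetest`, the printed "Critical bandwidth" suggests the statistic is stored there, but that is an assumption).

```r
# Hypothetical helper: keep the full test object for every column in a list,
# then extract the components of interest into a data frame.
# Assumes test_fun returns an htest-style list with `statistic` and `p.value`.
tests_to_table <- function(df, test_fun) {
  tests <- lapply(df, test_fun)  # named list: one test object per column
  data.frame(
    colName   = names(tests),
    statistic = unname(vapply(tests, function(t) unname(t$statistic), numeric(1))),
    pvalue    = unname(vapply(tests, function(t) t$p.value, numeric(1))),
    stringsAsFactors = FALSE
  )
}

# Usage with the question's mode test, only if multimode is installed
if (requireNamespace("multimode", quietly = TRUE)) {
  print(tests_to_table(mtcars[1:2],
                       function(x) multimode::modetest(x, method = "SI")))
}
```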