I like the summary statistics from psych::describe, but I want to replace the mean with the mode, and only for factor variables. How do I program Mode's output to replace the mean for Species (setosa), or for any other factor variable? I use iris for replication even though it has only one factor.
library(dplyr)  # provides %>%, select() and where()

# Return the most frequent non-missing value
getMode <- function(df) {
  ux <- na.omit(unique(df))
  ux[which.max(tabulate(match(df, ux)))]
}
Mode <- apply(iris %>% select(where(is.factor)), 2, getMode)
# I only want 5 of psych's descriptive stats plus the mode.
table <- cbind(psych::describe(iris), Mode)[, c(3, 4, 8, 9, 2, 14)]
table
How can I get the mean and mode to combine depending on the structure of the variable? Can I use if_else with where to tell R what to do when FALSE? If I could get the mean to output when the variable is not a factor, I would get a column that combines means and modes. psych produces a data frame where the identifying variable names are not selectable, which makes any manual coding or listing the variables in mutate() impossible. Factors are also the majority of variables in my dataset (so manual coding or a mutate(case_when()) would be REALLY tedious even if it could be done).
PS. I've tried changing my apply() to map functions, but the output is not compatible with cbind() because it lists the other levels for each factor. If you have a better idea about that part of the code, or think that's where I could combine getMode and mean(), I don't mind suggestions.
If you're willing to use a different function to produce the same kind of output, you could use dplyr and tidyr to accomplish this. With this approach you can do just what you want, using ifelse() to distinguish numeric from non-numeric variables. The only thing to note is that if the function produces non-numeric values for factors, the output for the numeric variables also has to be character. That's why I wrapped the mean() function in sprintf().
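# Return the most frequent non-missing value
getMode <- function(df) {
  ux <- na.omit(unique(df))
  ux[which.max(tabulate(match(df, ux)))]
}
library(dplyr)  # summarise_all() and %>%
library(tidyr)  # pivot_longer()
iris %>%
  summarise_all(.funs = list(
    # mean for numeric columns, mode for everything else (both as character)
    mean = function(x) ifelse(is.numeric(x), sprintf("%.3f", mean(x)), as.character(getMode(x))),
    # for factors, sd is taken over the integer codes of the levels
    sd = function(x) ifelse(is.numeric(x), sd(x), sd(as.numeric(x))),
    # for factors, min and max are the first and last level
    min = function(x) ifelse(is.numeric(x), sprintf("%.3f", min(x)), levels(x)[1]),
    max = function(x) ifelse(is.numeric(x), sprintf("%.3f", max(x)), levels(x)[length(levels(x))]),
    n = function(x) sum(!is.na(x))
  )) %>%
  # reshape from one wide row to one row per variable
  pivot_longer(everything(),
               names_to = c("set", ".value"),
               names_pattern = "(.+)_(.+)")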
# A tibble: 5 x 6
# set mean sd min max n
# <chr> <chr> <dbl> <chr> <chr> <int>
# 1 Sepal.Length 5.843 0.828 4.300 7.900 150
# 2 Sepal.Width 3.057 0.436 2.000 4.400 150
# 3 Petal.Length 3.758 1.77 1.000 6.900 150
# 4 Petal.Width 1.199 0.762 0.100 2.500 150
# 5 Species setosa 0.819 setosa virginica 150
#
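The mean column in the output above is character because a column can hold only one type. As a small illustration (base R only, using iris as in the question), mixing a number and a string coerces both to character, and sprintf() is one way to control how the number is rendered before that happens:

# A number and a string in one vector: everything becomes character
mixed <- c(mean(iris$Sepal.Length), "setosa")
class(mixed)       # "character"

# Formatting the mean first fixes the number of decimals
formatted <- sprintf("%.3f", mean(iris$Sepal.Length))
formatted          # "5.843"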
This also lets you make other changes. For instance, above I replaced the minimum with the first level of Species and the maximum with the last level of Species. Not that this is necessarily what you'd want to do, but it's easy to change the values of the output based on the type of variable.
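If you would rather keep psych::describe and only swap in a combined column, a minimal base R sketch is below. It is not the approach used above: meanOrMode is a hypothetical helper name introduced here for illustration, and the sketch assumes that a mean formatted to three decimals is acceptable for the numeric columns.

# Most frequent non-missing value
getMode <- function(x) {
  ux <- na.omit(unique(x))
  ux[which.max(tabulate(match(x, ux)))]
}

# Hypothetical helper: mean for numeric columns, mode for everything else.
# Both branches return character so the result fits in a single column.
meanOrMode <- function(x) {
  if (is.numeric(x)) {
    sprintf("%.3f", mean(x, na.rm = TRUE))
  } else {
    as.character(getMode(x))
  }
}

# vapply() visits the columns in order and keeps their names
mean_or_mode <- vapply(iris, meanOrMode, character(1))
mean_or_mode

Because the result has one element per column, in column order, it should line up with the rows of psych::describe(iris) when passed to cbind() in place of Mode, assuming describe() keeps one row per variable in the original order. The plain if/else also avoids calling both branches for every column, which ifelse() does.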