Replacing the mean from psych::describe with the mode in a data frame
I like the summary statistics of psych::describe, but I want to replace the mean with the mode, and only for factor variables. How do I program Mode's output to replace the mean for Species (or any other factor variable)? I use iris for replication even though it has only one factor.
library(dplyr)

getMode <- function(df) {
  ux <- na.omit(unique(df))
  ux[which.max(tabulate(match(df, ux)))]
}
Mode <- apply(iris %>% select(where(is.factor)), 2, getMode)
# I only want 5 of psych's descriptive stats plus the mode.
table <- cbind(psych::describe(iris),
               Mode)[, c(3, 4, 8, 9, 2, 14)]
table
How can I get the mean and the mode to combine depending on the structure of the variable? Should I use if_else with where to tell R what to do when the condition is FALSE? If I could get the mean to output when the variable is not a factor, I would get a column that combines means and modes. psych produces a data frame where the identifying variable names are not selectable, so manual coding or listing the variables in mutate() is impossible. Factors are also the majority of variables in my dataset, so doing it manually or with mutate(case_when) would be REALLY tedious even if it could be done.
PS. I've tried changing my apply() to map functions, but the output is not compatible with cbind() because it lists the other levels for each factor. If you have a better idea about that part of the code, or think that's where I could combine getMode and mean(), I don't mind suggestions.
If you're willing to use a different function to produce the same kind of output, you could use dplyr and tidyr to accomplish this. With this approach you can do just what you want, using ifelse() to distinguish numeric from non-numeric variables. The only thing to note is that if the function produces non-numeric values for factors, the output for the numeric variables also has to be character. That's why I wrapped the mean() function in sprintf().
getMode <- function(df) {
  ux <- na.omit(unique(df))
  ux[which.max(tabulate(match(df, ux)))]
}
library(dplyr)  # %>% and summarise_all()
library(tidyr)  # pivot_longer()
iris %>%
  summarise_all(.funs = list(
    # mean for numeric columns, mode for everything else (both as character)
    mean = function(x) ifelse(is.numeric(x), sprintf("%.3f", mean(x)), as.character(getMode(x))),
    sd = function(x) ifelse(is.numeric(x), sd(x), sd(as.numeric(x))),
    min = function(x) ifelse(is.numeric(x), sprintf("%.3f", min(x)), levels(x)[1]),
    max = function(x) ifelse(is.numeric(x), sprintf("%.3f", max(x)), levels(x)[length(levels(x))]),
    n = function(x) sum(!is.na(x))
  )) %>%
  # columns are named <variable>_<statistic>; split them back into rows
  pivot_longer(everything(),
               names_to = c("set", ".value"),
               names_pattern = "(.+)_(.+)")
# A tibble: 5 x 6
#   set          mean      sd min    max           n
#   <chr>        <chr>  <dbl> <chr>  <chr>     <int>
# 1 Sepal.Length 5.843  0.828 4.300  7.900       150
# 2 Sepal.Width  3.057  0.436 2.000  4.400       150
# 3 Petal.Length 3.758  1.77  1.000  6.900       150
# 4 Petal.Width  1.199  0.762 0.100  2.500       150
# 5 Species      setosa 0.819 setosa virginica   150
This also lets you make other changes. For instance, in the code above I replaced the minimum with the first level of Species and the maximum with the last level of Species. Not that this is necessarily what you'd want to do, but it's easy to change the values of the output based on the type of variable.
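The same mean-or-mode idea can also be sketched in base R, without dplyr or tidyr. This is only an illustration, not part of the answer above: the names `center` and `summary_tbl` are hypothetical, and `na.rm = TRUE` is an assumption about how missing values should be handled. Because each column yields a single TRUE/FALSE from `is.numeric()`, a plain `if`/`else` is enough here.

```r
# Mode function as given in the question
getMode <- function(df) {
  ux <- na.omit(unique(df))
  ux[which.max(tabulate(match(df, ux)))]
}

# Hypothetical helper: formatted mean for numeric columns, mode otherwise.
# Both branches return character so the results fit in one column.
center <- function(x) {
  if (is.numeric(x)) {
    sprintf("%.3f", mean(x, na.rm = TRUE))  # na.rm = TRUE is an assumption
  } else {
    as.character(getMode(x))                # as.character() drops the factor levels
  }
}

# One row per variable; vapply() guarantees one character value per column
summary_tbl <- data.frame(
  variable = names(iris),
  center   = vapply(iris, center, character(1)),
  n        = vapply(iris, function(x) sum(!is.na(x)), integer(1)),
  row.names = NULL
)
summary_tbl
```

Since `vapply(iris, center, character(1))` returns one value per column in column order, the resulting vector could in principle be bound to other per-variable summaries, provided their rows follow the same order.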