Situation
I am currently using ddply and numcolwise(summary) to return summary statistics (i.e. min, Q1, Q2, mean, Q3 and max) for a given data frame.
However, I can't figure out how to handle NAs (I have tried various combinations of na.rm=TRUE).
Here is an example data frame and how I am using ddply and numcolwise(summary).
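library(plyr)   # ddply() and numcolwise() come from plyr; load it before dplyr
library(dplyr)  # used for the %>% pipeline in the Update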
id <- c(1, 2, 3, 4, 5)
name <- c("name1", "name2", "name3", "name4", "name5")
position <- c("AAA", "BBB", "CCC", "AAA", "BBB")
salary <- c(20, 30, 40, 50, 60)
bonus <- c(1, 1, 1, NA, 1)
sti <- c(2, 3, 4, 5, 6)
lti <- c(6, 5, 4, 3, 2)
other <- c(10, 11, 12, 13, 14)
df <- data.frame(id, name, position, salary, bonus, sti, lti, other)
df_out <- ddply(df, .(position), numcolwise(summary))
Question
Is it possible to use numcolwise(summary) in a way that handles NAs, or is there another method or function that gives me these summary statistics for each numeric column and copes with NAs?
Notes
These functions all work when called on a single column:
min(df[,"bonus"], na.rm=TRUE)
median(df[,"bonus"], na.rm=TRUE)
mean(df[,"bonus"], na.rm=TRUE)
quantile(df[,"bonus"], probs=(c(0.25, 0.5, 0.75)), type=7, na.rm=TRUE)
summary(df[,"bonus"], na.rm=TRUE)
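Since the individual calls above all accept na.rm=TRUE, one option is to wrap them in a small helper and pass that to numcolwise() in place of summary. This is only a minimal sketch, assuming the plyr package is installed; six_stats is a hypothetical name chosen for illustration, not a plyr function.

```r
library(plyr)  # ddply() and numcolwise()

# Same example data as above, trimmed to the columns that matter here
df <- data.frame(
  position = c("AAA", "BBB", "CCC", "AAA", "BBB"),
  salary   = c(20, 30, 40, 50, 60),
  bonus    = c(1, 1, 1, NA, 1)
)

# Hypothetical helper: min, Q1, median, mean, Q3, max with NAs dropped
six_stats <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.5, 0.75), type = 7,
                na.rm = TRUE, names = FALSE)
  c(min = min(x, na.rm = TRUE), q1 = q[1], median = q[2],
    mean = mean(x, na.rm = TRUE), q3 = q[3], max = max(x, na.rm = TRUE))
}

# Six rows per position, in the order the helper returns them
df_out <- ddply(df, .(position), numcolwise(six_stats))
```

Because the helper always returns six values, every numeric column has the same length within a group, which summary() does not guarantee once it appends an NA count. Note that a group whose values are all NA makes min() and max() return Inf and -Inf with a warning.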
Update
After some research, one possible but not very elegant solution is:
df[,c("position", "salary","bonus","sti","lti","other")] %>%
  group_by(position) %>%
  summarise_each(funs(min(., na.rm=TRUE), quantile(., 0.25, na.rm=TRUE),
                      quantile(., 0.5, na.rm=TRUE), mean(., na.rm=TRUE),
                      quantile(., 0.75, na.rm=TRUE), max(., na.rm=TRUE)))
I can achieve the result using the %>% notation, summarise_each(), and specifying the functions in the funs() argument.
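summarise_each() and funs() are deprecated in newer dplyr releases, where across() is the documented replacement. The following is a sketch of the same idea, assuming dplyr 1.0.0 or later; the stats list and the df_wide name are illustrative choices, not anything from the original code.

```r
library(dplyr)

# Same example data as above, trimmed to two numeric columns
df <- data.frame(
  position = c("AAA", "BBB", "CCC", "AAA", "BBB"),
  salary   = c(20, 30, 40, 50, 60),
  bonus    = c(1, 1, 1, NA, 1)
)

# Named list of NA-aware summaries; the names become column suffixes
stats <- list(
  min    = ~ min(.x, na.rm = TRUE),
  q1     = ~ quantile(.x, 0.25, na.rm = TRUE, names = FALSE),
  median = ~ median(.x, na.rm = TRUE),
  mean   = ~ mean(.x, na.rm = TRUE),
  q3     = ~ quantile(.x, 0.75, na.rm = TRUE, names = FALSE),
  max    = ~ max(.x, na.rm = TRUE)
)

# One row per position, one column per (variable, statistic) pair
df_wide <- df %>%
  group_by(position) %>%
  summarise(across(where(is.numeric), stats), .groups = "drop")
```

With the default naming, the result has columns such as salary_min and bonus_q3, so each statistic is labelled explicitly rather than identified by row order.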