R function to calculate mean/median of top highest values

I have a data frame with two columns: one with numeric values and one with a name. Each name appears several times, with a different value each time.

Data <- data.frame(
Value = c(1:10),
Name = rep(LETTERS, each=4)[1:10])

I would like to write a function that takes the 3 highest values for each name and calculates their mean and median, returning NA if a name has fewer than 3 values. It should then also calculate the mean and median of all the values for each name. My initial attempt looks something like this:

my.mean <- function (x,y){
  top3.x  <- ifelse(x > 3 , NA, x)
  return(mean(top3.x), median(top3.x))
}

Any hints on how to improve this will be appreciated.

I would probably recommend by() for this.

Something put together really quickly might look like this (if I understood your question correctly):

myFun <- function(indf) {
  do.call(rbind, with(indf, by(Value, Name, FUN=function(x) {
    Vals <- head(sort(x, decreasing=TRUE), 3)
    if (length(Vals) < 3) {
      c(Mean = NA, Median = NA)
    } else {
      c(Mean = mean(Vals), Median = median(Vals))
    }
  })))
}
myFun(Data)
#   Mean Median
# A    3      3
# B    7      7
# C   NA     NA

Note that the function is not very useful in this form, because the column names (Value and Name) and the cutoff of 3 are hard-coded into it. It only works if your data is in the form you shared.
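One way to lift those hard-coded parts out is sketched below in base R. The function name top_n_stats and its arguments value_col, group_col and n are made up for illustration and are not part of the answer above. The sketch also adds the all-values mean and median that the question asks for, and assumes there are no NAs in the value column.

```r
# Hypothetical generalisation: the column names and the cutoff are arguments
# instead of being hard-coded. Assumes the value column contains no NAs.
top_n_stats <- function(df, value_col = "Value", group_col = "Name", n = 3) {
  # One vector of values per group
  groups <- split(df[[value_col]], df[[group_col]])
  rows <- lapply(groups, function(x) {
    top <- head(sort(x, decreasing = TRUE), n)
    enough <- length(top) >= n   # FALSE when the group has fewer than n values
    c(TopMean   = if (enough) mean(top) else NA_real_,
      TopMedian = if (enough) median(top) else NA_real_,
      AllMean   = mean(x),
      AllMedian = median(x))
  })
  # Stack the per-group vectors into a matrix with one row per group
  do.call(rbind, rows)
}

Data <- data.frame(Value = 1:10, Name = rep(LETTERS, each = 4)[1:10])
res <- top_n_stats(Data)
res
#   TopMean TopMedian AllMean AllMedian
# A       3         3     2.5       2.5
# B       7         7     6.5       6.5
# C      NA        NA     9.5       9.5
```

Calling top_n_stats(Data, n = 2) would use the two highest values per name instead, in which case C is no longer NA.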

Here's a data.table solution, assuming that you don't have any other NAs in your data:

require(data.table)  ## 1.9.2+
setDT(Data)          ## convert to data.table
Data[order(Name, -Value)][, list(m1=mean(Value[1:3]), m2=median(Value[1:3])), by=Name]

#    Name m1 m2
# 1:    A  3  3
# 2:    B  7  7
# 3:    C NA NA
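The NA for C above comes from how R indexes vectors: asking for positions past the end of a vector pads the result with NA, and mean() and median() then return NA. A minimal base R sketch of that behaviour, using a made-up vector x that stands in for a group with only two values:

```r
# x stands in for a group with only two values, like Name "C"
x <- c(10L, 9L)

# Position 3 does not exist, so the result is padded with NA
top3 <- sort(x, decreasing = TRUE)[1:3]
top3          # 10  9 NA

mean(top3)    # NA
median(top3)  # NA
```

This is also why the approach relies on the data having no other NAs: an NA that lands among the first three values of a group has the same effect as a missing value.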

Using dplyr

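library(dplyr)
myFun1 <- function(dat) {
  dat %>%
    group_by(Name) %>%
    arrange(desc(Value)) %>%
    # Blank out groups with fewer than 3 values (n < 3; n <= 3 would also
    # wrongly blank out groups with exactly 3 values)
    mutate(n = n(), Value = ifelse(n < 3, NA_integer_, Value)) %>%
    summarize(Mean = mean(head(Value, 3)), Median = median(head(Value, 3)))
}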

  myFun1(Data)
 #Source: local data frame [3 x 3]

 # Name Mean Median
 #1    A    3      3
 #2    B    7      7
 #3    C   NA     NA
