
Calculate median from x, y data in R

I have a data frame describing a population of particles by size. The first column holds the size (x value) and the remaining columns hold the density (y values) at that size. I need the median size for each density column. Since median() expects a vector of individual observations rather than (value, count) pairs, I expanded the data: for each row, I append the value in the first column N times to a vector, where N is the count in the column being processed. This works, but it is really slow on my 1200-row data frames, so I wonder whether there is a more efficient solution.

df <- data.frame(Size = 1:100,
                 val1 = sample(0:9,100,replace = TRUE),
                 val2 = sample(0:9,100,replace = TRUE))

get.median <- function(dataset){
  results <- list()
  # Loop over every density column (all columns except the first, "Size")
  for(col in colnames(dataset)[2:ncol(dataset)]){
    col.results <- c()
    for(i in 1:nrow(dataset)){
      size <- dataset[i,"Size"]
      count <- dataset[i,col]
      # Repeat this size `count` times and append it to the expanded vector
      out <- rep(size,count)
      col.results <- c(col.results,out)
    }
    med <- median(col.results)
    results <- append(results,med)
  }
  return(results)
}

get.median(df)

Without transforming the data row by row, rep() can expand each column in a single vectorized call:

lapply(df[,2:3], function(y) median(rep(df$Size, times = y)))
$val1
[1] 49

$val2
[1] 47

Data:

set.seed(99)
df <- data.frame(Size = 1:100,
                 val1 = sample(0:9,100,replace = TRUE),
                 val2 = sample(0:9,100,replace = TRUE))
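
If the counts are large, even the vectorized rep() call has to build the full expanded vector in memory. As a possible alternative, the same median can be read off the cumulative counts without expanding anything. The following is only a sketch: the helper name weighted_median_counts is made up for illustration, and it assumes the counts are non-negative whole numbers, as in the example data.

```r
# Hypothetical helper (not from the original posts): median of the vector
# rep(x, times = counts), computed from cumulative counts instead of expansion.
weighted_median_counts <- function(x, counts) {
  # Drop sizes that never occur (zero or missing counts)
  keep <- !is.na(counts) & counts > 0
  x <- x[keep]
  counts <- counts[keep]
  if (length(x) == 0) return(NA_real_)

  # Sort by size so that cumulative counts follow the sorted expanded vector
  ord <- order(x)
  x <- x[ord]
  cum <- cumsum(counts[ord])
  n <- cum[length(cum)]

  # Positions of the middle observation(s): equal when n is odd,
  # adjacent when n is even (median() averages the two in that case)
  lo <- floor((n + 1) / 2)
  hi <- ceiling((n + 1) / 2)

  # First size whose cumulative count reaches each position
  x_lo <- x[which(cum >= lo)[1]]
  x_hi <- x[which(cum >= hi)[1]]
  (x_lo + x_hi) / 2
}

set.seed(99)
df <- data.frame(Size = 1:100,
                 val1 = sample(0:9, 100, replace = TRUE),
                 val2 = sample(0:9, 100, replace = TRUE))

# One median per density column, returned as a named numeric vector
vapply(df[-1], function(y) weighted_median_counts(df$Size, y), numeric(1))
```

For data of the size described in the question, the rep() version is probably fast enough; this variant mainly matters when the counts get large enough that the expanded vector becomes a memory concern.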

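You can use sapply and median to calculate the median of the values stored in each column. Note that this treats the densities themselves as the data and does not weight Size by them, so it is not the median size the question asks for: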

sapply(df, median)

Output:

Size val1 val2 
50.5  6.0  3.5
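
To see how a plain per-column median differs from the density-weighted median size described in the question, here is a small toy example. The toy data frame is made up for illustration and is not part of the original posts.

```r
# Made-up data: all of the density sits at Size 3
toy <- data.frame(Size = 1:3, val = c(0, 0, 5))

# Median of the density values themselves: the middle of 0, 0, 5
plain <- median(toy$val)

# Median size weighted by density: the middle of 3, 3, 3, 3, 3
weighted <- median(rep(toy$Size, times = toy$val))

print(c(plain = plain, weighted = weighted))
```

Here plain is 0 and weighted is 3, so the two calculations answer different questions.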

Using weighted.median() from the "spatstat" library together with dplyr::across:

> df %>% summarize(across(-Size, ~weighted.median(Size,.x,na.rm = TRUE)))
  val1 val2
1 42.5 47.5
