

Is there an R function for subtracting different medians for each value of a group of a variable?

I have a data frame with the variables question_ID and estimate, where 210 questions were each asked to 32 people (so 6720 observations). I want to calculate the log10 of each estimate and subtract the median of the logs for that question.

E.g. for question 1: Sum(log(Estimates1)-median1)/32, for question 2: Sum(log(Estimates2)-median2)/32, and so on up to 210. So at the end I should have 210 values, one for each question.

So far I have calculated the median for each question:

m <- data %>% group_by(question_ID) %>% summarize(m=median(log10(estimate)))

I'm looking for an elegant solution where I don't need to create 210 subsets. Any ideas?

Thanks in advance!

You can do this using base R functions. ave applies a function to a vector by subsets and returns a result the same length as the original vector.

# Calculate the medians within the dataframe using the ave function
data$logmedians <- ave( log(data$estimate,10) , data$question_ID, FUN=median)

# Now generate the difference between the log medians and the individual answers
data$diflogs <- log(data$estimate, 10) - data$logmedians

I think this is the simplest way to understand it. You can neaten things up by using within and doing the entire calculation inside the ave function:

data <- within(data, {
   # Center each log10 estimate on the median log10 estimate of its question
   diflogs <- ave(estimate, question_ID, FUN = function(x) log(x, 10) - median(log(x, 10)))
   })
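To see what ave returns, here is a minimal sketch on made-up data. The data frame toy and its values are hypothetical and only stand in for the real data; the point is that the result has one entry per row, so it can be subtracted directly from the logged estimates.

```r
# Toy data (hypothetical): 2 questions, 3 respondents each
toy <- data.frame(
  question_ID = c(1, 1, 1, 2, 2, 2),
  estimate    = c(10, 100, 1000, 1, 10, 100)
)

# Per-question median of the log10 estimates, repeated on every row of that question
toy$logmedians <- ave(log10(toy$estimate), toy$question_ID, FUN = median)

# Each log10 estimate minus the median log10 estimate of its question
toy$diflogs <- log10(toy$estimate) - toy$logmedians

print(toy)
```

Within each question the centered values in diflogs have a median of zero, which is a quick sanity check on real data as well.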

Note that the median of the logs isn't exactly the same as the log of the median if there is an even number of responses. Be careful about exactly which one you want.
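A small illustration of that difference, using made-up vectors x and y (not the asker's data): with an even count, median averages the two middle values, and averaging before or after taking logs gives different numbers.

```r
x <- c(1, 10, 100, 1000)   # hypothetical responses, even count

median(log10(x))   # average of the two middle logs: (1 + 2) / 2
log10(median(x))   # log of the average of the two middle values: log10(55)

y <- c(1, 10, 100)         # odd count: both orders pick the same middle value
median(log10(y))
log10(median(y))
```

With an odd count the two orders agree, because log10 is monotonic and the median is then a single observed value.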

You can first calculate the log of the estimates, then for each question subtract the median log from each value, sum the differences and divide by 32.

library(dplyr)

data %>% 
 mutate(log_m = log10(estimate)) %>% 
 group_by(question_ID) %>% 
 summarize(m = sum(log_m - median(log_m))/32)
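For comparison, the same per-question summary can be sketched in base R with tapply. The toy data below is hypothetical, and mean() is swapped in for the hard-coded division by 32 on the assumption that every row of a question should count; the two give the same number only when each question has exactly 32 responses.

```r
# Toy data (hypothetical): 2 questions, 4 respondents each
toy <- data.frame(
  question_ID = c(1, 1, 1, 1, 2, 2, 2, 2),
  estimate    = c(1, 10, 100, 10000, 1, 1, 10, 1000)
)

log_est <- log10(toy$estimate)

# One value per question: mean deviation of the logs from their per-question median
per_question <- tapply(log_est, toy$question_ID, function(x) mean(x - median(x)))
per_question
```

The result is a named vector with one entry per question_ID, so on the full data it would have 210 entries.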
