繁体   English   中英

计算所有数值列的绝对值的平均值

[英]Calculating the mean of the absolute value of all numerical columns

我想为示例数据集DT计算所有数字列的绝对值的平均值:

library(data.table)
set.seed(1)
DT <- data.table(panelID = sample(50,50),                                                    # Creates a panel ID
                      Country = c(rep("Albania",30),rep("Belarus",50), rep("Chilipepper",20)),       
                      some_NA = sample(0:5, 6),                                             
                      some_NA_factor = sample(0:5, 6),         
                      Group = c(rep(1,20),rep(2,20),rep(3,20),rep(4,20),rep(5,20)),
                      Time = rep(seq(as.Date("2010-01-03"), length=20, by="1 month") - 1,5),
                      norm = round(runif(100)/10,2),
                      Income = round(rnorm(10,-5,5),2),
                      Happiness = sample(10,10),
                      Sex = round(rnorm(10,0.75,0.3),2),
                      Age = sample(100,100),
                      Educ = round(rnorm(10,0.75,0.3),2))           
DT [, uniqueID := .I]                                                                        # Creates a unique ID     
DT[DT == 0] <- NA                                                                            # https://stackoverflow.com/questions/11036989/replace-all-0-values-to-na
DT$some_NA_factor <- factor(DT$some_NA_factor)

我尝试按以下方式计算均值和绝对均值:

mean_of_differences <- DT[,lapply(Filter(is.numeric,.SD),mean, na.rm=TRUE)]
mean_of_differences <- as.data.frame(t(mean_of_differences))
mean_of_differences <- round(mean_of_differences, digits=2)
mean_of_absolute_diff <- DT[,lapply(Filter(is.numeric,.SD),function(x) mean(abs(x),na.rm=TRUE))]
mean_of_absolute_diff <- as.data.frame(t(mean_of_absolute_diff))
mean_of_absolute_diff <- round(mean_of_differences, digits=2)

但是,绝对差额的收入平均值为负(与正常平均值相同),这显然是不可能的。 如果看我的代码,我不明白我在做什么错。 我在俯视什么?

这是使用data.table的解决方案。 它(i)标识数字列,并且(ii)获得每个数字列的绝对值的平均值。

数据

dt = data.table(
num1 = rnorm(100),
num2 = rnorm(100),
strv = sample(LETTERS, 100, replace = T)
)

numcols = colnames(dt)[unlist(lapply(dt, is.numeric))] # Which columns are numeric?

# > numcols
# [1] "num1" "num2"

meandt = dt[, lapply(.SD, function(x) mean(abs(x))), .SDcols = numcols]
newcols = paste('mean_abs_', numcols, sep = ''); colnames(meandt) = newcols

# > meandt
#        mean_abs_num1 mean_abs_num2
# 1:     0.8287523     0.8325123

暂无
暂无

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM