Computing the average of the z-scores of several columns in a data.table

Question

In R, I have a data table and a character vector containing a subset of the data table's column names. I need to compute the z-scores (i.e. the number of standard deviations from the mean) of each column with a specified name, and put the row-wise averages of those z-scores in a new column. I found a solution with explicit for-loops (posted below), but this must be a common enough task that some library function could do the work more elegantly. Is there a better way?

Here's my solution:

#!/usr/bin/env Rscript

library(data.table)

# Sample data table.
dt <- data.table(a=1:3, b=c(5, 6, 3), c=2:4)

# List of column names.
cols <- c('a', 'b')

# Convert columns to z-scores, and add each to a new list of vectors.
zscores <- list()
for (colIx in seq_along(cols)) {
  zscores[[colIx]] <- scale(dt[,get(cols[colIx])], center=TRUE, scale=TRUE)
}

# Average corresponding entries of each vector of z-scores.
avg <- numeric(nrow(dt))
for (rowIx in seq_len(nrow(dt))) {
  avg[rowIx] <- mean(sapply(seq_along(cols),
                            function(colIx) {zscores[[colIx]][rowIx]}))
}

# Add new vector to the table, and print out the new table.
dt[,d:=avg]
print(dt)

This gives what you might expect.

   a b c           d
1: 1 5 2 -0.39089105
2: 2 6 3  0.43643578
3: 3 3 4 -0.04554473
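
As a cross-check on the numbers above, a z-score can be computed by hand as (x - mean(x)) / sd(x), which is what scale does by default for a numeric column (sd uses the n - 1 denominator). The following is a minimal sketch in base R; the helper name zscore is made up for illustration and is not part of the original post.

```r
# Hypothetical helper (an assumption, not from the original post):
# z-score of a numeric vector using the sample standard deviation.
zscore <- function(x) (x - mean(x)) / sd(x)

# Same values as columns a and b of the sample table.
a <- 1:3
b <- c(5, 6, 3)

# Average the two z-score vectors element by element.
d <- (zscore(a) + zscore(b)) / 2
print(d)
```

The printed values should agree with column d in the table above.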

Answer 1

scale can be applied to a matrix-like object, so you can get the desired output with:

> set(dt, NULL, 'd', rowMeans(scale(dt[, cols, with = FALSE])))
> dt
   a b c           d
1: 1 5 2 -0.39089105
2: 2 6 3  0.43643578
3: 3 3 4 -0.04554473
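
The same idea can probably also be written with data.table's := and .SDcols instead of set. This is a sketch of an alternative spelling, not the answerer's method; it assumes every column named in cols is numeric and rebuilds the sample table so it can run on its own.

```r
library(data.table)

# Sample data and column names, copied from the question.
dt <- data.table(a = 1:3, b = c(5, 6, 3), c = 2:4)
cols <- c("a", "b")

# .SDcols restricts .SD to the named columns; scale() standardizes each
# column, rowMeans() averages across each row, and := adds the result
# to dt by reference as column d.
dt[, d := rowMeans(scale(.SD)), .SDcols = cols]
print(dt)
```

One caveat: a column with zero variance makes scale return NaN for that column, so rowMeans would propagate NaN unless na.rm = TRUE is passed.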

Answer 1: score 2, accepted, 2017-03-17 14:40:20
