Apply a function to the elements meeting a condition, across a subset of columns in a data.table

Question

Consider the data.table exampleDT:

library(data.table)

set.seed(7)
exampleDT = data.table(colA = rnorm(10,15,5),
                       colB = runif(10,100,150),
                       targetA = rnorm(10,12,2),
                       targetB = rnorm(10,8,4))

If I want to calculate the mean of all elements in column targetA, for example, that are below some threshold (say, 10), I can do the following:

examp_threshold = 10
exampleDT[targetA<examp_threshold,mean(targetA)]
# [1] 9.224007566814299

And if I want to calculate the mean of all elements in columns targetA and targetB, for example, I can do the following:

target_cols = names(exampleDT)[which(names(exampleDT) %like% "target")] 
exampleDT[,lapply(.SD,mean),.SDcols=target_cols]
#              targetA           targetB
# 1: 12.60101574551183 7.585007905896557
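For reference, the column selection above can be written a little more compactly. The sketch below uses a small stand-in table named dt (hypothetical, not the exampleDT from the question) and assumes a data.table version in which .SDcols accepts patterns(); if yours does not, the grep() form alone should still work.

```r
library(data.table)

# Small stand-in table; only the column names mirror exampleDT.
dt <- data.table(colA = 1:3, colB = 4:6,
                 targetA = c(9, 12, 15), targetB = c(2, 8, 14))

# grep(..., value = TRUE) returns the matching column names directly,
# so the names(...)[which(...)] wrapper is not needed.
target_cols <- grep("target", names(dt), value = TRUE)

# Column means using an explicit character vector of names.
by_names <- dt[, lapply(.SD, mean), .SDcols = target_cols]

# Same selection by regular expression
# (assumes .SDcols supports patterns() in the installed version).
by_pattern <- dt[, lapply(.SD, mean), .SDcols = patterns("target")]
```

Both forms select every column whose name matches the regular expression "target", so a pattern such as "^target" may be safer if other column names could contain that string elsewhere.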

But I don't know how to combine the two; that is, to calculate the mean of all elements below some specified threshold (10, here) in every column whose name contains a specified string ("target", in this case). This was my first guess, but it was unsuccessful:

exampleDT[.SD<examp_threshold,lapply(.SD,mean),.SDcols=target_cols]
#Empty data.table (0 rows) of 2 cols: targetA,targetB

Answer 1

You need to subset inside the j expression, so that each column is filtered by its own values, like so:

exampleDT[, lapply(.SD, function(x) mean(x[x<examp_threshold])),.SDcols=target_cols]

#   targetA targetB
#1: 9.224008 6.66624
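As a rough sketch of the same idea, the anonymous function can be pulled out into a named helper and checked against the single-column result from the question. The name mean_below is hypothetical, introduced here only for illustration. Note that a column with no values under the threshold yields NaN, because the mean of an empty vector is NaN.

```r
library(data.table)

set.seed(7)
exampleDT <- data.table(colA = rnorm(10, 15, 5),
                        colB = runif(10, 100, 150),
                        targetA = rnorm(10, 12, 2),
                        targetB = rnorm(10, 8, 4))

examp_threshold <- 10
target_cols <- grep("target", names(exampleDT), value = TRUE)

# Hypothetical helper: mean of the values in x strictly below `threshold`.
# Returns NaN when no value qualifies.
mean_below <- function(x, threshold) mean(x[x < threshold])

# Each column in .SD is filtered by its own values before averaging.
res <- exampleDT[, lapply(.SD, mean_below, threshold = examp_threshold),
                 .SDcols = target_cols]

# The targetA entry should agree with the row-filter version from the question.
single <- exampleDT[targetA < examp_threshold, mean(targetA)]
agree <- isTRUE(all.equal(res$targetA, single))
```

Filtering in j rather than in i matters because a row filter keeps or drops whole rows, whereas here each target column needs its own set of qualifying elements.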
