
Calculate new columns in lapply with a data.table

Apologies in advance if this is a double post, but I'm not having any luck finding a solution to what I'm trying to make work (and learn) here.

I'm trying to change my code to data.table approaches rather than data.frame because of the speed advantages, as I'm dealing with hundreds of measurement files, each with millions of values.

I have trouble figuring out how to code the following scenario. My columns have names consisting of 2 parts, channel and parameter, like: FWS.Maximum, FWS.Minimum

Since the code has to work for instrument data with differing channels, I wrote it so that R automatically finds the channel part of each name and then loops through the channels with lapply. What I am trying to do here is calculate the range as the Channel.Maximum column minus the Channel.Minimum column.

mydf[, FWS.Range := (FWS.Maximum - FWS.Minimum)]

works fine, but in the loop it would look like this:

x <- "FWS"

mydf[ , paste(x, "Range", sep = '.') := paste(x, "Maximum", sep = '.') - paste(x, "Minimum", sep = '.')]

but that throws the following error:

Error in paste(x, "Maximum", sep = ".") - paste(x, "Minimum", sep = ".") :
  non-numeric argument to binary operator
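The error arises because paste() returns a character string, and inside j that string is not looked up as a column, so R ends up trying to subtract one string from another. A minimal sketch of one way around it, using a small illustrative table (`dt` is a stand-in name, not the real data): resolve each name to its column with get(), and wrap the left-hand side in parentheses so it is evaluated to a column name.

```r
library(data.table)

# Illustrative stand-in for one channel of the real data
dt <- data.table(FWS.Maximum = c(12, 17, 29, 22),
                 FWS.Minimum = c(5, 4, 1, 6))
x <- "FWS"

# paste() only builds a character string; it is not the column itself
max_name <- paste(x, "Maximum", sep = ".")
stopifnot(is.character(max_name))

# get() looks the string up as a column of dt;
# the parentheses on the left make := evaluate the expression to a name
dt[, (paste(x, "Range", sep = ".")) := get(paste(x, "Maximum", sep = ".")) -
                                        get(paste(x, "Minimum", sep = "."))]
```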

Dummy data with only 5 columns to test it on (the real data has dozens that I need to adjust in this style):

mydf = data.table(ID = c(1, 2, 3, 4), FWS.Maximum = c(12, 17, 29, 22), FWS.Minimum = c(5, 4, 1, 6),
                  FL.Red.Maximum = c(12, 17, 29, 22), FL.Red.Minimum = c(5, 4, 1, 6))

The code I'm trying to get this to work for is this:

lapply(substr(names(mydf)[grepl("Maximum", names(mydf))], 1, nchar(names(mydf)[grepl("Maximum", names(mydf))])-8), function(x) { 
  mydf[ paste(x, "Range", sep = '.'):= paste(x, "Maximum", sep = '.') - paste(x, "Minimum", sep = '.')]  })

which currently tells me:

Error in `:=`(paste(x, "Range", sep = "."), paste(x, "Maximum", sep = ".") - :
  Check that is.data.table(DT) == TRUE. Otherwise, := and `:=`(...) are defined for use in j, once only and in particular ways. See help(":=").
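This second message appears to come from the missing comma after `mydf[`: without it the whole `:=` expression lands in i rather than j. The sketch below puts the assignment back in j and, as an alternative to the substr()/nchar() arithmetic, strips the suffix with sub(). The names `dt`, `max_cols` and `channels` are illustrative, and the regular expression assumes every channel has a column ending in ".Maximum".

```r
library(data.table)

# Illustrative copy of the dummy data, with equal-length columns
dt <- data.table(ID = 1:4,
                 FWS.Maximum = c(12, 17, 29, 22), FWS.Minimum = c(5, 4, 1, 6),
                 FL.Red.Maximum = c(12, 17, 29, 22), FL.Red.Minimum = c(5, 4, 1, 6))

# Channel prefixes: every column ending in ".Maximum", with that suffix removed
max_cols <- grep("\\.Maximum$", names(dt), value = TRUE)
channels <- sub("\\.Maximum$", "", max_cols)

# Note the comma after `[`: the assignment must sit in j, not in i
invisible(lapply(channels, function(x) {
  dt[, (paste(x, "Range", sep = ".")) := get(paste(x, "Maximum", sep = ".")) -
                                         get(paste(x, "Minimum", sep = "."))]
}))
```

Using sub() keeps the prefix logic in one place, so a channel name that itself contains a dot (such as FL.Red) is handled without counting characters.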

Thanks to the answers of MichaelChirrico and Jaap, and my own attempts to stop the printing on the console, I ended up with:

invisible(lapply(list.of.channels, function(x) {
  mydf[, paste(x, "Range", sep = '.') := get(paste(x, "Maximum", sep = '.')) - get(paste(x, "Minimum", sep = '.'))]
}))
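As a possible alternative, a plain for loop with data.table's set() also adds the columns by reference and prints nothing, so invisible() is not needed. This is only a sketch: `dt` is an illustrative table, and `list.of.channels` is written out by hand here as an assumption, since the post does not show how it was built.

```r
library(data.table)

# Illustrative data with the same column naming scheme
dt <- data.table(FWS.Maximum = c(12, 17, 29, 22), FWS.Minimum = c(5, 4, 1, 6),
                 FL.Red.Maximum = c(12, 17, 29, 22), FL.Red.Minimum = c(5, 4, 1, 6))

# Assumed contents; in practice this would be derived from names(dt)
list.of.channels <- c("FWS", "FL.Red")

for (ch in list.of.channels) {
  # set() assigns (or creates) column j by reference
  set(dt,
      j = paste0(ch, ".Range"),
      value = dt[[paste0(ch, ".Maximum")]] - dt[[paste0(ch, ".Minimum")]])
}
```

Because the columns are pulled out with `[[`, no get() lookup is involved; whether this is faster than the lapply version for a given data set would need to be measured.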
