Applying function to a subset of columns depending on a conditional
I have a large data.table with hundreds of columns and thousands of rows. Most of the columns hold numeric values that are ratios such as X/Y or Y/Z.
I need to flip some of these ratios so that they are transformed from Y/Z -> Z/Y. The only indicator I have for these columns is the column name, which includes the substring "x/y" or "y/z".
I can get the columns that match "y/z" using grepl, but I am not sure how to use that array of logical values with apply / lapply etc. I realize that I can extract the columns (by logical indexing or .SDcols) and transform them, but I don't want to discard/ignore the remaining columns.
Lastly, I have tried something like this:
flipcols <- grepl("Y/Z", names(sites))
sites.new <- sites[, , lapply(.SD, function(x) 1/x), .SDcols = flipcols]
but there is no difference between sites and sites.new: the columns that should have been transformed are not transformed, and the summed difference between corresponding columns is 0.
Suggestions?
EDIT: Following @akrun's suggestion, I tried the := operator, but it leads to other issues, as follows:
# I think this fails because flipcols is a logical vector and not a list of names or indices
> sites.new <- sites[, (flipcols) := lapply(.SD, function(x) 1/x), .SDcols = flipcols]
Error in `[.data.table`(sites, , `:=`((flipcols), lapply(.SD, function(x) 1/x)), :
LHS of := isn't column names ('character') or positions ('integer' or 'numeric')
# and this seems to fail because .SDcols seems to lock the data in read-only mode
> sites.new <- sites[, which(flipcols) := lapply(.SD, function(x) 1/x), .SDcols = flipcols]
Error in assign(ii, SDenv$.SDall[[ii]], SDenv) :
cannot change value of locked binding for '.SD'
EDIT2: Here's a minimal example. The goal is to transform the columns that match the "Y/Z" pattern (the second and fourth in this minimal example), while keeping the other columns unchanged and part of the result.
> dt <- data.table(matrix(rnorm(25), 5,5))
> names(dt) <- c("X/Y_1", "Y/Z_1", "X/Y_2", "Y/Z_2", "X/Y_3")
> dt
X/Y_1 Y/Z_1 X/Y_2 Y/Z_2 X/Y_3
1: 1.5972490 -0.01763484 1.10745607 -0.1416583 -0.4632829
2: 0.6629621 -0.82719204 -1.68214956 0.6145526 -0.8169235
3: -0.7491393 -0.05290791 0.63935066 1.0665537 -1.9107424
4: -0.6804972 -0.40107880 -0.01030063 1.4566075 -0.6866042
5: 0.2505391 -0.29091850 -1.95926987 0.8733446 1.3909565
Following your example,
library(data.table)
dt <- data.table(matrix(rnorm(25), 5,5))
names(dt) <- c("X/Y_1", "Y/Z_1", "X/Y_2", "Y/Z_2", "X/Y_3")
dt
X/Y_1 Y/Z_1 X/Y_2 Y/Z_2 X/Y_3
1: -0.09845804 -0.6455857 0.2259012 1.26772833 1.14451170
2: -1.22147654 1.7643609 0.5310762 -0.46869816 -0.58761886
3: -0.61469060 1.2323381 -0.4028002 0.99903384 0.01650606
4: -0.80805337 0.2733621 -0.2855663 -0.02166544 0.59398122
5: -0.68398344 0.2891335 -0.5004021 2.12063769 0.40474155
I will first match the target columns by name:
sd.cols <- grep("Y/Z", names(dt), value = TRUE)  # character vector of the column names to flip
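The logical vector that grepl produces in the question carries the same information as the names returned above. A minimal base R sketch of the conversion follows; the variable names cols, by_name, by_mask and by_pos are illustrative and do not come from the original post.

```r
# Column names taken from the minimal example in the question
cols <- c("X/Y_1", "Y/Z_1", "X/Y_2", "Y/Z_2", "X/Y_3")

# grepl() returns one TRUE/FALSE per column name
flipcols <- grepl("Y/Z", cols)

# Three equivalent ways to identify the matching columns
by_name <- grep("Y/Z", cols, value = TRUE)  # character names
by_mask <- cols[flipcols]                   # names via logical indexing
by_pos  <- which(flipcols)                  # integer positions

# The name-based selections agree with each other
stopifnot(identical(by_name, by_mask))
```

Either the character names or the integer positions can go on the left-hand side of :=, which, per the error message quoted in the question, does not accept a logical vector.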
Then, just change the columns by reference, using standard data.table notation.
dt[ , (sd.cols) := lapply(.SD, function(x){x^-1}), .SDcols = sd.cols ]
X/Y_1 Y/Z_1 X/Y_2 Y/Z_2 X/Y_3
1: -0.09845804 -1.5489811 0.2259012 0.7888125 1.14451170
2: -1.22147654 0.5667775 0.5310762 -2.1335693 -0.58761886
3: -0.61469060 0.8114656 -0.4028002 1.0009671 0.01650606
4: -0.80805337 3.6581513 -0.2855663 -46.1564513 0.59398122
5: -0.68398344 3.4586094 -0.5004021 0.4715563 0.40474155
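Because := modifies dt in place, assigning the result to a new name (as with sites.new in the question) does not create an independent table. If the original must be preserved, one option is to work on a copy. The following is a minimal sketch under that assumption; dt.new and the fixed seed are illustrative choices, not part of the original answer.

```r
library(data.table)

set.seed(1)  # fixed seed only so the sketch is reproducible
dt <- data.table(matrix(rnorm(25), 5, 5))
setnames(dt, c("X/Y_1", "Y/Z_1", "X/Y_2", "Y/Z_2", "X/Y_3"))

sd.cols <- grep("Y/Z", names(dt), value = TRUE)

# copy() makes an independent table, so dt itself stays untouched
dt.new <- copy(dt)
dt.new[, (sd.cols) := lapply(.SD, function(x) 1 / x), .SDcols = sd.cols]

# Matched columns are now the reciprocals of the originals
stopifnot(isTRUE(all.equal(dt.new[["Y/Z_1"]], 1 / dt[["Y/Z_1"]])))
# Unmatched columns are unchanged and still part of the result
stopifnot(identical(dt.new[["X/Y_1"]], dt[["X/Y_1"]]))
```

All five columns remain in dt.new; only the two "Y/Z" columns differ from dt.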