
Finding column with nearest higher value in R

I'm trying to find the index (or the name) of the column whose value is closest to the value in another column. More precisely, I have a dataset that looks like:

data <- data.frame(cum_1 = c(1, 2),
                   cum_2 = c(2, 3),
                   cum_3 = c(3, 4),
                   median = c(1, 2.2))

And I'm trying to come up with a function that tells me which of the cum_i columns gives the nearest value at or above the associated median value. With the dataset above, for instance, the function would return cum_1 for the first row and cum_2 for the second row (or the corresponding column indices).

Any help appreciated, thanks!

I'm sure there are more elegant ways to do this, but here's a base R start:

apply(data, 1, function(x) {
    val <- x[-length(x)] - x[length(x)]   # each cum_i minus the median
    which.min(replace(val, val < 0, NA))  # smallest non-negative difference
})
#[1] 1 2

Explanations:

  1. apply(data, 1, function(x) ...) applies the function to every row (since MARGIN = 1).
  2. For every row, the function computes val as the values of all columns except the last, x[-length(x)], minus the value in the last column, x[length(x)] (the median column). It then returns the index within the row vector (i.e. the column index) of the value that is closest to, but not below, the median. We get "closest but not below" by replacing all negative differences (the columns below the median) with NA, which which.min ignores.
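
To make step 2 concrete, here is a minimal sketch that traces the second row of the example data by hand. The names masked and idx are introduced only for illustration; they are not part of the answer above.

```r
# Row 2 of the example data, as the named numeric vector apply() passes to the function
x <- c(cum_1 = 2, cum_2 = 3, cum_3 = 4, median = 2.2)

# Each cum_i value minus the median (the last element)
val <- x[-length(x)] - x[length(x)]

# Columns below the median become NA, so which.min() skips them
masked <- replace(val, val < 0, NA)

# Position of the smallest remaining difference; the result keeps the column name
idx <- which.min(masked)
print(idx)
print(names(idx))
```

Because which.min keeps the element name, names(idx) gives the column name ("cum_2" here) as well as the index, which covers the "index or name" part of the question.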

You can use max.col for a vectorised option:

max.col(data[-ncol(data)] - data$median >= 0, ties.method = 'first')
#[1] 1 2

data[-ncol(data)] drops the last column (median); subtracting data$median from each of the remaining columns gives:

data[-ncol(data)] - data$median
#  cum_1 cum_2 cum_3
#1   0.0   1.0   2.0
#2  -0.2   0.8   1.8

We compare this output with >= 0 to get TRUE / FALSE values:

data[-ncol(data)] - data$median >= 0

#     cum_1 cum_2 cum_3
#[1,]  TRUE  TRUE  TRUE
#[2,] FALSE  TRUE  TRUE

Since TRUE > FALSE, we can use max.col to get the column index of the maximum value in each row. If more than one value in a row is TRUE, ties.method = 'first' returns the first such index. Note that the first TRUE is the nearest value only when the cum_i columns are non-decreasing from left to right, as they are in the example.
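
Both approaches lean on assumptions that hold for the example data: at least one cum_i column reaches the median in every row, and (for max.col) the columns increase from left to right. If either might not hold, something along these lines could be a starting point. It is only a sketch: nearest_above and its target argument are hypothetical names, not something taken from the answers or from base R.

```r
data <- data.frame(cum_1 = c(1, 2),
                   cum_2 = c(2, 3),
                   cum_3 = c(3, 4),
                   median = c(1, 2.2))

# Hypothetical helper: for each row, return the name of the column whose value
# is closest to, but not below, the target column; NA if no column qualifies.
nearest_above <- function(df, target = "median") {
  cols <- setdiff(names(df), target)
  vapply(seq_len(nrow(df)), function(i) {
    # named vector of differences for row i
    d <- unlist(df[i, cols, drop = FALSE]) - df[[target]][i]
    d[d < 0] <- NA                        # ignore columns below the target
    if (all(is.na(d))) NA_character_ else names(d)[which.min(d)]
  }, character(1))
}

res <- nearest_above(data)
print(res)
```

To get indices instead of names, match(res, names(data)) converts the result. The row-by-row loop gives up the vectorisation of max.col, so it is a trade of speed for not depending on column order.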


 