Finding column with nearest higher value in R
I'm trying to find the index (or the name) of the column with the closest value to another column. More precisely, I have a dataset that looks like:
data <- data.frame(cum_1 = c(1, 2),
                   cum_2 = c(2, 3),
                   cum_3 = c(3, 4),
                   median = c(1, 2.2))
And I'm trying to come up with a function that tells me which of the cum_i columns gives the nearest number at or above the associated median value. With the dataset above, for instance, the function would tell me that cum_1 provides it for the first row and cum_2 does it in the second row (or, in index notation, 1 and 2).
Any help appreciated, thanks!
I'm sure there are more elegant ways to do this, but here's a base R start:
apply(data, 1, function(x) {
  val <- x[-length(x)] - x[length(x)]   # gap between each cum_i and the median
  which.min(replace(val, val < 0, NA))  # drop negative gaps, take the smallest
})
#[1] 1 2
Explanations:
apply(data, 1, function(x) ...) applies the function to every row (since MARGIN = 1). For each row, val is calculated as the values from all columns except the last, x[-length(x)], minus the value in the last column, x[length(x)] (the median column). We get the "closest but higher" value by replacing every negative entry of val (a column whose value is below the median) with NA. Since which.min() skips NA values, the function then returns the index of the row vector (i.e. the index of the column) with the closest value at or above the median.
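To see the steps above on a single row, here is a short trace of the second row of the question's data (median 2.2). It also shows how to get the column name instead of the index, since which.min() keeps the names of the vector it is given. This is just an illustration of the same logic outside apply(), not a different method.

```r
# The example data from the question
data <- data.frame(cum_1 = c(1, 2),
                   cum_2 = c(2, 3),
                   cum_3 = c(3, 4),
                   median = c(1, 2.2))

x <- unlist(data[2, ])               # second row as a named numeric vector
val <- x[-length(x)] - x[length(x)]  # each cum_i minus that row's median (2.2)
val <- replace(val, val < 0, NA)     # cum_1 (2 < 2.2) is below the median: drop it
idx <- which.min(val)                # which.min() ignores NA, so this picks cum_2
names(idx)                           # "cum_2": the column name instead of the index
```

One thing to keep in mind: if no cum_i column reaches the median in some row, which.min() returns integer(0) for that row, and apply() then returns a list instead of an integer vector.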
You can use max.col for a vectorised option:
max.col(data[-ncol(data)] - data$median >= 0, ties.method = 'first')
#[1] 1 2
data[-ncol(data)] removes the last column (median), and data$median is then subtracted from each of the remaining columns to get:
data[-ncol(data)] - data$median
# cum_1 cum_2 cum_3
#1 0.0 1.0 2.0
#2 -0.2 0.8 1.8
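The subtraction lines up row by row because data$median has exactly one value per row, so R recycles it down each column. A small sketch that makes this alignment explicit with sweep(), which subtracts median[i] from every value in row i, and checks that both give the same numbers:

```r
data <- data.frame(cum_1 = c(1, 2),
                   cum_2 = c(2, 3),
                   cum_3 = c(3, 4),
                   median = c(1, 2.2))

cums <- data[-ncol(data)]  # drop the median column

# data$median has length nrow(data), so it is recycled down each column
via_recycling <- as.matrix(cums - data$median)

# sweep() states the same thing explicitly: subtract median[i] from row i
via_sweep <- sweep(as.matrix(cums), 1, data$median)

all.equal(via_recycling, via_sweep, check.attributes = FALSE)  # TRUE
```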
We compare this output with >= 0 to get TRUE/FALSE values:
data[-ncol(data)] - data$median >= 0
# cum_1 cum_2 cum_3
#[1,] TRUE TRUE TRUE
#[2,] FALSE TRUE TRUE
Since TRUE > FALSE, we can use max.col to get the column index of the maximum value in each row. When a row has more than one TRUE value, we specify ties.method = 'first' to get the first index. Because the cum_i values in this data increase from left to right, the first TRUE column is the one nearest to the median from above.
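One edge case worth guarding against: if every entry in a row is FALSE (no cum_i column reaches the median), all entries tie, so ties.method = 'first' silently returns 1. Below is a minimal sketch of a wrapper that returns the column name and gives NA in that case. The helper name nearest_above_col() and its target argument are hypothetical, not part of either answer, and like the max.col approach it assumes the candidate columns increase from left to right within each row.

```r
# Hypothetical helper (not from either answer): name of the first column whose
# value is at or above the target column, or NA when no column reaches it.
# Assumes candidate columns increase from left to right within each row.
nearest_above_col <- function(df, target = "median") {
  vals <- as.matrix(df[setdiff(names(df), target)])  # candidate columns only
  ok <- vals - df[[target]] >= 0                     # TRUE where value >= target
  idx <- max.col(ok, ties.method = "first")          # first TRUE in each row
  idx[rowSums(ok) == 0] <- NA                        # all FALSE: max.col gives 1
  colnames(vals)[idx]
}

# The question's data plus a third row whose median no column reaches
data <- data.frame(cum_1 = c(1, 2, 1),
                   cum_2 = c(2, 3, 2),
                   cum_3 = c(3, 4, 3),
                   median = c(1, 2.2, 5))

nearest_above_col(data)  # "cum_1" "cum_2" NA
```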