In R, how do I convert numeric values to the closest value in a regularly spaced interval?
I am looking for an efficient way to do this:
Given a vector x (you can assume the values are sorted):
x <- c(0.2, 0.8, 2.3, 5.8, 9.9, 10)
and a vector y of regularly spaced values along an interval, e.g. from 0 to 10 in steps of 1:
y <- 0:10
how do I obtain the vector z, in which each value of x has been mapped to its closest value in y:
> z
[1] 0 1 2 6 10 10
Edit: obviously this example is simple, but I would like it to work for any regularly spaced vector y, i.e., not only for this case with a step of 1.
library(microbenchmark)
set.seed(42)
yMin <- -6
stepSize <- 0.001
x <- rnorm(10000)
y <- seq(yMin, 6, by = stepSize)
# Onyambu's first answer.
fn1 <- function(x, y) y[max.col(-abs(outer(x, y, "-")))]
# Onyambu's second answer.
fn2 <- function(x, y) y[findInterval(x, c(-Inf, y+diff(y[1:2]) / 2, Inf))]
# Plonetheus' answer: although it works on my simple example, it does not work
# in general, e.g., when yMin is negative: the rounding test below compares
# x[i] (a value) with numSteps (a number of steps).
fn3 <- function(x, yMin, stepSize) {
  z <- rep(0, length(x))
  for (i in 1:length(x)) {
    numSteps <- (x[i] - yMin) / stepSize # approximately how many steps do we need
    if (x[i] - floor(numSteps) < ceiling(numSteps) - x[i]) { # check if we need to round up or down
      z[i] <- yMin + floor(numSteps) * stepSize
    } else {
      z[i] <- yMin + ceiling(numSteps) * stepSize
    }
  }
  return(z)
}
# Thiagogpsm's answer.
fn4 <- function(x, y) sapply(x, function(x_i, y) y[which.min(abs(x_i - y))], y)
microbenchmark(
  fn1(x, y),
  fn2(x, y),
  fn3(x, yMin, stepSize),
  fn4(x, y),
  times = 3L)
#> Unit: milliseconds
#>                    expr         min          lq        mean      median          uq         max neval cld
#>               fn1(x, y) 5546.804339 5598.159531 6759.516597 5649.514724 7365.872725 9082.230727     3   b
#>               fn2(x, y)    1.252469    1.705517    3.695469    2.158564    4.916968    7.675373     3   a
#>  fn3(x, yMin, stepSize)    3.176284    3.190868   11.372397    3.205453   15.470453   27.735453     3   a
#>               fn4(x, y)  888.288538 1843.955232 3489.842765 2799.621925 4790.619879 6781.617833     3  ab
### Verdict
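The second solution in my benchmark above, `fn2`, i.e., Onyambu's second answer (based on `findInterval`), is the fastest (median about 2.2 ms), but the solution proposed by Plonetheus (`fn3`) is a close second (median about 3.2 ms). The other two, `fn1` and `fn4`, take on the order of seconds.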
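Timings are only comparable if the candidates return the same values, so it can help to check agreement before benchmarking. A minimal sketch, assuming a coarser grid than the benchmark (step 0.5) and clamping x to the range of y; `ref` and `fast` are names introduced here for illustration only:

```r
set.seed(42)
y <- seq(-6, 6, by = 0.5)
# Clamp the random values to the range of y so every approach is defined
x <- pmin(pmax(rnorm(1000), min(y)), max(y))

# Reference: brute-force search for the closest grid value
ref  <- sapply(x, function(x_i) y[which.min(abs(x_i - y))])
# Candidate: findInterval on the midpoints between grid values
fast <- y[findInterval(x, c(-Inf, y + diff(y[1:2]) / 2, Inf))]

# TRUE when both approaches agree on every element
identical(ref, fast)
```

The two approaches can disagree only for a value lying exactly halfway between two grid points, where `which.min` picks the lower neighbour and `findInterval` the upper one.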
One approach could be:
y[max.col(-abs(outer(x, y, "-")))]
[1] 0 1 2 6 10 10
For example:
x1 <- c(0.01, 2.4, 1.3, 4.1, 6.2)
y1 <- c(1, 3, 5, 7, 9)
Result:
y1[max.col(-abs(outer(x1, y1, "-")))]
[1] 1 3 1 5 7
That is, among the values of y1, 0.01 is closest to 1, 2.4 is closest to 3, 1.3 is closest to 1, 4.1 is closest to 5, and 6.2 is closest to 7, as expected.
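To make the steps of the one-liner above visible, the sketch below builds the distance matrix explicitly for the same example vectors. The names `d` and `idx` are introduced here for illustration only. Note that `max.col` breaks ties at random by default, so `ties.method = "first"` is passed to make the result reproducible when a value lies exactly halfway between two grid points.

```r
x1 <- c(0.01, 2.4, 1.3, 4.1, 6.2)
y1 <- c(1, 3, 5, 7, 9)

# d[i, j] is the absolute distance between x1[i] and y1[j]
d <- abs(outer(x1, y1, "-"))

# The column with the smallest distance in each row is the column
# with the largest negated distance.
idx <- max.col(-d, ties.method = "first")

y1[idx]
#> [1] 1 3 1 5 7
```

Since `outer` allocates a matrix with length(x) rows and length(y) columns, memory use and run time grow with the product of the two lengths.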
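If the data are sorted, you can use the function findInterval (it requires its vector of break points, here built from y, to be sorted in non-decreasing order).
Since the steps are all equal, we do: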
y[findInterval(x, c(-Inf, y+diff(y[1:2]) / 2, Inf))]
[1] 0 1 2 6 10 10
y1[findInterval(x1, c(-Inf, y1+diff(y1[1:2])/2, Inf))]
[1] 1 3 1 5 7
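The break points used above are the midpoints between neighbouring values of y. In that expression, a value more than half a step above max(y) indexes past the end of y and yields NA. A minimal sketch of the same idea as a function, here called `nearest_on_grid` (a hypothetical name, not taken from the answers), computes the midpoints with `diff` and leaves out the one beyond the last grid value, so that out-of-range values map to the first or last element instead. It assumes y is sorted in increasing order and does not require equal spacing.

```r
nearest_on_grid <- function(x, y) {
  # Midpoints between neighbouring grid values: length(y) - 1 break points
  mids <- y[-length(y)] + diff(y) / 2
  # findInterval returns 0 for x below the first midpoint, so add 1
  y[findInterval(x, mids) + 1L]
}

x <- c(0.2, 0.8, 2.3, 5.8, 9.9, 10)
nearest_on_grid(x, 0:10)
#> [1]  0  1  2  6 10 10
nearest_on_grid(c(-3, 12), 0:10)   # values outside the range of y
#> [1]  0 10
```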
One way is to create a function that returns z_i for each x_i and apply it to the vector:
map_to_closest <- function(x_i, y) {
  y[which.min(abs(x_i - y))]
}
sapply(x, map_to_closest, y)
[1] 0 1 2 6 10 10
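A possible variant of the call above uses `vapply`, which checks that every call returns exactly one number. The sketch also shows how a tie is resolved: `which.min` returns the index of the first minimum, so a value exactly halfway between two grid points maps to the lower one.

```r
map_to_closest <- function(x_i, y) {
  # which.min returns the index of the first minimum, so ties go to the
  # lower grid value
  y[which.min(abs(x_i - y))]
}

x <- c(0.2, 0.8, 2.3, 5.8, 9.9, 10)
y <- 0:10

# vapply fails with an error if any call does not return one number
z <- vapply(x, map_to_closest, numeric(1), y = y)
z
#> [1]  0  1  2  6 10 10

map_to_closest(0.5, y)   # tie between 0 and 1
#> [1] 0
```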
If you know the minimum value of y and the size of each step, then I believe you can do the following to solve it in O(N) time:
getZ <- function(x, yMin, stepSize) {
  z <- rep(0, length(x))
  for (i in seq_along(x)) {
    numSteps <- (x[i] - yMin) / stepSize # position of x[i] measured in steps from yMin
    # round to the nearer whole step, comparing distances in units of steps
    if (numSteps - floor(numSteps) < ceiling(numSteps) - numSteps) {
      z[i] <- yMin + floor(numSteps) * stepSize
    } else {
      z[i] <- yMin + ceiling(numSteps) * stepSize
    }
  }
  return(z)
}
For example, with these values,
x <- c(0.2, 0.8, 2.3, 5.8, 9.9, 10)
yMin <- 0
stepSize <- 0.3
print(getZ(x, yMin, stepSize))
we get the expected output:
[1] 0.3 0.9 2.4 5.7 9.9 9.9
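Because the grid is regular, the per-element arithmetic above can also be applied to the whole vector at once. The sketch below, with the hypothetical name `snap_to_grid` and an optional `yMax` argument that is not part of the answer, measures the position of each value in units of `stepSize`, rounds it to the nearest whole step, and converts back. It assumes that `yMax`, when given, lies on the grid. Note that R's `round` sends exact halves to the even neighbour, so tie handling can differ from the other approaches.

```r
snap_to_grid <- function(x, yMin, stepSize, yMax = Inf) {
  # Position of each value in steps from yMin, rounded to a whole step
  k <- round((x - yMin) / stepSize)
  # Keep the step count inside the grid: at least 0, and at most the
  # number of steps between yMin and yMax (assumed to lie on the grid)
  k <- pmin(pmax(k, 0), round((yMax - yMin) / stepSize))
  yMin + k * stepSize
}

x <- c(0.2, 0.8, 2.3, 5.8, 9.9, 10)
snap_to_grid(x, yMin = 0, stepSize = 1, yMax = 10)
#> [1]  0  1  2  6 10 10
snap_to_grid(c(-5.7, 5.6), yMin = -6, stepSize = 0.5)
#> [1] -5.5  5.5
```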