
In R, how do I convert numeric values to their closest value in a regularly spaced interval?

Problem

I am looking for an efficient way of doing this:

Given a vector x (you may assume the values are sorted):

x <- c(0.2, 0.8, 2.3, 5.8, 9.9, 10)

and a vector y of regularly spaced values along an interval, e.g. a step of 1 from 0 through 10:

y <- 0:10

how do I obtain the vector z, in which each value of x has been mapped to its closest value in y:

> z

[1]  0  1  2  6 10 10

Edit: obviously, this example is simple, but I would like it to work for any regularly spaced vector y, i.e., not just for this case of step 1.

Benchmarking of proposed solutions

library(microbenchmark)
set.seed(42)

yMin <- -6
stepSize <- 0.001
x <- rnorm(10000)
y <- seq(yMin, 6, by = stepSize)

# Onyambu's first answer.
fn1 <- function(x, y) y[max.col(-abs(outer(x, y, "-")))]

# Onyambu's second answer.
fn2 <- function(x, y) y[findInterval(x, c(-Inf, y+diff(y[1:2]) / 2, Inf))]

# Plonetheus' answer: although it works on my simple example, it does not work,
# e.g., when yMin is negative.
fn3 <- function(x, yMin, stepSize) {
  z <- rep(0, length(x))
  for (i in 1:length(x)) {
    numSteps <- (x[i] - yMin) / stepSize # approximately how many steps do we need
    if (x[i] - floor(numSteps) < ceiling(numSteps) - x[i]) { # check if we need to round up or down
      z[i] <- yMin + floor(numSteps) * stepSize # edited to add yMin
    }
    else {
      z[i] <- yMin + ceiling(numSteps) * stepSize # edited to add yMin
    }
  }
  return(z)
}

# Thiagogpsm's answer.
fn4 <- function(x, y) sapply(x, function(x_i, y) y[which.min(abs(x_i - y))], y)

microbenchmark(
  fn1(x, y),
  fn2(x, y),
  fn3(x, yMin, stepSize),
  fn4(x, y),
  times = 3L)
#> Unit: milliseconds
#>                    expr         min          lq        mean      median
#>               fn1(x, y) 5546.804339 5598.159531 6759.516597 5649.514724
#>               fn2(x, y)    1.252469    1.705517    3.695469    2.158564
#>  fn3(x, yMin, stepSize)    3.176284    3.190868   11.372397    3.205453
#>               fn4(x, y)  888.288538 1843.955232 3489.842765 2799.621925
#>           uq         max neval cld
#>  7365.872725 9082.230727     3   b
#>     4.916968    7.675373     3  a 
#>    15.470453   27.735453     3  a 
#>  4790.619879 6781.617833     3  ab

### Verdict

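The second solution in the benchmark above, `fn2` (Onyambu's second answer, based on `findInterval`), is the fastest, with a median of about 2.2 ms. `fn3`, proposed by Plonetheus, is a close second at about 3.2 ms, although, as noted in the code comment above, it does not return correct results in all cases. `fn1` and `fn4` are slower by roughly three orders of magnitude (medians of about 5.6 s and 2.8 s).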
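Timings are only comparable if the candidates return the same values, so a quick agreement check before benchmarking can help. The sketch below is an illustration and not part of the original benchmark: it re-declares `fn2` and `fn4` as above and compares them on a coarser grid (a step of 0.25, an assumption made here only to keep the brute-force version fast). `x_chk`, `y_chk` and `agree` are hypothetical names.

```r
set.seed(42)
x_chk <- rnorm(1000)
y_chk <- seq(-6, 6, by = 0.25)

# Same definitions as in the benchmark above.
fn2 <- function(x, y) y[findInterval(x, c(-Inf, y + diff(y[1:2]) / 2, Inf))]
fn4 <- function(x, y) sapply(x, function(x_i, y) y[which.min(abs(x_i - y))], y)

# TRUE when both approaches map every element to the same grid value.
agree <- isTRUE(all.equal(fn2(x_chk, y_chk), fn4(x_chk, y_chk)))
print(agree)
```

The two approaches can legitimately differ for a value that lies exactly halfway between two grid points, because they break that tie in opposite directions.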

One way could be:

 y[max.col(-abs(outer(x, y, "-")))]
[1]  0  1  2  6 10 10

For example:

x1 <- c(0.01, 2.4, 1.3, 4.1, 6.2)
y1 <- c(1, 3, 5, 7, 9)

Results:

y1[max.col(-abs(outer(x1, y1, "-")))]
[1] 1 3 1 5 7

i.e., we see that in the vector y1, 0.01 is closest to 1, 2.4 is closest to 3, 1.3 is closest to 1, 4.1 is closest to 5 and 6.2 is closest to 7, as expected.
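Two practical points about this approach are sketched below. First, `outer` builds a matrix with length(x) rows and length(y) columns, so memory use grows with the product of the two lengths. Second, `max.col` breaks ties at random by default, so passing `ties.method = "first"` makes the result reproducible when a value sits exactly halfway between two grid points. `nearest_outer` is a hypothetical helper name, not something from the answer.

```r
nearest_outer <- function(x, y) {
  d <- abs(outer(x, y, "-"))             # d[i, j] = |x[i] - y[j]|
  y[max.col(-d, ties.method = "first")]  # column with the smallest distance in each row
}

nearest_outer(c(0.2, 0.8, 2.3, 5.8, 9.9, 10), 0:10)
nearest_outer(0.5, c(0, 1))  # exact tie: "first" picks the lower grid point
```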

If y is sorted (x itself need not be), then you could use the function findInterval.

Since the step is constant, we do:

y[findInterval(x, c(-Inf, y+diff(y[1:2]) / 2, Inf))]
[1]  0  1  2  6 10 10

y1[findInterval(x1, c(-Inf, y1+diff(y1[1:2])/2, Inf))]
[1] 1 3 1 5 7
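One caveat with the break points used above: a value of x larger than max(y) plus half a step falls into the last interval, which indexes past the end of y and yields NA. A possible variant, sketched here under the assumption that y is sorted in increasing order, uses the midpoints between consecutive elements of y as breaks. It clamps out-of-range values to the nearest end of y and does not require a constant step. `nearest_sorted` is a hypothetical name.

```r
nearest_sorted <- function(x, y) {
  mids <- (head(y, -1) + tail(y, -1)) / 2  # midpoints between neighbouring grid values
  # findInterval() counts how many midpoints are <= x, i.e. a value in 0..(length(y) - 1)
  y[findInterval(x, mids) + 1]
}

nearest_sorted(c(0.2, 0.8, 2.3, 5.8, 9.9, 10), 0:10)
nearest_sorted(c(-3, 42), 0:10)  # values outside the grid map to its ends
```

A value lying exactly on a midpoint is mapped to the upper of its two neighbours.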

One way is to create a function that returns z_i for each x_i and apply it to the vector:

map_to_closest <- function(x_i, y) {
  y[which.min(abs(x_i - y))]
}

sapply(x, map_to_closest, y)
[1]  0  1  2  6 10 10
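This element-by-element scan does not need y to be sorted or regularly spaced, at the cost of comparing every x_i against every element of y. A small variation, offered only as a sketch, uses `vapply` so that the expected result type is declared up front. `nearest_scan` is a hypothetical name.

```r
nearest_scan <- function(x, y) {
  # For each x_i, the position of the closest element of y (first one on ties).
  idx <- vapply(x, function(x_i) which.min(abs(x_i - y)), integer(1))
  y[idx]
}

nearest_scan(c(0.2, 0.8, 2.3, 5.8, 9.9, 10), 0:10)
nearest_scan(c(4.2, -1), c(7, 0, 3))  # y does not have to be sorted
```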

If you know the minimum of y and how large each step is, then I believe you can do something like the following to solve it in O(N) time:

getZ <- function(x, yMin, stepSize) {
    z <- numeric(length(x))
    for (i in seq_along(x)) {
        numSteps <- (x[i] - yMin) / stepSize # (fractional) number of steps from yMin to x[i]
        # compare in step units (numSteps, not x[i]) to decide whether to round down or up
        if (numSteps - floor(numSteps) < ceiling(numSteps) - numSteps) {
            z[i] <- yMin + floor(numSteps) * stepSize
        } else {
            z[i] <- yMin + ceiling(numSteps) * stepSize
        }
    }
    return(z)
}

With these values, for example,

x <- c(0.2, 0.8, 2.3, 5.8, 9.9, 10)
yMin <- 0
stepSize <- 0.3
print(getZ(x, yMin, stepSize))

we get the expected output of:

[1] 0.3 0.9 2.4 5.7 9.9 9.9
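The same idea can be written without an explicit loop by rounding the number of steps directly. This is a minimal sketch rather than the answer's own code: `snap_to_grid` is a hypothetical name, and it assumes that every x lies within the range of the grid, since values outside it are snapped to the grid's extension and not clamped.

```r
snap_to_grid <- function(x, yMin, stepSize) {
  k <- round((x - yMin) / stepSize)  # nearest whole number of steps away from yMin
  yMin + k * stepSize
}

snap_to_grid(c(0.2, 0.8, 2.3, 5.8, 9.9, 10), yMin = 0, stepSize = 1)
snap_to_grid(c(-5.4, -0.2), yMin = -6, stepSize = 1)  # a negative yMin works the same way
```

Note that R's `round` sends exact halves to the even neighbour, so values lying exactly midway between two grid points are not always rounded up.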

