Calculating distance between 2 elements of a data frame

I have a data frame that looks like this:

library(dplyr)
size_df <- tibble(size_chr = c("XS", "S", "M", "L", "XL", "1XL", "2XL", "3XL", "4XL", "5XL", "6XL"),
                  size_min = c(0,36,39,42,45,48,52,56,60,64,66),
                  size_max = c(36,39,42,45,48,52,56,60,64,66,70))

For any given number less than 70, I want to find the two sizes that it lies between, and its distance to each of them (normalised to between 0 and 1).

For example:

input <- 37.2

# S  0.6
# M   0.4

input <- 48

# XL  1

input <- 68

# 5XL  0.5
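# 6XL   0.5

input <- 37.2
# Indices of the two neighbouring sizes: the last row whose size_min is below
# the input, and the row after the first one whose size_max is above it
# (both clamped to the valid row range).
INDS = c(max(1, tail(which(size_df$size_min < input), 1)),
  min(NROW(size_df), 1 + head(which(size_df$size_max > input), 1)))
size_df$size_chr[INDS]
#[1] "S" "M"

# Distance from the input to the lower bound of the first size and to the
# upper bound of the second, normalised so that the two values sum to 1.
DIST = c(abs(size_df$size_min[INDS[1]] - input),
         abs(size_df$size_max[INDS[2]] - input))
DIST/sum(DIST)
#[1] 0.2 0.8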

This is the perfect case for findInterval(). We'll create a vector of the breaks between categories and use those to calculate scaling factors.

size_breaks <- c(size_df[["size_min"]], max(size_df[["size_max"]]))
size_breaks
# [1]  0 36 39 42 45 48 52 56 60 64 66 70
size_spans  <- diff(size_breaks)
size_scales <- 1 / size_spans
size_scales
# [1] 0.02777778 0.33333333 0.33333333 0.33333333 0.33333333 0.25000000 0.25000000
# [8] 0.25000000 0.25000000 0.50000000 0.25000000

findInterval() will give us the index of the lower bound. The upper bound is just that index + 1.
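
To make the indexing concrete, here is a small sketch of what findInterval() returns for the three inputs from the question. The breaks are typed out literally so the snippet runs on its own; by default each interval includes its left break, so 48 falls in the interval that starts at 48.

```r
# Same breaks as size_breaks above, written out so the snippet is standalone.
size_breaks <- c(0, 36, 39, 42, 45, 48, 52, 56, 60, 64, 66, 70)

# Index of the break at or below each input.
lower <- findInterval(c(37.2, 48, 68), size_breaks)
lower
# [1]  2  6 11

size_breaks[lower]      # lower bounds: 36 48 66
size_breaks[lower + 1]  # upper bounds: 39 52 70
```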

neighbor_distances <- function(x) {
  lower <- findInterval(x, size_breaks)
  neighbors <- c(lower, lower + 1)
  distances <- abs(x - size_breaks[neighbors]) * size_scales[lower]
  tibble(
    size_chr = size_df[["size_chr"]][neighbors],
    distance = distances
  )
}

It works well for your first example.

neighbor_distances(37.2)
# # A tibble: 2 x 2
#   size_chr distance
#   <chr>       <dbl>
# 1 S           0.4  
# 2 M           0.600

The second example gives two rows instead of just one, but that can be handled with extra logic in the function. I left that logic out to keep things simple.

neighbor_distances(48)
# # A tibble: 2 x 2
#   size_chr distance
#   <chr>       <dbl>
# 1 1XL             0
# 2 2XL             1
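
As a rough sketch of what that extra logic might look like: the variant below returns a single row when the input sits exactly on a break. The function name neighbor_distances_exact is hypothetical, and so is the convention it uses: a value on a break is assigned only to the category that starts there, with distance 0. It uses a base data.frame rather than a tibble so that it runs without extra packages, and it assumes a single input with 0 <= x < 70.

```r
size_chr    <- c("XS", "S", "M", "L", "XL", "1XL", "2XL", "3XL", "4XL", "5XL", "6XL")
size_breaks <- c(0, 36, 39, 42, 45, 48, 52, 56, 60, 64, 66, 70)
size_scales <- 1 / diff(size_breaks)

# Hypothetical variant of neighbor_distances() for a single number x.
neighbor_distances_exact <- function(x) {
  lower <- findInterval(x, size_breaks)
  # Assumption: a value exactly on a break belongs only to the category
  # that starts at that break, so one row is returned.
  if (x == size_breaks[lower]) {
    return(data.frame(size_chr = size_chr[lower], distance = 0))
  }
  neighbors <- c(lower, lower + 1)
  data.frame(
    size_chr = size_chr[neighbors],
    distance = abs(x - size_breaks[neighbors]) * size_scales[lower]
  )
}

neighbor_distances_exact(48)    # one row: 1XL
neighbor_distances_exact(37.2)  # two rows: S and M, as before
```

Whether the single row should carry 0 or 1 depends on how the normalised value is meant to be read, so that choice would need adjusting to the intended convention.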

It gives a different answer for your third example, but I don't know why you expect a number to be compared to a size category smaller than the lower bound.

neighbor_distances(68)
# # A tibble: 2 x 2
#   size_chr distance
#   <chr>       <dbl>
# 1 6XL           0.5
# 2 NA            0.5
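
The NA row appears because the last break (70) has no size label of its own. If that row is unwanted, one option is to drop any neighbor without a label. This is only a sketch: the name neighbor_distances_labeled is hypothetical, it uses a base data.frame so that it runs without extra packages, and it does not reproduce the 5XL/6XL pairing from the question.

```r
size_chr    <- c("XS", "S", "M", "L", "XL", "1XL", "2XL", "3XL", "4XL", "5XL", "6XL")
size_breaks <- c(0, 36, 39, 42, 45, 48, 52, 56, 60, 64, 66, 70)
size_scales <- 1 / diff(size_breaks)

# Hypothetical variant: keep only neighbors that correspond to a real size.
neighbor_distances_labeled <- function(x) {
  lower <- findInterval(x, size_breaks)
  neighbors <- c(lower, lower + 1)
  out <- data.frame(
    size_chr = size_chr[neighbors],
    distance = abs(x - size_breaks[neighbors]) * size_scales[lower]
  )
  # Indexing past the last label yields NA; drop those rows.
  out[!is.na(out$size_chr), , drop = FALSE]
}

neighbor_distances_labeled(68)  # one row: 6XL, distance 0.5
```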
