Rounding a column of values to the nearest values in another column (R Code)

I have a data frame of coordinates like the one below.

longitude latitude
1    95.93604 41.25908
2    95.93371 41.25941
3    95.93137 41.25974
4    95.92904 41.26008
5    95.92670 41.26041

and I want to round both the longitudes and latitudes to the nearest values in an existing data frame with coordinates like the one below.

latitude longitude
41.45131  96.42024
40.81344  96.66093
41.11293 102.85215
40.37834  96.61095
42.84468  97.40045
41.18000  96.11592
40.69164  99.53231
40.37834  96.61095
41.34500  95.95712

How can I do this in R? I tried using the interp1 function from the pracma package, but I was not able to get the correct result. I suppose I could write a function of my own, but I am also curious whether there is a simpler and more elegant way of going about this. If you have any other suggestions for how I can find the closest corresponding coordinates from one data frame to the other, that would be appreciated too! Thank you!

We can use a k-nearest neighbour classification algorithm with k = 1.

The class package has the knn1 function, which can do this. It takes two data frames, train (the data containing your "rounded" coordinates) and test (your actual data), and finds the nearest training row for each test row by Euclidean distance on the longitude/latitude values. Passing cl=1:nrow(train) gives every training row its own class, so the returned factor effectively holds the index of the nearest neighbour.

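library(class)
# cl = 1:nrow(train): each training row is its own class, so the result
# encodes the row number of the nearest training point
(ind <- knn1(train, test, cl = 1:nrow(train)))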
[1] 1 9 9 3 9
Levels: 1 2 3 4 5 6 7 8 9

This shows that the first row of test is nearest to the first row of train, the fourth row is nearest to the third row, and all other rows are nearest to the last (9th) row.
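For readers who want to see what knn1 computes here, the same lookup can be sketched in base R: for every test row, take the training row with the smallest squared Euclidean distance in raw degree units. The helper nearest_index() is hypothetical, and which.min() keeps the first of any tied rows, whereas knn1 may resolve ties differently.

```r
# Data copied from the answer's train/test structures
train <- data.frame(
  longitude = c(96.42024, 96.66093, 102.85215, 96.61095, 97.40045,
                96.11592, 99.53231, 96.61095, 95.95712),
  latitude  = c(41.45131, 40.81344, 41.11293, 40.37834, 42.84468,
                41.18000, 40.69164, 40.37834, 41.34500)
)
test <- data.frame(
  longitude = c(96.42604, 95.93371, 95.93137, 102.82904, 95.92670),
  latitude  = c(41.45908, 41.25941, 41.25974, 41.16008, 41.26041)
)

# Hypothetical helper: index of the nearest train row for each test row,
# using squared Euclidean distance on the raw longitude/latitude values.
nearest_index <- function(test, train) {
  vapply(seq_len(nrow(test)), function(i) {
    d2 <- (train$longitude - test$longitude[i])^2 +
          (train$latitude  - test$latitude[i])^2
    which.min(d2)  # first minimum if several rows tie
  }, integer(1))
}

ind_base <- nearest_index(test, train)
print(ind_base)  # [1] 1 9 9 3 9
```

Unlike the knn1 factor, this returns a plain integer vector, so it can index train$longitude and train$latitude directly.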

We can then use these indices to extract the rounded coordinates into two new columns (or to replace the existing ones).

idx <- as.integer(ind) # factor codes = training row numbers, since cl = 1:nrow(train)
test$longitude.rnd <- train$longitude[idx]
test$latitude.rnd <- train$latitude[idx]
test
  longitude latitude longitude.rnd latitude.rnd
1  96.42604 41.45908      96.42024     41.45131
2  95.93371 41.25941      95.95712     41.34500
3  95.93137 41.25974      95.95712     41.34500
4 102.82904 41.16008     102.85215     41.11293
5  95.92670 41.26041      95.95712     41.34500

Test data (I modified two rows to show variation; otherwise all rows would return the 9th):

test <- structure(list(longitude = c(96.42604, 95.93371, 95.93137, 102.82904, 
95.9267), latitude = c(41.45908, 41.25941, 41.25974, 41.16008, 
41.26041)), row.names = c("1", "2", "3", "4", "5"), class = "data.frame")

  longitude latitude
1  96.42604 41.45908
2  95.93371 41.25941
3  95.93137 41.25974
4 102.82904 41.16008
5  95.92670 41.26041

Train data (no modification, except that I swapped the columns to match the test data):

train <- structure(list(longitude = c(96.42024, 96.66093, 102.85215, 96.61095, 
97.40045, 96.11592, 99.53231, 96.61095, 95.95712), latitude = c(41.45131, 
40.81344, 41.11293, 40.37834, 42.84468, 41.18, 40.69164, 40.37834, 
41.345)), class = "data.frame", row.names = c(NA, -9L))

  longitude latitude
1  96.42024 41.45131
2  96.66093 40.81344
3 102.85215 41.11293
4  96.61095 40.37834
5  97.40045 42.84468
6  96.11592 41.18000
7  99.53231 40.69164
8  96.61095 40.37834
9  95.95712 41.34500
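knn1 compares raw degrees, and at about 41°N a degree of longitude covers only about three quarters of the ground distance of a degree of latitude (cos 41° ≈ 0.75). If that matters for your data, here is a sketch that picks the nearest train row by great-circle (haversine) distance instead. haversine_km() and nearest_index_gc() are hypothetical helpers, and the 6371 km mean Earth radius is an assumption. On this small example it selects the same rows as knn1.

```r
train <- data.frame(
  longitude = c(96.42024, 96.66093, 102.85215, 96.61095, 97.40045,
                96.11592, 99.53231, 96.61095, 95.95712),
  latitude  = c(41.45131, 40.81344, 41.11293, 40.37834, 42.84468,
                41.18000, 40.69164, 40.37834, 41.34500)
)
test <- data.frame(
  longitude = c(96.42604, 95.93371, 95.93137, 102.82904, 95.92670),
  latitude  = c(41.45908, 41.25941, 41.25974, 41.16008, 41.26041)
)

# Great-circle distance in km between (lon1, lat1) and (lon2, lat2), inputs in degrees
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
       cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))  # pmin() guards against rounding above 1
}

# Index of the train row with the smallest great-circle distance to each test row
nearest_index_gc <- function(test, train) {
  vapply(seq_len(nrow(test)), function(i) {
    d_km <- haversine_km(test$longitude[i], test$latitude[i],
                         train$longitude, train$latitude)
    which.min(d_km)
  }, integer(1))
}

ind_gc <- nearest_index_gc(test, train)
test$longitude.rnd <- train$longitude[ind_gc]
test$latitude.rnd  <- train$latitude[ind_gc]
test
```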

A simpler approach (a function, indeed):

#' @title Find nearest value
#'
#' @param x Value for which the nearest reference value is wanted.
#' @param ref_col Name of the column in the reference table `ref_d` to compare `x` against.
#'
#' @return The value in `ref_d[[ref_col]]` closest to `x`.
find_nearest_fun <- function(x, ref_col = "latitude") {
  ref_field_vec <- ref_d[[ref_col]] # reference column; ref_d is looked up in the global environment
  min_idx <- which.min(abs(ref_field_vec - x)) # position of the smallest absolute difference
  return(ref_field_vec[min_idx])
}

library(dplyr)  # mutate(), %>%
library(purrr)  # map_dbl()
library(tibble) # tribble(), view()

d %>% mutate(nearest_lat = map_dbl(latitude, ~find_nearest_fun(.x)),
             nearest_long = map_dbl(longitude, ~find_nearest_fun(.x, ref_col = "longitude"))) %>% 
  view()
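find_nearest_fun() scans the whole reference column once per value. For longer columns, a vectorised base-R sketch (the helper nearest_value() is hypothetical) sorts the reference values once, uses findInterval() to locate the two neighbours that bracket each x, and keeps the closer one. Exact ties go to the smaller value here, which may differ from which.min() picking the first match in the original column order.

```r
# Hypothetical vectorised helper: nearest element of `ref` for each element of `x`
nearest_value <- function(x, ref) {
  ref <- sort(unique(ref))                 # findInterval() needs a sorted vector
  i <- findInterval(x, ref)                # ref[i] <= x < ref[i + 1]; 0 if x < ref[1]
  lower <- ref[pmax(i, 1L)]                # neighbour at or below x (clamped at the start)
  upper <- ref[pmin(i + 1L, length(ref))]  # neighbour above x (clamped at the end)
  ifelse(abs(x - lower) <= abs(upper - x), lower, upper)
}

# Latitudes from the answer's ref_d and d tables
ref_lat <- c(41.45131, 40.81344, 41.11293, 40.37834, 42.84468,
             41.18000, 40.69164, 40.37834, 41.34500)
d_lat   <- c(41.25908, 41.25941, 41.25974, 41.26008, 41.26041,
             40.26041, 40.60412)

nearest_value(d_lat, ref_lat)
```

Used inside the pipeline, this would be mutate(nearest_lat = nearest_value(latitude, ref_d$latitude)), with no per-row map_dbl() call.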

Data (slightly modified to check that the function works):

tribble(~latitude, ~longitude,
41.45131,  96.42024,
40.81344,  96.66093,
41.11293,  102.85215,
40.37834,  96.61095,
42.84468,  97.40045,
41.18000,  96.11592,
40.69164,  99.53231,
40.37834,  96.61095,
41.34500,  95.95712
) -> ref_d


tribble(~longitude, ~latitude,
95.93604,  41.25908,
95.93371,  41.25941,
95.93137,  41.25974,
95.92904,  41.26008,
95.92670,  41.26041,
98.92670,  40.26041,
96.92670,  40.60412
) -> d
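This approach rounds latitude and longitude separately, so the two rounded values in a row need not come from the same row of ref_d. A quick base-R check on the data above (nearest_in() is a hypothetical stand-in for find_nearest_fun()) shows that none of the seven rounded pairs is an actual reference coordinate. If whole coordinate pairs are wanted, a joint nearest-neighbour search over both columns, as in the knn1 answer, avoids this.

```r
ref_d <- data.frame(
  latitude  = c(41.45131, 40.81344, 41.11293, 40.37834, 42.84468,
                41.18000, 40.69164, 40.37834, 41.34500),
  longitude = c(96.42024, 96.66093, 102.85215, 96.61095, 97.40045,
                96.11592, 99.53231, 96.61095, 95.95712)
)
d <- data.frame(
  longitude = c(95.93604, 95.93371, 95.93137, 95.92904, 95.92670,
                98.92670, 96.92670),
  latitude  = c(41.25908, 41.25941, 41.25974, 41.26008, 41.26041,
                40.26041, 40.60412)
)

# Stand-in for find_nearest_fun(): nearest value in a single reference column
nearest_in <- function(x, ref_vec) ref_vec[which.min(abs(ref_vec - x))]

d$nearest_lat  <- vapply(d$latitude,  nearest_in, numeric(1), ref_vec = ref_d$latitude)
d$nearest_long <- vapply(d$longitude, nearest_in, numeric(1), ref_vec = ref_d$longitude)

# Is each rounded (longitude, latitude) pair an actual row of ref_d?
key <- function(lon, lat) paste(lon, lat)
d$pair_in_ref <- key(d$nearest_long, d$nearest_lat) %in%
                 key(ref_d$longitude, ref_d$latitude)
d
```

For example, the first row rounds to longitude 95.95712 and latitude 41.18, but in ref_d 95.95712 belongs with 41.345 and 41.18 with 96.11592.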

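Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM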