
Assign a value based on closest neighbour from other data frame

With generic data:

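set.seed(456)

# df1: 50 rows, a 0/1 flag `a` and a normally distributed value `b`
a <- sample(0:1, 50, replace = TRUE)
b <- rnorm(50, 15, 5)
df1 <- data.frame(a, b)

# df2: lookup table mapping c (0.01 to 0.99) to d = 0.5 * (10 * c)^2 + 5
c <- seq(0.01, 0.99, 0.01)
d <- 0.5 * (10 * c)^2 + 5   # vectorised form of the element-by-element loop
df2 <- data.frame(c, d)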

For each df1$b we want to find the nearest df2$d. Then we create a new variable, df1$XYZ, that takes the df2$c value of that nearest df2$d.
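A minimal base-R sketch of that requirement, added for illustration rather than taken from the thread: for each df1$b it finds the df2 row with the smallest |d - b| using which.min() and copies that row's c into XYZ. It rebuilds the example data so it runs on its own. Because every b is compared with every d, the cost grows as nrow(df1) * nrow(df2), which is fine at 50 by 99 rows but not for large tables.

```r
# Rebuild the question's example data (same seed and calls as above)
set.seed(456)
a <- sample(0:1, 50, replace = TRUE)
b <- rnorm(50, 15, 5)
df1 <- data.frame(a, b)
cvals <- seq(0.01, 0.99, 0.01)
df2 <- data.frame(c = cvals, d = 0.5 * (10 * cvals)^2 + 5)

# Brute force: for each b, the index of the df2 row whose d is closest to it
nearest_idx <- vapply(df1$b, function(v) which.min(abs(df2$d - v)), integer(1))

# XYZ takes the c value of that nearest row
df1$XYZ <- df2$c[nearest_idx]
head(df1)
```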

This question has guided me towards the data.table library, but I am not sure whether dplyr and group_by could also be used.

Here was my data.table attempt:

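library(data.table)
# keyed copies of the two data frames
dt1 <- data.table( df1 , key = "b" )
dt2 <- data.table( df2 , key = "d" )

# attempted rolling join (note: `dt` and `ldt` are not defined above)
dt[ ldt , list( d ) , roll = "nearest" ]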
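The roll = "nearest" idea this attempt is reaching for can also be sketched in base R. Since d = 0.5 * (10 * c)^2 + 5 increases with c, df2$d is already sorted, so findInterval() (a binary search) locates the d just below each b, and comparing it with the next d up selects the closer one. nearest_sorted() is a hypothetical helper written for this illustration; it assumes a non-empty lookup vector sorted in ascending order and breaks exact ties towards the lower value.

```r
# Hypothetical helper: for each x, the index of the nearest element of `sorted`
# (`sorted` must be non-empty and in ascending order)
nearest_sorted <- function(x, sorted) {
  lo <- findInterval(x, sorted)        # last index with sorted[lo] <= x, 0 if none
  lo <- pmax(lo, 1L)                   # values below the range use the first element
  hi <- pmin(lo + 1L, length(sorted))  # the neighbour just above, clamped at the end
  # keep whichever neighbour is closer; on a tie keep the lower one
  ifelse(abs(sorted[hi] - x) < abs(sorted[lo] - x), hi, lo)
}

set.seed(456)
a <- sample(0:1, 50, replace = TRUE)
b <- rnorm(50, 15, 5)
df1 <- data.frame(a, b)
cvals <- seq(0.01, 0.99, 0.01)
df2 <- data.frame(c = cvals, d = 0.5 * (10 * cvals)^2 + 5)

df1$XYZ <- df2$c[nearest_sorted(df1$b, df2$d)]
head(df1)
```

Unlike the which.min() search, this only looks at two candidates per value, so it scales to large lookup tables as long as they are sorted.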

Here's one way with data.table:

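library(data.table)
# convert both data frames to data.tables by reference; for each row of df1,
# take c from the df2 row whose d is nearest to b and store it as column XYZ
setDT(df1)[, XYZ := setDT(df2)[df1, c, on=c(d="b"), roll="nearest"]]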

You need to get the df2$c corresponding to the nearest value in df2$d for every df1$b. So we need to join as df2[df1], which results in nrow(df1) rows. That can be done with setDT(df2)[df1, c, on=c(d="b"), roll="nearest"].
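To check that df2[df1, ...] really returns one c per row of df1, and that those are the nearest matches, the sketch below runs the answer's rolling join and compares it with an exhaustive which.min() search. It assumes the data.table package is installed and rebuilds the example data; with roll = "nearest", b values outside the range of d are matched to the first or last row of df2.

```r
library(data.table)

set.seed(456)
a <- sample(0:1, 50, replace = TRUE)
b <- rnorm(50, 15, 5)
df1 <- data.frame(a, b)
cvals <- seq(0.01, 0.99, 0.01)
df2 <- data.frame(c = cvals, d = 0.5 * (10 * cvals)^2 + 5)

setDT(df1)
setDT(df2)

# Rolling join from the answer: one c value per row of df1, in df1's row order
res <- df2[df1, c, on = c(d = "b"), roll = "nearest"]

# Brute-force reference: c of the row whose d is closest to each b
ref <- df2$c[vapply(df1$b, function(v) which.min(abs(df2$d - v)), integer(1))]

length(res) == nrow(df1)   # one result per row of df1
all.equal(res, ref)        # same matches as the exhaustive search
```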

It returns the result you require. All we need to do is add this back to df1 under the name XYZ, which we do using :=.
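One side effect worth knowing: setDT() converts df1 and df2 to data.tables in place, so after the one-liner neither is a plain data.frame any more. If that is unwanted, a variant (a sketch, assuming data.table is installed; lookup and query are illustrative names) joins copies made with as.data.table() and stores the result with base R's $<-, which copies instead of modifying by reference.

```r
library(data.table)

set.seed(456)
a <- sample(0:1, 50, replace = TRUE)
b <- rnorm(50, 15, 5)
df1 <- data.frame(a, b)
cvals <- seq(0.01, 0.99, 0.01)
df2 <- data.frame(c = cvals, d = 0.5 * (10 * cvals)^2 + 5)

# as.data.table() makes copies, so df1 and df2 stay ordinary data frames
lookup <- as.data.table(df2)
query  <- as.data.table(df1)

# Same rolling join as the answer, then a plain (copying) assignment
df1$XYZ <- lookup[query, c, on = c(d = "b"), roll = "nearest"]

class(df1)
head(df1)
```

The trade-off is memory: := avoids copying df1, which matters for large tables, while this version keeps the inputs' classes unchanged.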


The thought process in constructing the rolling join is something like this (assuming df1 and df2 are both data.tables):

  1. We need to get some value(s) for each row of df1. That means i = df1 in the x[i] syntax.

     df2[df1]
  2. We need to join df2$d with df1$b. Using on=, that would be:

     df2[df1, on=c(d="b")]
  3. We need just the c column. Use j to select just that column.

     df2[df1, c, on=c(d="b")]
  4. We don't want an equi-join but a rolling join to the nearest value.

     df2[df1, c, on=c(d="b"), roll="nearest"]
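The four steps can be run one after another to see what each addition changes. In this sketch (assuming data.table is installed) step 1 is left out because it is only conceptual: with no key set, data.table needs on= to know which columns to join. Step 3 shows why the roll is needed: an exact match between the floating-point b and d values essentially never occurs, so c comes back as NA.

```r
library(data.table)

set.seed(456)
a <- sample(0:1, 50, replace = TRUE)
b <- rnorm(50, 15, 5)
df1 <- data.table(a, b)
cvals <- seq(0.01, 0.99, 0.01)
df2 <- data.table(c = cvals, d = 0.5 * (10 * cvals)^2 + 5)

# Step 2: equi-join on d == b; one output row per row of df1
step2 <- df2[df1, on = c(d = "b")]
nrow(step2)

# Step 3: keep only the c column
step3 <- df2[df1, c, on = c(d = "b")]
sum(!is.na(step3))   # exact matches on doubles are (almost) never found

# Step 4: roll to the nearest d instead of requiring equality
step4 <- df2[df1, c, on = c(d = "b"), roll = "nearest"]
head(step4)
```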

Hope this helps.
