Assign a value based on closest neighbour from other data frame
With generic data:
set.seed(456)
a <- sample(0:1,50,replace = T)
b <- rnorm(50,15,5)
df1 <- data.frame(a,b)
c <- seq(0.01,0.99,0.01)
d <- rep(NA, 99)
for (i in 1:99) {
d[i] <- 0.5*(10*c[i])^2+5
}
df2 <- data.frame(c,d)
For each df1$b we want to find the nearest df2$d. Then we create a new variable df1$XYZ that takes the df2$c value of the nearest df2$d.
This question has guided me towards the data.table library, but I am not sure whether dplyr and group_by can also be used.
Here was my data.table attempt:
library(data.table)
dt1 <- data.table( df1 , key = "b" )
dt2 <- data.table( df2 , key = "d" )
dt[ ldt , list( d ) , roll = "nearest" ]
Here's one way with data.table:
require(data.table)
setDT(df1)[, XYZ := setDT(df2)[df1, c, on=c(d="b"), roll="nearest"]]
You need to get the df2$c corresponding to the nearest value in df2$d for every df1$b. So we need to join as df2[df1], which results in nrow(df1) rows. That can be done with setDT(df2)[df1, c, on=c(d="b"), roll="nearest"].
It returns the result you require. All we need to do is add this back to df1 under the name XYZ. We do that using :=.
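As a sanity check, the same nearest-neighbour lookup can be sketched in base R without a join. This is only a brute-force illustration, not part of the answer above: the names c_vals, nearest_idx and XYZ_check are made up for this sketch, and it rebuilds the question's data so it runs on its own.

```r
# Rebuild the question's data (same seed and same order of random draws)
set.seed(456)
df1 <- data.frame(a = sample(0:1, 50, replace = TRUE), b = rnorm(50, 15, 5))
c_vals <- seq(0.01, 0.99, 0.01)
df2 <- data.frame(c = c_vals, d = 0.5 * (10 * c_vals)^2 + 5)

# For every b, find the row of df2 whose d has the smallest absolute distance
nearest_idx <- vapply(df1$b, function(b) which.min(abs(df2$d - b)), integer(1))

# Take the c value of that row; this should agree with the XYZ column
XYZ_check <- df2$c[nearest_idx]
head(data.frame(b = df1$b, nearest_d = df2$d[nearest_idx], XYZ_check))
```

This scans all of df2 for each row of df1, so it is fine for 50 x 99 values but the rolling join is the better fit for large tables.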
The thought process in constructing the rolling join is something like this (assuming df1 and df2 are both data tables):
We need to get some value(s) for each row of df1. That means i = df1 in the x[i] syntax.
df2[df1]
We need to join df2$d with df1$b. Using on=, that'd be:
df2[df1, on=c(d="b")]
We need just the c column. Use j to select just that column.
df2[df1, c, on=c(d="b")]
We don't need an equi-join but a roll-to-nearest join.
df2[df1, c, on=c(d="b"), roll="nearest"]
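To see the difference between the equi-join and the rolling join, here is a small sketch on toy data. The tables x and i and their values are invented for illustration; only the join syntax comes from the steps above.

```r
library(data.table)

# Toy lookup table (plays the role of df2) and toy query table (role of df1)
x <- data.table(d = c(1, 5, 10), c = c("low", "mid", "high"))
i <- data.table(b = c(0, 2.9, 3.1, 7, 42))

# Equi-join: no b matches a d exactly, so every result is NA
equi <- x[i, c, on = c(d = "b")]

# Rolling join: each b takes the c of the closest d,
# including values of b that lie outside the range of d
rolled <- x[i, c, on = c(d = "b"), roll = "nearest"]
rolled
# "low" "low" "mid" "mid" "high"
```

Both calls return one value per row of i, which is why the result can be assigned straight back as a new column.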
Hope this helps.