
R - Assign column value based on closest match in second data frame

I have two data frames, logger and df (times are numeric):

logger <- data.frame(
  time = c(1280248354:1280248413),
  temp = runif(60,min=18,max=24.5)
)

df <- data.frame(
  obs = c(1:10),
  time = runif(10,min=1280248354,max=1280248413),
  temp = NA
)

I would like to search logger$time for the closest match to each value in df$time, and assign the associated logger$temp to df$temp. So far, I have been successful using the following loop:

for (i in seq_along(df$time)) {
  # index of the logger row whose time is closest to this observation
  closestto <- which.min(abs(logger$time - df$time[i]))
  df$temp[i] <- logger$temp[closestto]
}

However, I now have large data frames (logger has 13,620 rows and df has 266,138) and processing times are long. I've read that loops are not the most efficient way to do things, but I am unfamiliar with the alternatives. Is there a faster way to do this?

I'd use data.table for this. It makes joining on keys easy and fast. There is even a roll = "nearest" argument that gives exactly the behaviour you are looking for (your example data needs it too, since the df times are drawn with runif and will generally not coincide with the whole-second logger times). In the following example I renamed df$time to df$time1 to make it clear which column belongs to which table:

#  Load package
library( data.table )

#  Rename df$time to df$time1 so the two time columns are easy to tell apart
names( df )[ names( df ) == "time" ] <- "time1"

#  Make data.frames into data.tables with a key column
ldt <- data.table( logger , key = "time" )
dt <- data.table( df , key = "time1" )

#  Join based on the key column of the two tables (time & time1)
#  roll = "nearest" gives the desired behaviour
#  list( obs , time1 , temp ) gives the columns you want to return
#  (temp here is the logger temperature from ldt)
ldt[ dt , list( obs , time1 , temp ) , roll = "nearest" ]
#          time obs      time1     temp
# 1: 1280248361   8 1280248361 18.07644
# 2: 1280248366   4 1280248366 21.88957
# 3: 1280248370   3 1280248370 19.09015
# 4: 1280248376   5 1280248376 22.39770
# 5: 1280248381   6 1280248381 24.12758
# 6: 1280248383  10 1280248383 22.70919
# 7: 1280248385   1 1280248385 18.78183
# 8: 1280248389   2 1280248389 18.17874
# 9: 1280248393   9 1280248393 18.03098
#10: 1280248403   7 1280248403 22.74372
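
To see what roll = "nearest" does in isolation, here is a minimal sketch on made-up toy data (the values and the names toy_ldt, toy_dt and res are hypothetical, not from the question). It assumes a reasonably recent data.table and uses on = to name the join columns explicitly instead of relying on keys:

```r
library(data.table)

# Toy logger: one reading every 10 seconds (made-up values)
toy_ldt <- data.table(time = c(0, 10, 20, 30),
                      temp = c(18.0, 19.5, 21.0, 22.5))

# Toy observations at times that fall between the logger readings
toy_dt <- data.table(obs = 1:3, time1 = c(2, 14, 27))

# For each row of toy_dt, take the toy_ldt row whose time is nearest to time1.
# In on =, the name is the column in toy_ldt and the value is the column in toy_dt.
res <- toy_ldt[toy_dt, on = c(time = "time1"), roll = "nearest"]

# Observation 1 (t = 2) picks up the reading at t = 0, observation 2 (t = 14)
# the reading at t = 10, and observation 3 (t = 27) the reading at t = 30.
print(res)
```

Note that the time column of the result carries the observation times from toy_dt, not the matched logger times.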

You could use the data.table library. It will also be more efficient with large data sizes:

library(data.table)

logger <- data.frame(
  time = c(1280248354:1280248413),
  temp = runif(60,min=18,max=24.5)
)

df <- data.frame(
  obs = c(1:10),
  time = runif(10,min=1280248354,max=1280248413)
)

logger <- data.table(logger)
df <- data.table(df)

setkey(df,time)
setkey(logger,time)

df2 <- logger[df, roll = "nearest"]

Output:

> df2
          time     temp obs
 1: 1280248356 22.81437   7
 2: 1280248360 24.08711  10
 3: 1280248366 22.31738   2
 4: 1280248367 18.61222   5
 5: 1280248388 19.46300   4
 6: 1280248393 18.26535   6
 7: 1280248400 20.61901   9
 8: 1280248402 21.92584   1
 9: 1280248410 19.36526   8
10: 1280248410 19.36526   3
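
One way to gain confidence in the rolling join is to compare it with the loop from the question on a small sample. The following is a sketch under a few assumptions: set.seed is added only for reproducibility, logger$time is converted to numeric so both join columns have the same type, and logger_time, matched and loop_temp are names introduced here for illustration.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible

logger <- data.table(
  time = as.numeric(1280248354:1280248413),
  temp = runif(60, min = 18, max = 24.5)
)
df <- data.table(
  obs  = 1:10,
  time = runif(10, min = 1280248354, max = 1280248413)
)

# Keep a copy of the logger timestamp; the joined "time" column
# will carry df's times, so the copy records which reading was matched
logger[, logger_time := time]

# One row per df row, in df's order, with the nearest logger reading
matched <- logger[df, on = "time", roll = "nearest"]

# Brute-force reference: the which.min approach from the question
loop_temp <- vapply(
  df$time,
  function(t) logger$temp[which.min(abs(logger$time - t))],
  numeric(1)
)

# TRUE when the join and the loop agree on every row
same_as_loop <- isTRUE(all.equal(matched$temp, loop_temp))

# Write the matched temperatures back onto df by reference
df[, temp := matched$temp]
```

If same_as_loop is TRUE on a sample like this, the same join can be run on the full tables in place of the loop.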
