Merge two data frames considering a range match between key columns
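I am a beginner in R programming. I am trying to retrieve site names from a data frame that holds X and Y coordinate ranges and site names, and copy them into a different data frame that holds specific points.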
FD <- matrix(data =c(rep(1, 500), rep(0, 500),
rnorm(1000, mean = 550000, sd=4000),
rnorm(1000, mean = 6350000, sd=20000), rep(NA, 1000)),
ncol = 4, nrow = 1000, byrow = FALSE)
colnames(FD) <- c('Survival', 'X', 'Y', 'Site')
FD <- as.data.frame(FD)
shpxt <- matrix(c(526654.7,526810.5 ,6309098,6309187,530405.4,530692,
6337699, 6338056,580432.7, 580541.9, 6380246,6380391,
585761.3, 585847.6, 6379665, 6379759, 584192.1, 584279.4,
6382358, 6382710, 583421.2, 583492.4, 6379356, 6379425,
532395.5, 532515.3 , 6336421, 6336587, 534694.6, 534791.2,
6335620, 6335740, 536749.8, 536957.5, 6337584, 6338130, 590049.6,
590419.4, 6372232, 6372432, 580443, 580756.5, 6386342, 6386473,
575263.9, 575413.7, 6380416, 6380530, 584625.1, 584753.9, 6381009,
6381335), ncol = 4, nrow = 13, byrow = TRUE)
sites <- c("Brandbaeltet", "Brusaa", "Granly", "Jerup Strand", "Knasborgvej",
"Milrimvej", "Overklitten", "Oversigtsareal", "Sandmosen",
"Strandby", "Troldkaer", "Vaagholt", "Videsletengen")
colnames(shpxt) <- c("Xmin", "Xmax", "Ymin", "Ymax")
shpxt <- as.data.frame(shpxt)
shpxt["Sites"] <- sites
My approach was to use nested for loops, as shown below:
tester <- function(FD, shpxt)
{ for (i in 1:nrow(FD)) for (j in 1:nrow(shpxt)) # Open Function
{ if (FD[i,2] >= shpxt[j,1] | FD[i,2] <= shpxt[j,2] & # Open Loop
FD[i,3] >= shpxt[j,3] | FD[i,3] <= shpxt[j,4])
{ # Open Consequent
FD[i,4]=shpxt[j,5]
{break}
} else # Close Consequent
{FD[i,4] <- NA # Open alternative
} # Close alternative
} # Close loop
} # Close function
tester(FD, shpxt)
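Essentially, I want to find the site whose range contains the X and Y coordinates of each row in FD, and copy that site name into FD$Site in row i. When I run the loop on my real data, I get the following error message: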
test(FD, shpxt)
Error in if (FD[i, 2] >= shpxt[j, 1] | FD[i, 2] <= shpxt[j, 2] & FD[i, :
missing value where TRUE/FALSE needed
How do I get from here to a loop that copies the desired site names into my FD?
Kind regards, Thøger
You want to merge two data frames considering a range match between key columns. Here are two solutions.
sqldf
library(sqldf)
output <- sqldf("select * from FD left join shpxt
on (FD.X >= shpxt.Xmin and FD.X <= shpxt.Xmax and
FD.Y >= shpxt.Ymin and FD.Y <= shpxt.Ymax ) ")
data.table
library(data.table)
# convert both data frames to data.table (setDT modifies them by reference)
setDT(FD)
setDT(shpxt)
output <- FD[shpxt, on = .(X >= Xmin , X <= Xmax, # indicate x range
Y >= Ymin , Y <= Ymax), nomatch = NA, # indicate y range
.(Survival, X, Y, Xmin, Xmax, Ymin, Ymax, Sites )] # indicate columns in the output
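There are other ways to solve this problem, as you will find in other SO questions here and here.
PS. Keep in mind that a for loop is not necessarily the best solution.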
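If the loop is kept anyway, the following is a minimal sketch of how it could be repaired, not the asker's final code. It rests on three observations about the original `tester`: the conditions are joined with `|` where a range check needs `&`; an NA coordinate can make the `if` condition NA, which is the most likely source of "missing value where TRUE/FALSE needed"; and the function changes only a local copy of FD and never returns it. The toy data (`boxes`, `pts`) and the name `assign_sites` are made up for illustration.

```r
# Toy data, invented for this sketch: two site bounding boxes and four points,
# one of which has a missing X coordinate.
boxes <- data.frame(Xmin = c(0, 10), Xmax = c(5, 15),
                    Ymin = c(0, 10), Ymax = c(5, 15),
                    Sites = c("A", "B"), stringsAsFactors = FALSE)
pts <- data.frame(X = c(1, 12, 50, NA), Y = c(2, 11, 50, 3))

assign_sites <- function(pts, boxes) {
  pts$Site <- NA_character_            # default: no site found
  for (i in seq_len(nrow(pts))) {
    for (j in seq_len(nrow(boxes))) {
      # A point is inside a box only if ALL four bounds hold, hence && and not |.
      inside <- pts$X[i] >= boxes$Xmin[j] && pts$X[i] <= boxes$Xmax[j] &&
                pts$Y[i] >= boxes$Ymin[j] && pts$Y[i] <= boxes$Ymax[j]
      # isTRUE() treats an NA result (missing coordinate) as "no match"
      # instead of letting if() stop with an error.
      if (isTRUE(inside)) {
        pts$Site[i] <- boxes$Sites[j]
        break                          # stop at the first matching box
      }
    }
  }
  pts                                  # return the modified copy
}

result <- assign_sites(pts, boxes)
print(result)
```

The result has to be assigned by the caller (for example `FD <- assign_sites(FD, shpxt)` after renaming the columns accordingly), because R functions work on a copy of their arguments.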
Here is a failed attempt in base R; perhaps someone can help correct it.
getSite <- function(x, y) {
return (shpxt[x >= shpxt['Xmin'] & x <= shpxt['Xmax'] &
y >= shpxt['Ymin'] & y <= shpxt['Ymax'] , "Sites"])
}
Testing it
p <- c(Survival=0, X=shpxt[2,1], Y=shpxt[2,3])
getSite(p[['X']],p[['Y']])
correctly returns
[1] "Brusaa"
However,
FD$Site<-apply(FD, 1, function(point) {getSite(point[['X']], point[['Y']])})
fails with
Error in `$<-.data.frame`(`*tmp*`, "Site", value = character(0)) :
  replacement has 0 rows, data has 1000
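One plausible reading of this error is that the lookup returns `character(0)` whenever a point falls in no site, so the collected result is not one value per row. A hedged sketch of a base R fix is to make the lookup always return exactly one string, using `NA_character_` for "no match", and to iterate over the X and Y columns with `mapply`. The toy data and the name `get_site` are assumptions made for this example, not part of the original attempt.

```r
# Toy data, invented for this sketch: two site bounding boxes and three points.
boxes <- data.frame(Xmin = c(0, 10), Xmax = c(5, 15),
                    Ymin = c(0, 10), Ymax = c(5, 15),
                    Sites = c("A", "B"), stringsAsFactors = FALSE)
pts <- data.frame(X = c(1, 12, 50), Y = c(2, 11, 50))

# Return the first matching site name, or NA_character_ when no box
# contains the point (which() also drops NA comparisons).
get_site <- function(x, y, boxes) {
  hit <- which(x >= boxes$Xmin & x <= boxes$Xmax &
               y >= boxes$Ymin & y <= boxes$Ymax)
  if (length(hit) == 0) NA_character_ else boxes$Sites[hit[1]]
}

# mapply walks over X and Y in parallel; every call yields one string,
# so the result has one element per row and can be assigned as a column.
pts$Site <- unname(mapply(get_site, pts$X, pts$Y,
                          MoreArgs = list(boxes = boxes)))
print(pts)
```

Applied to the data above, the same idea would read `FD$Site <- mapply(get_site, FD$X, FD$Y, MoreArgs = list(boxes = shpxt))`. With the simulated FD it is quite possible that most or all rows come back as NA, since the random points rarely land inside the small site boxes.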