Calculate distance between vector of coordinates in 1 df and single coordinates in other df
Suppose I have the following two data frames (with unequal numbers of rows):
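library(dplyr)    # %>% and rename_all()
library(purrr)    # map2_df()
library(stringr)  # str_c()
set.seed(1999)
dfA <- data.frame(x = rpois(10,2), y = rpois(10,2), z = rpois(10,2), q = rpois(10,2), t = rpois(10,2))
set.seed(24)
dfB <- data.frame(a = rpois(10,2), b = rpois(10,2), c = rpois(10,2), d = rpois(10,2), e = rpois(10,2))
set.seed(10)
Dx <- sample.int(5)
set.seed(6)
Dy <- sample.int(5)
# turn each 5-element vector into a 1-row, 5-column data frame
# (assumption: transpose() here is data.table::transpose; the post does not say which one)
Dx <- as.data.frame(Dx)
Dx <- as.data.frame(data.table::transpose(Dx))
Dy <- as.data.frame(Dy)
Dy <- as.data.frame(data.table::transpose(Dy))
# paste matching columns together as "x,y" strings and name the columns C1..C5
dfAB <- map2_df(dfA, dfB, str_c, sep=",") %>%
  rename_all(~ str_c('C', seq_along(.)))
dfXY <- map2_df(Dx, Dy, str_c, sep=",") %>%
  rename_all(~ str_c('C', seq_along(.)))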
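Now I have 2 datasets of coordinates (dfAB has 5 variables with 10 observations each; dfXY has 5 variables with 1 observation each).
What I want is a new data frame containing the distances between the single observation in variable 1 of dfXY and each individual observation in variable 1 of dfAB, between the single observation in variable 2 of dfXY and each individual observation in variable 2 of dfAB, and so on.
If we have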
dfAB              dfXY
3,1  3,2  ...     3,5  1,2  2,1  5,4  4,3
2,1  3,1
2,3  1,2
...  ...
then the new data frame should contain the distances between: a) 3,5 & 3,1 b) 3,5 & 2,1 c) 3,5 & 2,3 etc...
and the distances between: a) 1,2 & 3,2 b) 1,2 & 3,1 c) 1,2 & 1,2 etc.
And so on. The desired output format is:
dfABXY
C1: C2: C3: C4: C5 etc...
4.000000 2.000000 1.000000 3.605551
4.123106 2.236068 1.414214 3.605551
2.236068 0.000000 2.236068 4.242641
3.162278 0.000000 1.000000 3.605551
3.162278 1.414214 2.000000 2.000000
etc... etc... etc... etc...
However, when I use:
distances <- map2_df(
dfAB,
dfXY,
~ sqrt((.x$x - .y$x)^2 + (.x$y - .y$y)^2)
)
I get the error: Error in .x$x : $ operator is invalid for atomic vectors
I think I need to use something like a for(i in seq_along()) loop, but I don't know how to incorporate ~ sqrt((.x$x - .y$x)^2 + (.x$y - .y$y)^2) into it:
distance <- for(i in seq_along(dfXY)){
dfAB[,i] <- dfAB[,i] [WHAT TO PUT HERE]
Note that my actual dataset has thousands of rows and columns, so calling rows/columns by hand is not feasible.
Any help is much appreciated.
Using the sample data provided, I arrived at this data.table approach:
library( data.table )
#set as data.tables
setDT(dfAB)
setDT(dfXY)
#melt to long format
dt1 <- melt( dfAB, measure.vars = names(dfAB) )[, c("x_AB","y_AB") := lapply( tstrsplit( value, ","), as.numeric ) ]
dt2 <- melt( dfXY, measure.vars = names(dfXY) )[, c("x_XY","y_XY") := lapply( tstrsplit( value, ","), as.numeric ) ]
#update join to get the coordinates to calculate distance with (join on Cx-value)
dt1[ dt2, `:=`( x_XY = i.x_XY, y_XY = i.y_XY ), on = .(variable) ]
#calculate Euclidean distances
dt1[, distances := sqrt( (x_XY - x_AB )^2 + (y_XY - y_AB)^2 ) ]
Output:
# variable value x_AB y_AB x_XY y_XY distances
# 1: C1 3,1 3 1 3 5 4.000000
# 2: C1 2,1 2 1 3 5 4.123106
# 3: C1 2,3 2 3 3 5 2.236068
# 4: C1 2,2 2 2 3 5 3.162278
# 5: C1 4,2 4 2 3 5 3.162278
# 6: C1 2,4 2 4 3 5 1.414214
# 7: C1 4,1 4 1 3 5 4.123106
# 8: C1 3,3 3 3 3 5 2.000000
# 9: C1 2,3 2 3 3 5 2.236068
# 10: C1 3,1 3 1 3 5 4.000000
# 11: C2 3,2 3 2 1 2 2.000000
# 12: C2 3,1 3 1 1 2 2.236068
# 13: C2 1,2 1 2 1 2 0.000000
# 14: C2 1,2 1 2 1 2 0.000000
# 15: C2 0,1 0 1 1 2 1.414214
# 16: C2 1,4 1 4 1 2 2.000000
# 17: C2 0,1 0 1 1 2 1.414214
# 18: C2 1,0 1 0 1 2 2.000000
# 19: C2 4,2 4 2 1 2 3.000000
# 20: C2 5,1 5 1 1 2 4.123106
# 21: C3 2,0 2 0 2 1 1.000000
# 22: C3 3,2 3 2 2 1 1.414214
# 23: C3 3,3 3 3 2 1 2.236068
# 24: C3 1,1 1 1 2 1 1.000000
# 25: C3 0,1 0 1 2 1 2.000000
# 26: C3 6,3 6 3 2 1 4.472136
# 27: C3 2,0 2 0 2 1 1.000000
# 28: C3 4,2 4 2 2 1 2.236068
# 29: C3 1,2 1 2 2 1 1.414214
# 30: C3 3,0 3 0 2 1 1.414214
# 31: C4 3,1 3 1 5 4 3.605551
# 32: C4 3,1 3 1 5 4 3.605551
# 33: C4 2,1 2 1 5 4 4.242641
# 34: C4 3,1 3 1 5 4 3.605551
# 35: C4 3,4 3 4 5 4 2.000000
# 36: C4 2,1 2 1 5 4 4.242641
# 37: C4 4,3 4 3 5 4 1.414214
# 38: C4 0,2 0 2 5 4 5.385165
# 39: C4 2,3 2 3 5 4 3.162278
# 40: C4 2,5 2 5 5 4 3.162278
# 41: C5 2,2 2 2 4 3 2.236068
# 42: C5 3,1 3 1 4 3 2.236068
# 43: C5 2,1 2 1 4 3 2.828427
# 44: C5 3,1 3 1 4 3 2.236068
# 45: C5 1,0 1 0 4 3 4.242641
# 46: C5 2,0 2 0 4 3 3.605551
# 47: C5 1,1 1 1 4 3 3.605551
# 48: C5 1,1 1 1 4 3 3.605551
# 49: C5 4,1 4 1 4 3 2.000000
# 50: C5 2,1 2 1 4 3 2.828427
# variable value x_AB y_AB x_XY y_XY distances
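As a sanity check on the distances column, the same calculation can be reproduced for a single column in base R. This is only a minimal sketch: `split_xy` is a hypothetical helper that is not part of the answer above, and the inputs are hard-coded from the first three C1 rows of the output.

```r
# Hypothetical helper (not from the original answer): split "x,y" strings
# into numeric x and y vectors.
split_xy <- function(s) {
  parts <- strsplit(s, ",", fixed = TRUE)
  list(
    x = as.numeric(sapply(parts, `[`, 1)),
    y = as.numeric(sapply(parts, `[`, 2))
  )
}

ab <- split_xy(c("3,1", "2,1", "2,3"))  # first three C1 values of dfAB
xy <- split_xy("3,5")                   # the single C1 value of dfXY

# Euclidean distance from the one reference point to every dfAB point;
# xy$x and xy$y have length 1, so they are recycled across ab
d <- sqrt((ab$x - xy$x)^2 + (ab$y - xy$y)^2)
round(d, 6)
# 4.000000 4.123106 2.236068  (rows 1 to 3 of the output above)
```

The data.table code does the same thing for every column at once: the update join copies the matching dfXY point next to each dfAB point, so the formula can be applied row by row.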
Changing the output format to wide:
#create id's to cast on
dt1[, id := rowidv( dt1, cols = "variable" ) ]
#cast to wide
dcast( dt1, id~variable, value.var = "distances" )
# id C1 C2 C3 C4 C5
# 1: 1 4.000000 2.000000 1.000000 3.605551 2.236068
# 2: 2 4.123106 2.236068 1.414214 3.605551 2.236068
# 3: 3 2.236068 0.000000 2.236068 4.242641 2.828427
# 4: 4 3.162278 0.000000 1.000000 3.605551 2.236068
# 5: 5 3.162278 1.414214 2.000000 2.000000 4.242641
# 6: 6 1.414214 2.000000 4.472136 4.242641 3.605551
# 7: 7 4.123106 1.414214 1.000000 1.414214 3.605551
# 8: 8 2.000000 2.000000 2.236068 5.385165 3.605551
# 9: 9 2.236068 3.000000 1.414214 3.162278 2.000000
#10: 10 4.000000 4.123106 1.414214 3.162278 2.828427
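The id column is there because the cast needs something that identifies each row within a variable; rowidv() supplies a running counter per group. A rough base R illustration of that idea, on toy data copied from the first rows above (the `long` and `wide` names are made up for this sketch, and base `reshape()` stands in for dcast()):

```r
# Toy long table: three distances each for C1 and C2, taken from the output above
long <- data.frame(
  variable  = c("C1", "C1", "C1", "C2", "C2", "C2"),
  distances = c(4, 4.123106, 2.236068, 2, 2.236068, 0)
)

# Running counter within each variable, the role rowidv() plays in the answer
long$id <- ave(seq_along(long$variable), long$variable, FUN = seq_along)

# One row per id, one column per variable (base R stand-in for dcast)
wide <- reshape(long, idvar = "id", timevar = "variable", direction = "wide")
wide
```

Without the id, there would be nothing to say which C1 row belongs on the same output row as which C2 row.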
All in one go (also dropping the first column):
dcast( dt1, rowidv( dt1, cols = "variable" )~variable, value.var = "distances" )[, -1]
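For completeness, the attempt in the question fails because each column of dfAB is a character vector of "x,y" strings, so there is no $x or $y to extract. Something closer to that column-wise idea might look like the sketch below. It is not part of the data.table answer: `split_xy`, `dfAB_demo` and `dfXY_demo` are hypothetical names, the demo values are the first rows of C1 and C2 shown above, and base `Map()` is used here (purrr::map2() should behave the same way for this purpose).

```r
# Hypothetical helper: split "x,y" strings into numeric x and y vectors
split_xy <- function(s) {
  parts <- strsplit(s, ",", fixed = TRUE)
  list(
    x = as.numeric(sapply(parts, `[`, 1)),
    y = as.numeric(sapply(parts, `[`, 2))
  )
}

# Small stand-ins for dfAB and dfXY (assumed shapes: same column names,
# dfXY with exactly one row)
dfAB_demo <- data.frame(C1 = c("3,1", "2,1", "2,3"),
                        C2 = c("3,2", "3,1", "1,2"),
                        stringsAsFactors = FALSE)
dfXY_demo <- data.frame(C1 = "3,5", C2 = "1,2", stringsAsFactors = FALSE)

# Walk over matching columns: ab is a whole dfAB column, xy the single dfXY value
dist_demo <- as.data.frame(Map(function(ab, xy) {
  p <- split_xy(ab)
  q <- split_xy(xy)
  sqrt((p$x - q$x)^2 + (p$y - q$y)^2)
}, dfAB_demo, dfXY_demo))

dist_demo
```

This pairs columns by position, so it assumes both data frames have their columns in the same order; the data.table version joins on the column name instead.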