How to combine two columns with an array
I have DF_1, which shows the cities of origin and destination, and I want to know how far apart they are (miles / km). In DF_2 I have the distances between cities. How can I look up the distances using these two data frames?
DF_1:
origin <- c('LONDON','NEW YORK','TOKIO','LONDON','RIO DE JANEIRO')
destination <- c('NEW YORK','NEW YORK','RIO DE JANEIRO','LISBON','MADRID')
DF_1 <- data.frame(origin,destination)
DF_2:
CITY <- c('NEW YORK', 'LONDON', 'SAN FRANCISCO', 'MADRID', 'LOS ANGELES', 'LISBON', 'RIO DE JANEIRO', 'MOSCOW', 'SAO PAULO', 'TOKIO')
NEW_YORK <- c(0, 700, 250, 1000, 400, 800, 430, 900, 500, 30)
LONDON <- c(700, 0, 350, 1200, 50, 110, 780, 984, 1150, 5)
SAN_FRANCISCO <- c(250, 350, 0, 200, 15, 260, 305, 412, 29, 102)
MADRID <- c(1000, 1200, 200, 0, 77, 115, 225, 318, 412, 511)
LOS_ANGELES <- c(400, 50, 15, 77, 0, 88, 819, 733, 978, 1001)
LISBON <- c(800, 110, 260, 115, 88, 0, 17, 3000, 1418, 735)
RIO_DE_JANEIRO <- c(430, 780, 305, 225, 819, 17, 0, 513, 701, 56)
MOSCOW <- c(900, 984, 412, 318, 733, 3000, 513, 0, 389, 499)
SAO_PAULO <- c(500, 1150, 29, 412, 978, 1418, 701, 389, 0, 1113)
TOKIO <- c(30, 5, 102, 511, 1001, 735, 56, 499, 1113, 0)
DF_2 <- data.frame(CITY, `NEW YORK` = NEW_YORK, LONDON, `SAN FRANCISCO` = SAN_FRANCISCO, MADRID, `LOS ANGELES` = LOS_ANGELES, LISBON, `RIO DE JANEIRO` = RIO_DE_JANEIRO, MOSCOW, `SAO PAULO` = SAO_PAULO, TOKIO, check.names = FALSE)
The result I want is this:
origin <- c('LONDON','NEW YORK','TOKIO','LONDON','RIO DE JANEIRO')
destination <- c('NEW YORK','NEW YORK','RIO DE JANEIRO','LISBON','MADRID')
distance <- c(700,0,56,110,225)
DF_FINAL <- data.frame(origin,destination,distance)
Using base R, you could use:
transform(DF_1,distance = `rownames<-`(DF_2[,-1],DF_2[,1])[as.matrix(DF_1)])
origin destination distance
1 LONDON NEW YORK 700
2 NEW YORK NEW YORK 0
3 TOKIO RIO DE JANEIRO 56
4 LONDON LISBON 110
5 RIO DE JANEIRO MADRID 225
That is, create a new data frame with the city names as row names, then index it with the origin/destination pairs:
DF_3 <- DF_2[, -1]           # remove the first column (CITY)
rownames(DF_3) <- DF_2$CITY  # use the city names as row names
DF_1$DISTANCE <- DF_3[as.matrix(DF_1)]
DF_1
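To see why indexing with as.matrix(DF_1) works, here is a minimal sketch on a made-up three-city matrix. The names m and pairs and all the values are illustrative assumptions, not data from the question. When a matrix with dimnames is indexed by a two-column character matrix, each row of the index is treated as a (row name, column name) pair.

```r
# Toy distance matrix with city names as dimnames (values are made up)
cities <- c("A", "B", "C")
m <- matrix(c( 0, 10, 20,
              10,  0, 30,
              20, 30,  0),
            nrow = 3, byrow = TRUE,
            dimnames = list(cities, cities))

# Each row of `pairs` is one (origin, destination) lookup
pairs <- cbind(origin      = c("A", "C", "B"),
               destination = c("B", "A", "B"))

# Returns one value per row of `pairs`: m["A","B"], m["C","A"], m["B","B"]
m[pairs]
# 10 20 0
```

Indexing a data frame this way goes through the same mechanism, because a data frame indexed by a matrix is first converted with as.matrix().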
Here is an option with row/column indexing from base R:
i1 <- match(DF_1$origin, DF_2$CITY)
j1 <- match(DF_1$destination, names(DF_2)[-1])
DF_1$distance <- DF_2[-1][cbind(i1, j1)]
DF_1
# origin destination distance
#1 LONDON NEW YORK 700
#2 NEW YORK NEW YORK 0
#3 TOKIO RIO DE JANEIRO 56
#4 LONDON LISBON 110
#5 RIO DE JANEIRO MADRID 225
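One possible reason to prefer the match() version: if a city in the routes table is missing from the distance table, match() returns NA and the lookup yields NA for that row, whereas indexing by name would stop with a "subscript out of bounds" error. A minimal sketch, assuming a toy table d and a hypothetical route to 'PARIS' (neither is part of the question's data):

```r
# Toy distance table in the same layout as DF_2 (values are made up)
d <- data.frame(CITY = c("A", "B"),
                A = c(0, 5),
                B = c(5, 0),
                stringsAsFactors = FALSE)

# 'PARIS' is a hypothetical destination with no column in `d`
routes <- data.frame(origin      = c("A", "B", "A"),
                     destination = c("B", "A", "PARIS"),
                     stringsAsFactors = FALSE)

i <- match(routes$origin, d$CITY)              # row positions
j <- match(routes$destination, names(d)[-1])   # column positions (NA for 'PARIS')

# A row of the index matrix that contains NA produces NA in the result
routes$distance <- d[-1][cbind(i, j)]
routes
```

The unmatched route keeps its row with a missing distance, which can then be checked with is.na().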
This should reproduce what you're looking for, using the tidyverse (the rows come out in a different order):
library(dplyr)
library(tidyr)
DF_FINAL <- DF_1 %>%
  inner_join(DF_2, by = c("origin" = "CITY")) %>%
  gather(key = "city", value = "distance", -origin, -destination) %>%
  filter(destination == city) %>%
  select(-c(city))
DF_FINAL
|origin |destination | distance|
|:--------------|:--------------|--------:|
|LONDON |NEW YORK | 700|
|NEW YORK |NEW YORK | 0|
|RIO DE JANEIRO |MADRID | 225|
|LONDON |LISBON | 110|
|TOKIO |RIO DE JANEIRO | 56|
I try to do this kind of thing in the tidyverse framework. The first step is to turn the matrix of distances into the "long" format. Then, just join that to the original data.frame!
I suggest adding stringsAsFactors = FALSE to the end of your data.frame() definitions to avoid warning messages. (In R 4.0.0 and later this is already the default.)
library(tidyr)
library(dplyr)
pivot_longer(DF_2, -CITY) %>%
  rename(origin = CITY, destination = name, distance = value) %>%
  right_join(DF_1, by = c("origin", "destination"))
# A tibble: 5 x 3
origin destination distance
<chr> <chr> <dbl>
1 LONDON NEW YORK 700
2 NEW YORK NEW YORK 0
3 TOKIO RIO DE JANEIRO 56
4 LONDON LISBON 110
5 RIO DE JANEIRO MADRID 225
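The same long-then-join idea can also be sketched without the tidyverse, which may help to show what the reshaping step does. This is an alternative illustration in base R, not the method from the answer above; the tables d and routes and their values are made up.

```r
# Toy inputs in the same layout as DF_2 and DF_1 (values are made up)
d <- data.frame(CITY = c("A", "B"), A = c(0, 5), B = c(5, 0),
                stringsAsFactors = FALSE)
routes <- data.frame(origin = c("A", "B"), destination = c("B", "B"),
                     stringsAsFactors = FALSE)

# Wide -> long: one row per (origin, destination) pair
m <- as.matrix(d[-1])
rownames(m) <- d$CITY
long <- as.data.frame(as.table(m), stringsAsFactors = FALSE)
names(long) <- c("origin", "destination", "distance")

# Join the long table onto the routes; all.x = TRUE keeps unmatched routes as NA
result <- merge(routes, long, by = c("origin", "destination"), all.x = TRUE)
result
```

Note that merge() sorts the result by the join columns, so the row order may differ from the original routes table.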