簡體   English   中英

從 R 中的另一個數據幀中查找所有字符串匹配項

[英]Finding all string matches from another dataframe in R

我在 R 中相對較新。

我有一個數據框locs ,它有 1 個變量V1 ,看起來像:

V1
edmonton general hospital
cardiovascular institute, hospital san carlos, madrid spain
hospital of santa maria, lisbon, portugal

另一個數據框cities有兩個如下所示的變量:

city              country
edmonton          canada
san carlos        spain
los angeles       united states
santa maria       united states
tokyo             japan
madrid            spain
santa maria       portugal
lisbon            portugal

我想在locs中創建兩個新變量,這些變量將cityV1任何字符串匹配相關聯,以便locs如下所示:

V1                                            city                  country                      
edmonton general hospital                     edmonton              canada
hospital san carlos, madrid spain             san carlos, madrid    spain
hospital of santa maria, lisbon, portugal     santa maria, lisbon   portugal, united states

需要注意的幾點: V1可能有多個國家/地區名稱。 另外,如果有一個重復的國家(例如,聖卡洛斯和馬德里都在西班牙),那么我只想要該國家的一個實例。

請指教。

謝謝。

使用tidyversestringr解決方案。 locs2是最終輸出。

library(tidyverse)
library(stringr)

locs2 <- locs %>%
  rowwise() %>%
  mutate(city = list(str_match(V1, cities$city))) %>%
  unnest() %>%
  drop_na(city) %>%
  left_join(cities, by = "city") %>%
  group_by(V1) %>%
  summarise_all(funs(toString(sort(unique(.)))))

結果

locs2 %>% as.data.frame()
                                                           V1                city                 country
1 cardiovascular institute, hospital san carlos, madrid spain  madrid, san carlos                   spain
2                                   edmonton general hospital            edmonton                  canada
3                   hospital of santa maria, lisbon, portugal lisbon, santa maria portugal, united states

數據

library(tidyverse)

locs <- data_frame(V1 = c("edmonton general hospital",
                   "cardiovascular institute, hospital san carlos, madrid spain",
                   "hospital of santa maria, lisbon, portugal"))

cities <- read.table(text = "city              country
edmonton          canada
'san carlos'        spain
'los angeles'       'united states'
'santa maria'       'united states'
tokyo             japan
madrid            spain
'santa maria'       portugal
lisbon            portugal",
                     header = TRUE, stringsAsFactors = FALSE)

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM