R replace string in df with partial match in a list
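I have a dataframe (df) in R, and I want to create a new column (city1_n) that holds the entry stored in the list key whenever there is a partial match between City1 and key.
Below I have created a small example that should help visualize my problem.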
> dput(df)
structure(list(Country = c("USA", "France", "Italy", "Spain",
"Mexico"), City1 = c("Los angeles", "Paris", "Rome", "Madrid",
"Cancun"), City2 = c("New York", "Lyon", "Pisa", "Barcelona",
"San Cristobal de las Casas")), class = "data.frame", row.names = c(NA,
-5L))
> dput(key)
list("Los angeles California", "Paris Île-de-France", "Rome Lazio",
"Madrid Comunidad de Madrid ", "Cancun Quintana Roo")
Result:
I would like to solve this with R or Unix.
Use fuzzyjoin::fuzzy_left_join:
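# stringr:: is spelled out so the call works without library(stringr);
# y is the key column (string) and x is City1 (pattern).
fuzzyjoin::fuzzy_left_join(df, data.frame(key), by = c("City1" = "key"), match_fun = \(x, y) stringr::str_detect(y, x))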
  Country       City1                      City2                         key
1     USA Los angeles                   New York      Los angeles California
2  France       Paris                       Lyon         Paris Île-de-France
3   Italy        Rome                       Pisa                  Rome Lazio
4   Spain      Madrid                  Barcelona Madrid Comunidad de Madrid
5  Mexico      Cancun San Cristobal de las Casas         Cancun Quintana Roo
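If installing fuzzyjoin is not an option, the same lookup can be sketched in base R. This is only an illustration, not part of the original answer: the helper name first_match is hypothetical, it treats each city as a literal substring (fixed = TRUE) rather than a regular expression, and it assumes that taking the first matching key is acceptable when several keys match.

```r
# Sample data copied from the question.
df <- data.frame(
  Country = c("USA", "France", "Italy", "Spain", "Mexico"),
  City1   = c("Los angeles", "Paris", "Rome", "Madrid", "Cancun"),
  City2   = c("New York", "Lyon", "Pisa", "Barcelona",
              "San Cristobal de las Casas")
)
key <- c("Los angeles California", "Paris Île-de-France", "Rome Lazio",
         "Madrid Comunidad de Madrid ", "Cancun Quintana Roo")

# Hypothetical helper: return the first key that contains `city` as a
# literal substring, or NA when no key matches.
first_match <- function(city, keys) {
  hits <- keys[grepl(city, keys, fixed = TRUE)]
  if (length(hits) > 0) hits[1] else NA_character_
}

# Build the new column asked for in the question.
df$city1_n <- vapply(df$City1, first_match, character(1), keys = key,
                     USE.NAMES = FALSE)
```

Unlike the join, this keeps exactly one row per row of df, and cities without any matching key get NA in city1_n.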
Data
df <- structure(list(Country = c("USA", "France", "Italy", "Spain",
"Mexico"), City1 = c("Los angeles", "Paris", "Rome", "Madrid",
"Cancun"), City2 = c("New York", "Lyon", "Pisa", "Barcelona",
"San Cristobal de las Casas")), class = "data.frame", row.names = c(NA,
-5L))
key <- c("Los angeles California", "Paris Île-de-France", "Rome Lazio",
"Madrid Comunidad de Madrid ", "Cancun Quintana Roo")
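Note that the question stores key as a list, while the data above defines it as a character vector. Calling data.frame() on the list would give one row with five columns instead of a single key column, so the list needs flattening first. A minimal sketch of that conversion follows; the names key_list and key_vec are made up for illustration, and trimming the trailing space in the Madrid entry is an optional assumption, not something the answer requires.

```r
# The key exactly as given in the question: a list of length-one strings.
key_list <- list("Los angeles California", "Paris Île-de-France", "Rome Lazio",
                 "Madrid Comunidad de Madrid ", "Cancun Quintana Roo")

# Flatten to a character vector and strip leading/trailing whitespace.
key_vec <- trimws(unlist(key_list))

# One column named `key`, one row per entry, ready for the join.
key_df <- data.frame(key = key_vec)
```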