How to do a match in R with left_join using multiple columns and one “likely” column
I am trying to match rows between two different data frames in R.
For example, one data frame looks like this:
df1 <- data.frame(description = c("sol 100ml", "200 mg", "1.5 ml", "10MG"),
                  pa = c("clorbetazol", "Milk", "Aciclovir", "AAC"),
                  atc = c("x1", "a2", "a3", "x3"))
description  pa           atc
sol 100ml    clorbetazol  x1
200 mg       Milk         a2
1.5 ml       Aciclovir    a3
10MG         AAC          x3
The other one looks like this:
df2 <- data.frame(Concentration = c("100", "200", "1.5", "10"),
                  pa = c("clorbetazol", "Milk", "Aciclovir", "AAC"),
                  atc = c("x1", "a2", "a3", "x3"),
                  code = c("A101", "A202", "A303", "A404"))
Concentration  pa           atc  code
100            clorbetazol  x1   A101
200            Milk         a2   A202
1.5            Aciclovir    a3   A303
10             AAC          x3   A404
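My question is: is there a way to match on the columns "pa" and "atc" and also, somehow (using grepl or something else), use the "Concentration" column, so that I can do a left join or a merge?
In the end I would like to get this: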
description  pa           atc  code
sol 100ml    clorbetazol  x1   A101
200 mg       Milk         a2   A202
1.5 ml       Aciclovir    a3   A303
10MG         AAC          x3   A404
I wonder if anyone could help me.
Thanks!
You can use a regular expression to extract the number from description, then use it as an extra key in the left join:
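library(dplyr)

df1 %>%
  # Pull the first integer or decimal out of description to use as a join key
  mutate(Concentration = gsub("^.*?(\\d+(\\.)?(\\d+)?).*$", "\\1", description)) %>%
  left_join(df2, by = c("pa", "atc", "Concentration")) %>%
  # Drop the helper column again
  select(-Concentration)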
#> description pa atc code
#> 1 sol 100ml clorbetazol x1 A101
#> 2 200 mg Milk a2 A202
#> 3 1.5 ml Aciclovir a3 A303
#> 4 10MG AAC x3 A404
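The extraction step can also be looked at on its own. The following is a minimal base R sketch, not part of the answer above: extract_first_number() is a hypothetical helper name, and the pattern assumes that concentrations are plain integers or decimals written with a dot. Unlike the gsub() call, which leaves a string unchanged when nothing matches, this version returns NA for a description that contains no number.

```r
# Same sample descriptions as df1 in the question.
description <- c("sol 100ml", "200 mg", "1.5 ml", "10MG")

# Hypothetical helper (not from any package): returns the first integer or
# decimal found in each string, or NA when the string contains no digits.
extract_first_number <- function(x) {
  hit <- regexpr("[0-9]+(\\.[0-9]+)?", x)
  found <- as.integer(hit) > 0          # regexpr() gives -1 for "no match"
  out <- rep(NA_character_, length(x))
  out[found] <- regmatches(x, hit)      # regmatches() returns matched elements only
  out
}

conc <- extract_first_number(description)
conc
# [1] "100" "200" "1.5" "10"

extract_first_number("no number here")
# [1] NA
```

The keys are compared as strings, so "1.50" and "1.5" would not match. If the real data mixes such formats, converting both sides with as.numeric() before joining may be more robust.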
Use gsub with a regular expression to strip everything except digits and the decimal point, then merge.
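# merge() joins on the shared columns pa, atc and Concentration;
# all=TRUE keeps unmatched rows from both sides, and [-3] drops the
# helper Concentration column (third column of the merged result).
res <- merge(transform(df1, Concentration=gsub("[^\\d\\.]", "",
                                               description, perl=TRUE)),
             df2, all=TRUE)[-3]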
res
# pa atc description code
# 1 AAC x3 10MG A404
# 2 Aciclovir a3 1.5 ml A303
# 3 clorbetazol x1 sol 100ml A101
# 4 Milk a2 200 mg A202
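Since the question asks for a left join and for the original column layout, here is a possible variant of the same base R idea. It is a sketch under some assumptions rather than the answer's own code: all.x = TRUE replaces all = TRUE so that only rows of df1 are kept, and row_id is a hypothetical helper column added here to restore the original row order, because merge() sorts its result by the key columns.

```r
df1 <- data.frame(description = c("sol 100ml", "200 mg", "1.5 ml", "10MG"),
                  pa = c("clorbetazol", "Milk", "Aciclovir", "AAC"),
                  atc = c("x1", "a2", "a3", "x3"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Concentration = c("100", "200", "1.5", "10"),
                  pa = c("clorbetazol", "Milk", "Aciclovir", "AAC"),
                  atc = c("x1", "a2", "a3", "x3"),
                  code = c("A101", "A202", "A303", "A404"),
                  stringsAsFactors = FALSE)

# Build the join key and remember each row's original position.
keyed <- transform(df1,
                   Concentration = gsub("[^\\d\\.]", "", description, perl = TRUE),
                   row_id = seq_along(description))

# all.x = TRUE keeps every row of df1, matched or not (a left join).
left <- merge(keyed, df2, by = c("pa", "atc", "Concentration"), all.x = TRUE)

# Restore df1's row order and keep only the columns asked for.
left <- left[order(left$row_id), c("description", "pa", "atc", "code")]
rownames(left) <- NULL
left
```

Rows of df1 without a counterpart in df2 would stay in the result with NA in code.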