
How to do a match in R with left_join using multiple columns and one “likely” column

I am trying to match rows between two different data frames in R.

For example, one data frame looks like this:

df1 <- data.frame(description=c("sol 100ml","200 mg","1.5 ml","10MG"),
                 pa=c("clorbetazol","Milk","Aciclovir","AAC"),
                 atc=c("x1","a2","a3","x3"))

 description          pa atc
   sol 100ml clorbetazol  x1
      200 mg        Milk  a2
      1.5 ml   Aciclovir  a3
        10MG         AAC  x3

The other looks like this:

df2 <- data.frame(Concentration=c("100","200","1.5","10"),
                 pa=c("clorbetazol","Milk","Aciclovir","AAC"),
                 atc=c("x1","a2","a3","x3"),
                 code=c("A101","A202","A303","A404"))

  Concentration          pa atc code
            100 clorbetazol  x1 A101
            200        Milk  a2 A202
            1.5   Aciclovir  a3 A303
             10         AAC  x3 A404

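My question is: is there a way to match exactly on the columns "pa" and "atc", and also somehow use the "Concentration" column (with grepl or some other approach) against "description" in a left join or merge?

In the end I would like to get this: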

 description          pa atc  code
   sol 100ml clorbetazol  x1  A101
      200 mg        Milk  a2  A202
      1.5 ml   Aciclovir  a3  A303
        10MG         AAC  x3  A404

I wonder if anyone can help me.

Thanks!

You can use a regular expression to extract the number from description, then use it as an extra key in the left join:

library(dplyr)

df1 %>% 
  mutate(Concentration = gsub("^.*?(\\d+(\\.)?(\\d+)?).*$", "\\1", description)) %>%
  left_join(df2, by = c("pa", "atc", "Concentration")) %>%
  select(-Concentration)
#>   description          pa atc code
#> 1   sol 100ml clorbetazol  x1 A101
#> 2      200 mg        Milk  a2 A202
#> 3      1.5 ml   Aciclovir  a3 A303
#> 4        10MG         AAC  x3 A404
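
Before joining, it can help to look at the key that will actually be extracted. Below is a minimal base-R sketch of that check. `extract_number` is a hypothetical helper (it is not part of the answer above), and the extra "cream" value is made up to show what happens when a description has no number: the helper returns NA, so such rows simply find no match in a left join.

```r
# Hypothetical helper: pull the first integer or decimal out of each string.
# Returns NA_character_ for elements that contain no number.
extract_number <- function(x) {
  m <- regexpr("[0-9]+(\\.[0-9]+)?", x)    # position of the first number, -1 if none
  out <- rep(NA_character_, length(x))
  hit <- !is.na(m) & m != -1
  out[hit] <- regmatches(x, m)             # regmatches() returns only the matched elements
  out
}

# The four descriptions from df1, plus one made-up value without a number
description <- c("sol 100ml", "200 mg", "1.5 ml", "10MG", "cream")
print(extract_number(description))
#> [1] "100" "200" "1.5" "10"  NA
```

The result is a character vector, so it lines up with a character Concentration column; if that column were numeric, both sides would need converting to the same type before the join.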

Use gsub with a regular expression to strip everything except digits and dots, then merge:

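# all.x=TRUE keeps every row of df1 (a left join); all=TRUE would be a full outer join.
# [-3] drops the helper Concentration column from the result.
res <- merge(transform(df1, Concentration=gsub("[^\\d\\.]", "",
                                               description, perl=TRUE)),
      df2, all.x=TRUE)[-3]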
res
#            pa atc description code
# 1         AAC  x3        10MG A404
# 2   Aciclovir  a3      1.5 ml A303
# 3 clorbetazol  x1   sol 100ml A101
# 4        Milk  a2      200 mg A202
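
For the grepl route the question mentions, one possible sketch is to pair rows on "pa" and "atc" first, keep only the pairs whose description contains the concentration, and then left-join the codes back onto df1. This is an illustration under assumptions rather than a drop-in replacement: the names `candidates`, `contains`, `matched` and `result` are made up here, and `stringsAsFactors = FALSE` is set explicitly so the columns are character on any R version.

```r
df1 <- data.frame(description = c("sol 100ml", "200 mg", "1.5 ml", "10MG"),
                  pa = c("clorbetazol", "Milk", "Aciclovir", "AAC"),
                  atc = c("x1", "a2", "a3", "x3"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Concentration = c("100", "200", "1.5", "10"),
                  pa = c("clorbetazol", "Milk", "Aciclovir", "AAC"),
                  atc = c("x1", "a2", "a3", "x3"),
                  code = c("A101", "A202", "A303", "A404"),
                  stringsAsFactors = FALSE)

# Step 1: pair rows on the exact keys only
candidates <- merge(df1, df2, by = c("pa", "atc"))

# Step 2: row by row, test whether the description contains the concentration.
# fixed = TRUE treats "1.5" as literal text, so "." is not a regex wildcard.
contains <- mapply(grepl, candidates$Concentration, candidates$description,
                   MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)
matched <- candidates[contains %in% TRUE, c("description", "pa", "atc", "code")]

# Step 3: left join the codes back so every df1 row is kept
result <- merge(df1, matched, by = c("description", "pa", "atc"), all.x = TRUE)
print(result)
```

Note that this is a substring test, so a concentration of "10" would also be found inside "100ml". When one pa/atc pair can have several concentrations, extracting the number and matching it exactly, as in the answers above, is the stricter option.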

