简体   繁体   中英

How to compare comma separated string in one column with the comma separated strings other dataframe

I have two df as below df1:

    M1     |
    -------+
    a,b,c  |
    a      |
    b,c    |
    c,b,a  |
    b,a,d  |
    d,a,b,c|
    a,d,c  |
    b      |
    c,d    |
    d,a    |

df2:

X1      |X2
--------+---
a       |1
b       |2
c       |3
d       |4
a,b     |5
a,c     |6
a,d     |7
b,c     |8
b,d     |9
c,d     |10
a,b,c   |11
a,c,d   |12
a,b,d   |13
b,c,d   |14
a,b,c,d |15

can someone help me to match values in df1$M1 and df2$X1. and put the corresponding X2 value in column M2 as below

df1:

M1      |M2
--------+---
a,b,c   |11
a       |1
b,c     |8
c,b,a   |11
b,a,d   |13
d,a,b,c |15
a,d,c   |12
b       |2
c,d     |10
d,a     |7

Can someone help me

X1 and M1 have to be stored as Characters. You can check with str(df1) , and re-assign if necessary df1 <- as.character(df1$X1) , and the same for df2

Then, create new columns with the values in alphabetical order:

df1$Ordered <- sapply(lapply(strsplit(df1$X1, ","), sort),paste,collapse=",")

df2$Ordered <- sapply(lapply(strsplit(df2$M1, ","), sort),paste,collapse=",")

Then perform a join like so:

merge(df1, df2, by="Ordered")

If you want to include all the values in df1 regardless of whether they have a matching value in df2 , add the all.x = TRUE argument. Same logic applies adding all = TRUE (include everything from both data frames), or all.y = TRUE for df2 .

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM