
Create a new column based on matching strings in columns of two different dataframes

I have two data frames, A and B, and I want to match the name columns between them. If a name from A exists in B, I need to create a new column in A holding the corresponding id from B; if it does not exist, the new column should be 0.

Here is the data and the code I wrote:

#data B
    email             name         id
    hi@amal.com       amal call    6
    hi@hotmail.com    amal         6
    hi@gmail.com      AMAL boy     6
    hi@boy.com        boy          7
    hi@hotmail.com    boy          7
    hi@call.com       call AMAL    9
    hi@hotmail.com    boy          7
    hi@dog.com        dog          8
    hi@outlook.com    dog          8
    hi@gmail.com      dog          8



#data A

    id  name
    1   amal
    1   AMAL
    2   call
    4   dog
    3   boy

First I built a boolean mask with str.contains:

A.name.str.contains('|'.join(B.name))

Then I tried to create the new column:

A["new"] = np.where(A.name.str.contains('|'.join(B.name))==True, B.id, 0)

But I get this error:

ValueError: operands could not be broadcast together with shapes (5,) (10,) ()

What I expected is:

    id  name  new
    1   amal  6
    1   AMAL  0
    2   call  0
    4   dog   8
    3   boy   7

Any help?

Use Series.map with a lookup Series built from B: remove rows with duplicated names using DataFrame.drop_duplicates and set name as the index, then replace the missing values with Series.fillna and convert to integers:

A["new"] = A.name.map(B.drop_duplicates('name').set_index('name')['id']).fillna(0).astype(int)
print(A)
   id  name  new
0   1  amal    6
1   1  AMAL    0
2   2  call    0
3   4   dog    8
4   3   boy    7
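
For anyone who wants to try this end to end, here is a minimal, self-contained sketch that rebuilds the two sample frames from the question and applies the same map lookup. The intermediate name `lookup` is introduced here only for readability. It also hints at why the np.where attempt fails: the condition has one entry per row of A (5), while B.id has one entry per row of B (10), so the two cannot be broadcast together. map sidesteps this by looking up each name of A in a name-to-id Series instead of aligning the two frames by position.

```python
import pandas as pd

# Sample data copied from the question.
B = pd.DataFrame({
    "email": ["hi@amal.com", "hi@hotmail.com", "hi@gmail.com", "hi@boy.com",
              "hi@hotmail.com", "hi@call.com", "hi@hotmail.com", "hi@dog.com",
              "hi@outlook.com", "hi@gmail.com"],
    "name": ["amal call", "amal", "AMAL boy", "boy", "boy",
             "call AMAL", "boy", "dog", "dog", "dog"],
    "id": [6, 6, 6, 7, 7, 9, 7, 8, 8, 8],
})
A = pd.DataFrame({"id": [1, 1, 2, 4, 3],
                  "name": ["amal", "AMAL", "call", "dog", "boy"]})

# `lookup` is a helper name for this sketch: one id per distinct name in B.
# Dropping duplicate names first keeps the index unique, which map needs.
lookup = B.drop_duplicates("name").set_index("name")["id"]

# Exact, case-sensitive lookup: names missing from B become NaN, then 0.
A["new"] = A["name"].map(lookup).fillna(0).astype(int)
print(A)
```

Note that the match is exact and case-sensitive, so "AMAL" and "call" get 0 because B only contains them as part of longer names such as "AMAL boy" and "amal call".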
