Replacing a column in a dataframe with another dataframe column using partial string match

I have large CSVs; here are sample dataframes:

df1 = 
Index    Fruit   Vegetable    
    0    Mango   Spinach
    1    Berry   Carrot
    2    Banana  Cabbage   
df2 = 
Index   Unit                   Price
   0    Mango_123              30
   1    234_Artichoke_CE       45
   2    23_Banana              12
   3    Berry___LE             10
   4    Cabbage___12LW         25
   5    Rice_ww_12             40
   6    Spinach_KJ             34
   7    234_Carrot_23          08
   8    10000_Lentil           12
   9    Pot________12          32

I would like to replace each name in df1 with the entry in df2 that contains it, producing the following dataframe:

df3= 
Index    Fruit        Vegetable    
    0    Mango_123    Spinach_KJ
    1    Berry___LE   234_Carrot_23
    2    23_Banana    Cabbage___12LW

What would be a generic way to do this? Thank you.

You can use fuzzy matching with thefuzz.process.extractOne, which returns the closest match according to a similarity score based on Levenshtein distance:

# pip install thefuzz

from thefuzz import process

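cols = ['Fruit', 'Vegetable']
# For every cell, keep the entry of df2['Unit'] with the highest match score;
# extractOne returns a tuple whose first element is the matched string.
# (On pandas >= 2.1, DataFrame.map replaces the deprecated applymap.)
df1[cols] = df1[cols].applymap(lambda x: process.extractOne(x, df2['Unit'])[0])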

output:

   Index       Fruit       Vegetable
0      0   Mango_123      Spinach_KJ
1      1  Berry___LE   234_Carrot_23
2      2   23_Banana  Cabbage___12LW
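If you would rather not add a dependency, Python's standard-library difflib provides a different (but comparable) similarity score. The sketch below swaps in difflib.SequenceMatcher.ratio in place of thefuzz's scorer and works on plain lists holding the sample values; closest_match is a hypothetical helper name, and its scores will not equal thefuzz's, so check the results on your real data.

```python
from difflib import SequenceMatcher


def closest_match(name, candidates):
    """Hypothetical helper: return the candidate most similar to `name`."""
    # ratio() is 2*M/T, where M = matching characters and T = total length of both strings
    return max(candidates, key=lambda c: SequenceMatcher(None, name, c).ratio())


# Sample values copied from df1 and df2['Unit'] in the question
units = ["Mango_123", "234_Artichoke_CE", "23_Banana", "Berry___LE",
         "Cabbage___12LW", "Rice_ww_12", "Spinach_KJ", "234_Carrot_23",
         "10000_Lentil", "Pot________12"]
fruits = ["Mango", "Berry", "Banana"]
vegetables = ["Spinach", "Carrot", "Cabbage"]

fruit_matches = [closest_match(f, units) for f in fruits]
vegetable_matches = [closest_match(v, units) for v in vegetables]
print(fruit_matches)      # ['Mango_123', 'Berry___LE', '23_Banana']
print(vegetable_matches)  # ['Spinach_KJ', '234_Carrot_23', 'Cabbage___12LW']
```

Note that picking the best score always returns something, even for a name with no real counterpart; extractOne accepts a score_cutoff argument for that case, and with difflib you could compare the ratio against a threshold of your choosing.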

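Alternatively, since every name in df1 appears verbatim inside one entry of df2['Unit'], an exact substring match in a list comprehension also works, without an extra dependency: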

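# regex=False matches each name as a literal substring;
# .values[0] takes the first hit and raises IndexError if there is none
fruit_list = [df2.Unit[df2.Unit.str.contains(x, regex=False)].values[0] for x in df1.Fruit.tolist()]
vegetable_list = [df2.Unit[df2.Unit.str.contains(x, regex=False)].values[0] for x in df1.Vegetable.tolist()]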

The code above builds two lists: one holding the matching df2 entry for each fruit, the other for each vegetable. Then create a new DataFrame from them:

import pandas as pd

df3 = pd.DataFrame({"Fruit": fruit_list, "Vegetable": vegetable_list})
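For reference, here is an end-to-end sketch of the substring approach, assuming pandas is installed and rebuilding the sample frames by hand (df2's Price column is left out because it is not used). The hypothetical helper first_unit_containing returns None instead of raising IndexError when no unit contains a name, which may matter on large real CSVs where some names have no counterpart.

```python
import pandas as pd

# Sample frames from the question (df2's Price column omitted; it is not used)
df1 = pd.DataFrame({"Fruit": ["Mango", "Berry", "Banana"],
                    "Vegetable": ["Spinach", "Carrot", "Cabbage"]})
df2 = pd.DataFrame({"Unit": ["Mango_123", "234_Artichoke_CE", "23_Banana",
                             "Berry___LE", "Cabbage___12LW", "Rice_ww_12",
                             "Spinach_KJ", "234_Carrot_23", "10000_Lentil",
                             "Pot________12"]})


def first_unit_containing(name, units):
    """Hypothetical helper: first entry of `units` containing `name`, else None."""
    # regex=False treats `name` as a literal substring, not a regular expression
    hits = units[units.str.contains(name, regex=False)]
    return hits.iloc[0] if not hits.empty else None


cols = ["Fruit", "Vegetable"]
# Build each output column by looking up every name of the matching df1 column
df3 = pd.DataFrame({c: [first_unit_containing(x, df2["Unit"]) for x in df1[c]]
                    for c in cols})
print(df3)
```

Because this matches literal substrings, it is more predictable than fuzzy matching, but it will not catch misspelled or differently cased names.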
