
Pandas string contains and replace

I have the following dataframe:

         A            B
0        France        United States of America
1        Italie        France
2        United Stats  Italy

I'm looking for a function that takes the first 4 letters of each value in column A and checks whether any value in column B contains those 4 letters. If one does, I want to replace the value in A with the matching value from B (the one that shares the same first 4 letters).

Example: for the word Italie in column A, I take Ital and search for it in B. Since it is found in Italy, I want to replace Italie with Italy.

I tried a for loop with the str.contains function, but I could not get it to use only the first 4 letters.

Expected output:

         A                         B
0        France                    United States of America
1        Italy                     France
2        United States of America  Italy

To summarize, I want to correct the values in column A so that they match those in column B.

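One solution is fuzzy matching with the fuzzywuzzy package. Note that it compares whole strings by similarity rather than only the first 4 letters.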

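from fuzzywuzzy import process

def fuzzyreturn(x):
    # process.extract returns (match, score) tuples, best match first;
    # keep only the text of the best match among the values of column B
    return process.extract(x, df.B.values, limit=1)[0][0]


df.A.apply(fuzzyreturn)
Out[608]: 
0                      France
1                       Italy
2    United States of America
Name: A, dtype: object

# Assign the result back to overwrite column A
df.A = df.A.apply(fuzzyreturn)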
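
If the literal "first 4 letters" rule from the question is wanted instead of fuzzy scoring, it could be sketched roughly as below. This is not part of the answer above: the helper names build_prefix_lookup and replace_by_prefix are hypothetical, the comparison is assumed to be case-insensitive, only the start of each B value is checked (not any position inside it), and when several B values share a prefix the first one is assumed to win. Plain lists stand in for the two columns so the sketch runs without pandas.

```python
def build_prefix_lookup(reference_values, n=4):
    """Map the lowercased first n characters of each reference value to that value."""
    lookup = {}
    for value in reference_values:
        # setdefault keeps the first value seen for a given prefix (an assumption)
        lookup.setdefault(value[:n].lower(), value)
    return lookup


def replace_by_prefix(value, lookup, n=4):
    """Return the reference value sharing the first n characters, else the value unchanged."""
    return lookup.get(value[:n].lower(), value)


# Stand-ins for columns A and B from the question
col_a = ["France", "Italie", "United Stats"]
col_b = ["United States of America", "France", "Italy"]

lookup = build_prefix_lookup(col_b)
corrected = [replace_by_prefix(v, lookup) for v in col_a]
print(corrected)  # ['France', 'Italy', 'United States of America']
```

With the dataframe from the question, the same helpers could be applied as lookup = build_prefix_lookup(df["B"]) followed by df["A"] = df["A"].map(lambda v: replace_by_prefix(v, lookup)). Values in A with no matching prefix in B are left as they are.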
