Pandas string contains and replace

Question

I have the following dataframe:

         A            B
0        France        United States of America
1        Italie        France
2        United Stats  Italy

I'm looking for a function that takes the first 4 letters of each value in column A and checks whether those 4 letters appear in column B. If they do, I want to replace the value in A with the matching value from B (the one that shares the same first 4 letters).

Example: for the word Italie in column A, I take Ital and search for it in B. Since Italy matches, I want to replace Italie with Italy.

I've tried a for loop with the str.contains function, but I still can't work out how to use only the first 4 letters.

Expected output:

         A                         B
0        France                    United States of America
1        Italy                     France
2        United States of America  Italy

To summarize, I want to correct the values in column A so that they match those in column B.

Answer 1

A solution using fuzzy matching with the fuzzywuzzy library:

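from fuzzywuzzy import process

# df is the dataframe from the question
def fuzzyreturn(x):
    # extractOne returns the best (match, score) pair from column B;
    # keep only the matched string
    return process.extractOne(x, df.B.values)[0]


df.A.apply(fuzzyreturn)
Out[608]:
0                      France
1                       Italy
2    United States of America
Name: A, dtype: object

# Assign the result back to overwrite column A
df.A = df.A.apply(fuzzyreturn)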
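
If you would rather apply the literal first-4-letters rule from the question than a fuzzy score, here is a minimal sketch in plain Python. The helper name replace_by_prefix and its parameter n are made up for illustration. It assumes "similar" means the value in B starts with the same 4 letters, and that the first such value in B wins when several share a prefix.

```python
def replace_by_prefix(values, candidates, n=4):
    """Replace each value with the first candidate sharing its first n letters.

    Hypothetical helper: assumes a match means the candidate *starts with*
    the same n characters as the value.
    """
    # Map each candidate's first n characters to the candidate itself;
    # setdefault keeps the first candidate seen for a given prefix.
    by_prefix = {}
    for cand in candidates:
        by_prefix.setdefault(cand[:n], cand)
    # Values whose prefix has no match in the candidates are left unchanged.
    return [by_prefix.get(v[:n], v) for v in values]


# Sample data copied from the question's dataframe
col_a = ["France", "Italie", "United Stats"]
col_b = ["United States of America", "France", "Italy"]

result = replace_by_prefix(col_a, col_b)
print(result)  # ['France', 'Italy', 'United States of America']
```

Since iterating over a pandas Series yields its values, the same helper should work on the dataframe directly, e.g. df["A"] = replace_by_prefix(df["A"], df["B"]). Unlike the fuzzy approach, this leaves a value in A untouched when nothing in B shares its prefix.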

solution1
1 ACCEPTED 2018-12-27 22:50:27