I have the following dataframe
A B
0 France United States of America
1 Italie France
2 United Stats Italy
I'm looking for a function that can take (for each word in column A
) the first 4 letters and then search in column B
whether or not these 4 letters are there. Now if this is the case, I want to replace the value in A with the similar value (similar first 4 letters) in B
.
Example : for the word Italie in column A
, I have to take Ital
then search in B
whether or not we can find it. Then I want to replace Italie
with its similar word Italy
.
I've tried to do for
with str.contains
function
But still cannot take only the first 4 letters.
Output expected :
A B
0 France United States of America
1 Italy France
2 United Stats of America Italy
In order to summarize, I am looking for correcting values in column A to become similar to those in column b
Solution from fuzzy match -- fuzzywuzzy
from fuzzywuzzy import process
def fuzzyreturn(x):
return [process.extract(x, df.B.values, limit=1)][0][0][0]
df.A.apply(fuzzyreturn)
Out[608]:
0 France
1 Italy
2 United States of America
Name: A, dtype: object
df.A=df.A.apply(fuzzyreturn)
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.