How can I merge two datasets with similar words in Python?

Question

For instance, I have a row in dataset_1 where "Entity" = Apple

and the corresponding row in dataset_2 has "Entity" = iCloud Apple.

(Entity is the column.) I need to merge one dataset into the other on the Entity column, but for that the values have to be exactly the same, and Apple ≠ iCloud Apple.

Both datasets are huge, so I can't fix the values manually, one by one.

[Images: dataset_1, dataset_2]
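To make the mismatch concrete, here is a minimal sketch in plain Python (no pandas). The helper names `words` and `shares_word`, the 5-letter minimum and the lower-casing are illustrative assumptions, not part of the question. It shows that an exact comparison treats the two names as different, while a "do they share a long word?" test links them.

```python
def words(entity, min_len=5):
    # Lower-cased words with at least min_len characters (lower-casing is an assumption).
    return {w.lower() for w in entity.split() if len(w) >= min_len}

def shares_word(a, b, min_len=5):
    # True when the two entity names have at least one long word in common.
    return bool(words(a, min_len) & words(b, min_len))

print("Apple" == "iCloud Apple")             # an exact-key merge compares like this
print(shares_word("Apple", "iCloud Apple"))  # both names contain "apple"
```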

Answer 1

Code:

```python
# preparing data: each dataset row is represented as a dict
dataset_1 = {"Entity": 'Prudential Insurance Company of America - Unisys', 'Bank': 'America'}
dataset_2 = {"Entity": 'Unisys', 'Bank': 'Africkan', 'code': '70000-000'}
ds_array = [dataset_1, dataset_2]
# end of preparing data

for i, d1 in enumerate(ds_array):
    n1 = {x for x in d1['Entity'].split() if len(x) >= 5}  # discards words with fewer than 5 letters
    for d2 in ds_array[i + 1:]:  # compare each pair once, never a row with itself
        n2 = {x for x in d2['Entity'].split() if len(x) >= 5}
        merge = n1 & n2  # only words in both sets: n1 and n2
        if merge:  # at least one word in common
            d1['Entity'] = ' '.join(sorted(merge))  # sorted() keeps the result deterministic
            d2['Entity'] = d1['Entity']
print(ds_array)
```

Output: `[{'Entity': 'Unisys', 'Bank': 'America'}, {'Entity': 'Unisys', 'Bank': 'Africkan', 'code': '70000-000'}]`
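Because both datasets are described as huge, note that comparing every row of one dataset with every row of the other (as the nested loop above does) grows as len(dataset_1) × len(dataset_2). A minimal sketch of one alternative, assuming each dataset is a list of dicts with an `Entity` key: build an inverted index (word → row positions) over one dataset once, then look up only the candidate rows for each row of the other. This is a different technique from the answer's, and the names `merge_on_shared_words`, `combine` and the `_1`/`_2` suffixes are hypothetical. Instead of overwriting `Entity`, it keeps both original names side by side so each match can be checked.

```python
from collections import defaultdict

def words(entity, min_len=5):
    # Lower-cased words with at least min_len characters (lower-casing is an assumption).
    return {w.lower() for w in entity.split() if len(w) >= min_len}

def combine(lrow, rrow, suffixes=("_1", "_2")):
    # Merge two rows; columns present in both get a suffix so no value is lost.
    out = {}
    for k, v in lrow.items():
        out[k + suffixes[0] if k in rrow else k] = v
    for k, v in rrow.items():
        out[k + suffixes[1] if k in lrow else k] = v
    return out

def merge_on_shared_words(left, right, key="Entity", min_len=5):
    # Inverted index over `right`: word -> positions of the rows containing it.
    index = defaultdict(set)
    for pos, row in enumerate(right):
        for w in words(row[key], min_len):
            index[w].add(pos)

    merged = []
    for lrow in left:
        # Candidates are only the right-hand rows sharing at least one indexed word.
        candidates = set()
        for w in words(lrow[key], min_len):
            candidates |= index.get(w, set())
        for pos in sorted(candidates):
            merged.append(combine(lrow, right[pos]))
    return merged

dataset_1 = [{"Entity": "Prudential Insurance Company of America - Unisys", "Bank": "America"}]
dataset_2 = [{"Entity": "Unisys", "Bank": "Africkan", "code": "70000-000"}]
print(merge_on_shared_words(dataset_1, dataset_2))
```

Generic long words (for example "Company" or "Insurance") will also produce matches, so in practice the candidate pairs probably need a review or a stop-word list before being trusted.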

1 answer

solution1
0 2022-04-26 00:14:11