For instance, I have a row value in dataset_1: "Entity" = Apple
and in dataset_2: "Entity" = iCloud Apple
(Entity is the column.) I need to merge one dataset into the other on the Entity column, but to do that the values need to be exactly the same, and Apple ≠ iCloud Apple.
Both datasets are huge, so I can't do this manually, one by one.
Code:
`
# preparing data
dataset_1 = {"Entity": 'Prudential Insurance Company of America - Unisys', 'Bank': 'America'}
dataset_2 = {"Entity": 'Unisys', 'Bank': 'Africkan', 'code': '70000-000'}
ds_array = [dataset_1, dataset_2]
# end of preparing data

for d1 in ds_array[0:len(ds_array) - 1]:
    n1 = d1['Entity'].split()
    n1 = {x for x in n1 if len(x) >= 5}  # discards words with less than 5 letters
    for d2 in ds_array[1:len(ds_array)]:
        n2 = d2['Entity'].split()
        n2 = {x for x in n2 if len(x) >= 5}
        merge = n1 & n2  # only words in both sets: n1 and n2
        if len(merge) > 0:  # tests if there is at least 1 word
            d1['Entity'] = ' '.join(merge)
            d2['Entity'] = d1['Entity']

print(ds_array)
`
Output: [{'Entity': 'Unisys', 'Bank': 'America'}, {'Entity': 'Unisys', 'Bank': 'Africkan', 'code': '70000-000'}]
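The same word-intersection idea can be wrapped in small helpers so it applies to two lists of rows instead of a single pair. This is only a rough sketch and not part of the original answer: the names `significant_words`, `common_key` and `align_entities` are made up for illustration, the 5-character threshold is carried over from the code above, and sorting the shared words is an added assumption so that the resulting key is the same on every run.

```python
def significant_words(name, min_len=5):
    """Return the set of words in `name` with at least `min_len` characters."""
    return {word for word in name.split() if len(word) >= min_len}


def common_key(name_a, name_b, min_len=5):
    """Return the shared significant words as one string, or None if there are none."""
    shared = significant_words(name_a, min_len) & significant_words(name_b, min_len)
    # Sorting makes the key deterministic; set iteration order is not guaranteed.
    return " ".join(sorted(shared)) if shared else None


def align_entities(left_rows, right_rows, min_len=5):
    """Rewrite 'Entity' in place so matching rows in both lists share one value."""
    for left in left_rows:
        for right in right_rows:
            key = common_key(left["Entity"], right["Entity"], min_len)
            if key is not None:
                left["Entity"] = key
                right["Entity"] = key
                break  # stop at the first match for this left row


# Hypothetical rows built from the example in the question.
dataset_1 = [{"Entity": "Apple"}]
dataset_2 = [{"Entity": "iCloud Apple"}]
align_entities(dataset_1, dataset_2)
print(dataset_1[0]["Entity"], "|", dataset_2[0]["Entity"])  # Apple | Apple
```

Two caveats with this approach: any long word that appears in unrelated names (for example "Company" or "America") will produce a false match, and the nested loop compares every row with every other row, which may be too slow for very large datasets unless the words are indexed first.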