Python remove common substrings from two lists

I have two lists, list_a and list_b. I want to remove not only the strings they have in common but also the longest common substrings: if a string in one list occurs inside a string in the other list, the shared part should be removed from both. The lists may have different lengths. For example, some input combinations are:

list_a = ['mens', 'room']
list_b = ['mensworld']
Expected output: 
list_a_out: ['room']
list_b_out: ['world']

list_a = ['flower']
list_b = ['mayflower', 'June']
Expected output:
list_a_out: []
list_b_out: ['may', 'June']

list_a = ['Chi', 'Construction']
list_b = ['Dex', 'Construction']
Expected output:
list_a_out: ['Chi']
list_b_out: ['Dex']
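
For reference, the "longest common substring" of two strings can be found with the standard library's difflib module. The helper below is only a sketch: the name longest_common_substring is hypothetical and not part of the question. In all three examples above, the shared part happens to be a whole string from one of the lists.

```python
from difflib import SequenceMatcher


def longest_common_substring(s, t):
    """Return the longest contiguous run of characters shared by s and t."""
    # autojunk=False switches off the popularity heuristic, which is meant
    # for long sequences and is not wanted for short words.
    matcher = SequenceMatcher(None, s, t, autojunk=False)
    match = matcher.find_longest_match(0, len(s), 0, len(t))
    return s[match.a:match.a + match.size]


print(longest_common_substring('mens', 'mensworld'))    # mens
print(longest_common_substring('flower', 'mayflower'))  # flower
print(repr(longest_common_substring('Chi', 'Dex')))     # ''
```

Note that unrelated words such as 'room' and 'mensworld' still share single characters like 'o', so a solution built on this helper would probably need a minimum match length, or a full-containment check, before removing anything.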

So far, I have written code that partially works:

def remove_common_substring_1(list_a, list_b):

    final_list_a = []
    final_list_b = []

    flag_i_in_j = False
    flag_j_in_i = False

    for i in list_a:
        print("\n")
        for j in list_b:
            print(f'******* Processing {i, j} *******')
            if i in j:
                flag_i_in_j = True
                # remove i from j
                print(f'{i} is present {j}')
                print(f'Removing {i} from List A')
                final_list_a.append(list_a.remove(i))

                print(f'Removing substring {i} from {j} ')
                j_new = j.replace(i, '')
                final_list_b.append(j_new)

            elif j in i:
                flag_j_in_i = True
                print(f'{j} is present in {i}')
                print(f'Removing {j} from List B')
                final_list_b.append(list_b.remove(j))

                print(f'Removing substring {j} from {i}')
                i_new = i.replace(j, '')
                final_list_a.append(i_new)
            else:
                continue

        if not flag_i_in_j and not flag_j_in_i:
            final_list_a.append(i)
            #final_list_b.append(j)

    final_list_a = list(filter(None, final_list_a))
    final_list_b = list(filter(None, final_list_b))
    return final_list_a, final_list_b

The above code works for one class of input:

list_a = ['mens', 'group']
list_b = ['cgkgroup']

Output:
final_list_a: ['mens']
final_list_b: ['cgk']

I am continuing to step through the code to find the faulty logic, and I am also open to doing this in a completely different way. Any suggestions are appreciated. Thanks.
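
A note on why the code in the question only partially works, as far as can be told from reading it: list.remove() returns None (so None is what gets appended), the two flags are never reset between outer iterations, and removing items from list_a while looping over it makes the loop skip elements. With list_a = ['mens', 'room'], removing 'mens' during iteration means 'room' is never visited. The snippet below reproduces the first and third effects in isolation; the variable names are illustrative only.

```python
# list.remove() mutates the list in place and returns None.
items = ['mens', 'group']
result = items.remove('mens')
print(result)  # None
print(items)   # ['group']

# Removing from a list while iterating over it skips the next element.
seen = []
data = ['a', 'b', 'c']
for x in data:
    seen.append(x)
    if x == 'a':
        data.remove(x)  # shifts 'b' into the slot the loop already passed
print(seen)  # ['a', 'c']
```

Building new output lists instead of mutating the inputs avoids both problems.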

Here is my attempt:

def removeCommon(strings, s):
    """Strip the first occurrence of s from every string that contains it."""
    result = []
    found = False

    for b in strings:
        idx = b.find(s)

        if idx != -1:
            # Cut s out of b; drop b entirely if nothing is left.
            stripped = b[:idx] + b[idx + len(s):]
            if stripped:
                result.append(stripped)
            found = True
        else:
            result.append(b)

    return found, result


def remove_common_substring_1(list_a, list_b):
    list_a_out = []
    list_b_out = list(list_b)  # work on a copy so the caller's list is untouched

    for a in list_a:
        # Keep a only if it does not occur inside any remaining string of list_b.
        found, list_b_out = removeCommon(list_b_out, a)
        if not found:
            list_a_out.append(a)

    return list_a_out, list_b_out

print(remove_common_substring_1(['mens', 'room'], ['mensworld']))
print(remove_common_substring_1(['flower'], ['mayflower', 'June']))
print(remove_common_substring_1(['Chi', 'Construction'], ['Dex', 'Construction']))
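
One limitation worth pointing out: the attempt above only looks for strings of list_a inside strings of list_b. The code in the question also tries the opposite direction (a string of list_b inside a string of list_a). A possible symmetric variant is sketched below. The function name remove_common_both_ways is hypothetical, and it assumes that only full containment counts as a match and that only the first occurrence is removed.

```python
def remove_common_both_ways(list_a, list_b):
    """Remove shared strings when either side fully contains the other."""
    a_items = list(list_a)
    b_items = list(list_b)

    for i, current in enumerate(a_items):
        matched = False  # becomes True if current occurs inside some b
        for j, b in enumerate(b_items):
            if not current or not b:
                continue  # already consumed by an earlier match
            if current in b:
                # current occurs in b: cut it out of b, drop current later.
                b_items[j] = b.replace(current, '', 1)
                matched = True
            elif b in current:
                # b occurs in current: cut it out of current and drop b.
                current = current.replace(b, '', 1)
                b_items[j] = ''
        a_items[i] = '' if matched else current

    # Empty strings mark entries that were removed completely.
    return [s for s in a_items if s], [s for s in b_items if s]


print(remove_common_both_ways(['mens', 'room'], ['mensworld']))
# (['room'], ['world'])
print(remove_common_both_ways(['mensworld'], ['mens', 'room']))
# (['world'], ['room'])
```

For the three examples in the question this gives the same results as the attempt above; the difference only shows when the longer string is in list_a.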
