Check for substrings of list items in Python

Say I have a list:

    list = ['Apple', 'apple cider', 'apple juice', 'Mango', 'Mangosteen', 'Banana']

How do I detect whether a list item is a substring of other list items, and then delete those other list items? The list should then look like this:

    list = ['Apple', 'Mango', 'Banana']

I need to get only the most basic version of a string in the list.

A few things. First, you shouldn't use list as a variable name: it is not a keyword, but it is the name of the built-in list type, and rebinding it hides that built-in. Also, I used lower() when comparing, since the strings' case doesn't appear to be relevant.

l = ['Apple', 'apple cider', 'apple juice', 'Mango', 'Mangosteen']
basic_items = []  # To save the basic strings (i.e. 'Apple', 'Mango')
for list_item in l:  # Loop through all the items
    item_is_basic = True  # True if the item is basic (which we assume beforehand)
    for item in basic_items[:]:  # Loop through a copy, since we may remove items below
        if list_item.lower() in item.lower():
            # If the list item is contained in a basic item, it means the list item is "more basic"
            basic_items.remove(item)  # We remove the item which is not considered basic anymore
            continue  # Keep looping: other basic items may contain the list item too
        if item.lower() in list_item.lower():
            # If the list item contains a basic item, it means the list item is NOT basic
            item_is_basic = False
            break  # We stop the loop through the basic items

    if item_is_basic:
        # Finally, if the item is considered basic, we add it to basic_items
        basic_items.append(list_item)

print(basic_items)  # outputs ['Apple', 'Mango']

By the end you have your basic items in a separate list, which you can use.
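
To illustrate the naming point from this answer, here is a minimal sketch of what happens when a variable is called list. The function name shadowing_demo is hypothetical and only exists to keep the shadowing confined to one scope.

```python
def shadowing_demo():
    # Inside this function, `list` now refers to a list object, not the built-in type.
    list = ['Apple', 'apple cider', 'Mango']
    try:
        list('abc')  # attempts to call the list object
    except TypeError as exc:
        return str(exc)
    return None


message = shadowing_demo()
print(message)  # the error raised by calling a list object

# Outside the function the built-in is untouched and still callable.
letters = list('ab')
print(letters)  # ['a', 'b']
```

The same failure would hit any later call such as list(some_set) in code that has rebound the name at module level.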

Finding substrings is a well-known topic that is easy to look up on SO, so I'll concentrate on the part where you want to end up with a unique list of core ingredients. The code below first sorts the items by length. A string can only be contained in a string of equal or greater length, so after sorting every basic building block comes before the longer items that contain it.

Making basic_items a set is probably superfluous, but it at least guarantees unique entries. Note that a set is unordered, so the final list can come out in any order.

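listt = ['Apple', 'apple cider', 'apple juice', 'Mango', 'Mangosteen']

# Shortest strings first, so a basic item is seen before anything that contains it
listt = sorted(listt, key=len)

basic_items = set()

for val in listt:
    # Keep val only if no basic item found so far is a substring of it
    if not any(x.lower() in val.lower() for x in basic_items):
        basic_items.add(val)

listt = list(basic_items)  # contains 'Apple' and 'Mango'; set order is arbitrary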
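
As a variation on the sort-by-length idea, here is a minimal sketch that wraps it in a function and collects the results in a list instead of a set, so the output order is deterministic. The function name basic_strings is hypothetical; the comparison is case-insensitive via lower(), as in the answers above.

```python
def basic_strings(items):
    """Return the items that do not contain any other item (case-insensitive)."""
    basics = []
    # sorted() is stable, so items of equal length keep their original order.
    # A substring is never longer than the string containing it, so it is visited first.
    for candidate in sorted(items, key=len):
        lowered = candidate.lower()
        # Keep the candidate only if no basic string found so far occurs inside it.
        if not any(basic.lower() in lowered for basic in basics):
            basics.append(candidate)
    return basics


fruits = ['Apple', 'apple cider', 'apple juice', 'Mango', 'Mangosteen', 'Banana']
print(basic_strings(fruits))  # ['Apple', 'Mango', 'Banana']
```

The result is ordered by increasing length rather than by original position, which happens to match the list asked for in the question.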

