
How to check if a part of a string is among the keys of the dictionary

I have defined my dict as follows:

grocery_dict={"apple":"fruite", "pepper": "veg", "spaghetthi":"pasta", "banana":"fruite", "tomato":"fruite"}

and my list is grocery_list=["apple","bananas","pizza","pepper"]. I have written code that looks up each item and returns its category.

gl=[] 
for item in grocery_list:
    if item in grocery_dict:
        x=grocery_dict[item]
        gl.append(x)
    else:
        x='other'
        gl.append(x)
print(gl)

Next I can calculate how many times each category occurs. My issue now is how to check whether part of a word exists in the dictionary, for example if I have items such as "Mexican Pepper" or "tomatto", and how to ignore capital letters in a string.

Another question: Is it possible to use pyspark for such cases?

Thank you in advance

This actually has very little to do with dicts and mostly to do with string manipulation and natural language processing.

With regard to capitalisation and upper/lower case, the solution is simple: use only all-lowercase strings as keys in your dict and apply the .lower() method to all strings in your list, i.e.:

grocery_list = ["apple","bananas","pizza","pepper"]
normalized_list = [word.lower() for word in grocery_list]
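
Combined with the dict lookup, this could look roughly like the sketch below. It assumes the keys of grocery_dict are already lower case; the mixed-case sample list and the name categories are only there for illustration.

```python
grocery_dict = {"apple": "fruite", "pepper": "veg", "spaghetthi": "pasta",
                "banana": "fruite", "tomato": "fruite"}
# Sample input with mixed capitalisation (made up for this example).
grocery_list = ["Apple", "BANANA", "pizza", "Pepper"]

# Lower-case each item before the lookup; unknown items fall back to "other".
categories = [grocery_dict.get(word.lower(), "other") for word in grocery_list]
print(categories)  # ['fruite', 'fruite', 'other', 'veg']
```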

Handling terms like "Mexican Pepper" will be harder. You can of course split the string and look up each part, but if you have something like "Apple Tomato" in your list then there's no way to tell whether you want "apple" or "tomato". And handling spelling mistakes will require something like a spellchecker, but here again you can't be sure you'll get a failsafe, unambiguous answer.
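
As a rough sketch of both ideas, the snippet below splits each item into words and, for words that don't match a key exactly, falls back on difflib.get_close_matches from the standard library as a poor man's spellchecker. The function name categorize, the 0.8 cutoff and the "first matching word wins" rule are my own assumptions, not a definitive solution; in particular, an ambiguous item simply gets the category of its first matching word.

```python
from difflib import get_close_matches

grocery_dict = {"apple": "fruite", "pepper": "veg", "spaghetthi": "pasta",
                "banana": "fruite", "tomato": "fruite"}

def categorize(item, mapping, cutoff=0.8):
    """Return the category of the first word in `item` that matches a key.

    Hypothetical helper: tries an exact match first, then a fuzzy match.
    """
    for word in item.lower().split():
        if word in mapping:
            return mapping[word]
        # Tolerate small spelling differences such as "tomatto" or "bananas".
        close = get_close_matches(word, list(mapping), n=1, cutoff=cutoff)
        if close:
            return mapping[close[0]]
    return "other"

items = ["Mexican Pepper", "tomatto", "bananas", "pizza"]
print([categorize(item, grocery_dict) for item in items])
# ['veg', 'fruite', 'fruite', 'other']
```

A lower cutoff accepts more misspellings but also produces more false matches, so the value probably needs tuning against real data.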

As a side note: your current code can be vastly simplified:

gl = [grocery_dict.get(name, "other") for name in grocery_list]
