
Match and Group similar words that are related to each other (relevant) in a list

I want to group words not just by how similar they look but also by meaning. Say that I have the following list:

func = ['Police Man','Police Officer','Police','Admin','Administrator','Dietitian','Food specialist','Administrative Assistant','Economist','Economy Consultant']

I want to find words with similar meaning and function. I tried fuzzywuzzy but it does not achieve what I want:

from fuzzywuzzy import fuzz

# Compare every pair of titles by character-level similarity (0-100)
for i in func:
    for n in func:
        print(i, ":", n)
        print(fuzz.ratio(i, n))

This is part of the output, and it does not do the job:

Dietitian : Dietitian
100
Dietitian : Food specialist
25

I believe I should use a library such as nltk, or stemming? What is the best approach to find words with related meanings and functions in a list?

I believe I should use... stemming?

You definitely don't want to use stemming. Stemming only reduces words to their roots, so stem("running") = "run". It doesn't do anything based on meaning, so stem("sprinting") = "sprint", not "run". :(
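To make the point concrete, here is a minimal sketch using NLTK's PorterStemmer (one of several stemmers NLTK ships; the choice of stemmer and the helper name same_stem are my assumptions, not something from the question). It only shows that stemming matches words by shared root, not by meaning.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()


def same_stem(word_a, word_b):
    """Return True when both words reduce to the same Porter stem."""
    return stemmer.stem(word_a) == stemmer.stem(word_b)


print(stemmer.stem("running"))            # run
print(stemmer.stem("sprinting"))          # sprint
print(same_stem("running", "runs"))       # True: same root
print(same_stem("running", "sprinting"))  # False: related meaning, different root
```

So stemming can merge inflections of one word, but it will never tell you that two differently spelled job titles mean the same thing.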

I believe I should use nltk...

WordNet will let you search for sets of synonyms called "synsets" and you can access it through nltk or even through a web interface. It's not great at compound words, though. :( It's mostly just individual words.

So, you can look up "officer" and "policeman" and see that they have an overlapping meaning. Of course, "officer" also has OTHER meanings; how close do words have to be to qualify for your search? E.g., if "Food Specialist" is the same as "Dietitian", is "Food Specialist" also the same as "Chef"?

If WordNet does seem like a useful tool, check out their Python API. You'd want something like

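from nltk.corpus import wordnet as wn  # run nltk.download("wordnet") once beforehand
def shares_synset(word_a, word_b):  # e.g. shares_synset("officer", "policeman")
    common = [synset for synset in wn.synsets(word_a) if synset in wn.synsets(word_b)]
    return len(common) > 0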
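Once you have some pairwise "are these related?" check, like the synset overlap above, turning it into groups is a connected-components problem. The sketch below is one possible way to do that with a small union-find. The names group_related and shares_token are hypothetical, and shares_token is only a toy stand-in predicate (shared word) so the example runs without the WordNet corpus; you would swap in a meaning-based check.

```python
def group_related(items, related):
    """Group items into connected components under the pairwise predicate `related`."""
    parent = list(range(len(items)))

    def find(i):
        # Follow parent links to the root, halving the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if related(items[i], items[j]):
                parent[find(j)] = find(i)  # merge the two components

    groups = {}
    for i, item in enumerate(items):
        groups.setdefault(find(i), []).append(item)
    return list(groups.values())


def shares_token(a, b):
    """Toy stand-in predicate (assumption): True when the titles share a word."""
    return bool(set(a.lower().split()) & set(b.lower().split()))


func = ['Police Man', 'Police Officer', 'Police', 'Admin', 'Administrator',
        'Dietitian', 'Food specialist', 'Administrative Assistant',
        'Economist', 'Economy Consultant']

groups = group_related(func, shares_token)
for group in groups:
    print(group)
```

With the toy predicate only the three "Police" titles end up together, which is exactly the limitation of string-based matching. Note also that components are transitive: if A is related to B and B to C, all three land in one group, so a loose predicate (the "Chef" problem above) can chain unrelated titles together.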


 