Way to automatically determine word context/meaning in Python

I have a Python list of words about different topics (specifically, subreddit names). I need to keep only the subreddits about programming, such as "linux", "Python", "lisp", "programming", "ProgrammerHumor", etc. (note the inconsistent capitalization). Is there an automatic way to do this, e.g. with NLTK, or do I have to do it by hand?

EDIT:
Apparently my initial question was a bit unclear. I have a long list of possible subreddit names and I want to keep only those that have to do with programming. I don't know in advance which ones those are: there may be programming language names ("lisp", "Python"), general programming topics ("programming", "ProgrammerHumor"), or anything programming-related ("LearningMachineLearning", "linux"). Is there any way to extract them automatically, possibly with NLP, based on word meaning/context, or do I have to do it by hand?

I'm not sure I understand what you're trying to achieve. But if the question is how to match words regardless of capitalization, you can convert everything to lowercase (or match case-insensitively) and then do a regex match.

In your case this might look something like:

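Step 1: nothing to install. The standard-library re module is enough here; the third-party regex package (pip install regex) is only needed for its extra features.

Step 2: write the code:


import re
list_of_stuff = ['Python', 'linux', 'lisp', 'programming', 'ProgrammerHumor']
thing_to_find = 'program'
# re.IGNORECASE replaces manual lowercasing; re.escape keeps the search term literal
pattern = re.compile(re.escape(thing_to_find), re.IGNORECASE)
matches = [pattern.search(name) for name in list_of_stuff]
# Start index of each match, or -1 where nothing matched
positions = [m.start() if m else -1 for m in matches]
print(positions)  # [-1, -1, -1, 0, 0]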

In the above, the output is:

[-1, -1, -1, 0, 0]

Here -1 means no match, and any value of 0 or greater is the index at which the match starts (0 for both matches, since those names begin with "program").
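To go from match positions to the filtered list the question asks for, the same case-insensitive search can be run against several seed keywords at once. The sketch below is only an illustration: the keyword list and the function name filter_by_keywords are assumptions, not part of the original answer, and a hand-picked keyword list will miss any name it does not anticipate.

```python
import re

def filter_by_keywords(names, keywords):
    """Return the names that contain any of the keywords, ignoring case."""
    if not keywords:
        # An empty alternation would match every name, so bail out early.
        return []
    # Join the escaped keywords into one alternation, e.g. program|python|linux
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    return [name for name in names if pattern.search(name)]

# Hypothetical seed keywords, chosen for illustration only.
SEED_KEYWORDS = ["program", "python", "linux", "lisp", "learning"]

subreddits = ["Python", "linux", "lisp", "programming", "ProgrammerHumor",
              "LearningMachineLearning", "cooking", "AskReddit"]
programming_subreddits = filter_by_keywords(subreddits, SEED_KEYWORDS)
print(programming_subreddits)
```

Substring matching is deliberately loose: a keyword such as "learning" would also match unrelated names that happen to contain it, so the result still needs a manual check.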

Use a semantic similarity method, such as the word-vector similarity provided by spaCy, to check how closely each name relates to programming.
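A rough sketch of what similarity-based filtering could look like, using only the standard library. The vectors here are made-up toy values standing in for real embeddings, and the names filter_by_similarity, toy_vector_for, and the 0.8 threshold are all assumptions for illustration; in practice the vectors would come from an embedding model, for example a spaCy pipeline that ships with word vectors.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def filter_by_similarity(names, reference, vector_for, threshold):
    """Keep the names whose vector is at least `threshold`-similar to the reference."""
    ref_vec = vector_for(reference)
    return [n for n in names if cosine_similarity(vector_for(n), ref_vec) >= threshold]

# Made-up 3-dimensional vectors, NOT real embeddings (illustration only).
TOY_VECTORS = {
    "programming": (0.9, 0.1, 0.0),
    "python": (0.8, 0.2, 0.1),
    "linux": (0.7, 0.3, 0.0),
    "cooking": (0.0, 0.1, 0.9),
}

def toy_vector_for(word):
    # Unknown words get a zero vector, so they never pass the threshold.
    return TOY_VECTORS.get(word.lower(), (0.0, 0.0, 0.0))

names = ["Python", "linux", "cooking", "AskReddit"]
related = filter_by_similarity(names, "programming", toy_vector_for, threshold=0.8)
print(related)
```

Passing the embedding lookup in as vector_for keeps the filtering logic independent of where the vectors come from. Compound names such as "ProgrammerHumor" are unlikely to be in a word-vector vocabulary as-is, so they would probably need to be split into words first.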
