Get a word's function in a sentence (Python)

My question is a bit tricky. I'm trying to identify the role of a word in a given sentence. I managed to get something working with nltk, but it tells me what the word is (its part of speech), whereas what I'm looking for is its job in the sentence. For example, for "God loves apples" it does not return God as the subject; it returns God as NNP, which is not what I'm looking for. So I'd like the dict key to be the role of the word in its sentence (God as subject, not God as NNP).

import subprocess
import sys

# Install NLTK if it is missing (normally done once, outside the script)
subprocess.check_call([sys.executable, '-m', 'pip', 'install', 'nltk', '--quiet'],
                      stderr=subprocess.DEVNULL)
import nltk

sentence = input("Enter something\n")  # User input
tokens = nltk.word_tokenize(sentence)  # Split the sentence into tokens
tagged = nltk.pos_tag(tokens)          # List of (word, POS tag) pairs
# Map each POS tag to a word; if a tag occurs more than once, the last word wins
dct = {tag: word for word, tag in tagged}
# Append the result to DATA.txt without redirecting sys.stdout
with open('DATA.txt', 'a') as file:
    print(dct.get('NNP'), ' :', file=file)  # The NNP word, or None if there is none
    print(dct, file=file)                   # The full tag-to-word mapping
    print("\n", file=file)                  # Blank lines between entries

You could use dependency parsing. NLTK is not ideal for this task, but there are alternatives such as CoreNLP or spaCy. Both can be tested online. The dependency tree will tell you that in "God loves apples.", the token God is connected to the main verb by the nsubj relation, i.e., nominal subject.

I usually go for spaCy:

import spacy

nlp = spacy.load('en_core_web_sm')

# Process the document
doc = nlp('God loves apples.')

for tok in doc:
    print(tok, tok.dep_, sep='\t')

which results in

God     nsubj
loves   ROOT
apples  dobj
.       punct
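
Since the question asks for a dict keyed by role, here is a minimal sketch of how the dependency labels could be regrouped that way. The names `ROLE_NAMES`, `group_by_role` and `word_roles` are hypothetical helpers, not part of spaCy, and the label-to-name mapping only covers the labels that appear in the output above. `word_roles` assumes spaCy and the `en_core_web_sm` model are installed.

```python
from collections import defaultdict

# Readable names for the labels seen above (illustrative, not exhaustive).
# Any label missing from this mapping is kept under its raw name.
ROLE_NAMES = {
    "nsubj": "subject",
    "ROOT": "root",
    "dobj": "direct object",
}


def group_by_role(pairs):
    """Build {role: [words]} from (word, dependency_label) pairs."""
    roles = defaultdict(list)
    for word, dep in pairs:
        if dep == "punct":
            continue  # skip punctuation tokens
        roles[ROLE_NAMES.get(dep, dep)].append(word)
    return dict(roles)


def word_roles(sentence, model="en_core_web_sm"):
    """Parse a sentence with spaCy and group its words by role."""
    import spacy  # imported lazily so group_by_role works without spaCy

    nlp = spacy.load(model)
    return group_by_role((tok.text, tok.dep_) for tok in nlp(sentence))


# Pairs copied from the output above, so this line runs without spaCy.
pairs = [("God", "nsubj"), ("loves", "ROOT"), ("apples", "dobj"), (".", "punct")]
print(group_by_role(pairs))
```

Each role maps to a list rather than a single word because a sentence can contain several tokens with the same label, which is also why the tag-to-word dict in the question loses words.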

