Hi, let's imagine I have a grammar like this: S -> NNP VBZ NNP. However, the number of NNPs is huge and they are stored in a file. How can I load that file directly into the grammar, or how can I make sure the grammar fetches the words from the corpus instead of my specifying all the words by hand?
Assuming each POS has its own text file listing every possible word with that tag, one per line, you just want to build a dictionary by reading in the lines:
lexicon = {}
with open('path/to/the/files/NNP.txt', 'r') as NNP_File:
    # 'with' automatically closes the file once you're done.
    # Strip the trailing newline from each line, or lookups like
    # 'John' in lexicon['NNP'] will fail against 'John\n'.
    # A set seems like a good idea here, but it depends on your purposes.
    lexicon['NNP'] = set(line.strip() for line in NNP_File)
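For example, using a temporary file as a stand-in for a hypothetical NNP.txt, the membership check then works like this:

```python
import tempfile
import os

# Create a small stand-in for NNP.txt (hypothetical contents).
tmp = tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False)
tmp.write('John\nMary\nLondon\n')
tmp.close()

lexicon = {}
with open(tmp.name, 'r') as NNP_File:
    # strip trailing newlines so lookups match bare words
    lexicon['NNP'] = set(line.strip() for line in NNP_File)

print('Mary' in lexicon['NNP'])   # True
print('runs' in lexicon['NNP'])   # False
os.unlink(tmp.name)
```

Membership tests on a set are O(1) on average, which matters when the word list is huge.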
This setup is good for checking if some word can be of a specified part of speech; you could also flip it around and make the words the keys, if that's better for what you're building:
with open('path/to/the/files/NNP.txt', 'r') as NNP_File:
    for line in NNP_File:
        word = line.strip()
        if word in lexicon:   # dict.has_key() is gone in Python 3; use 'in'
            lexicon[word].add('NNP')
        else:
            lexicon[word] = set(['NNP'])
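With the word-keyed version, all tags for a word come back from a single dict access. A minimal sketch, using hypothetical in-memory word lists in place of the per-tag files, and `dict.setdefault` to collapse the if/else branching:

```python
# Hypothetical word lists standing in for NNP.txt, VBZ.txt, etc.
tag_files = {
    'NNP': ['John', 'Mary'],
    'VBZ': ['runs', 'loves'],
}

lexicon = {}
for tag, words in tag_files.items():
    for word in words:
        # setdefault returns the existing set for this word,
        # or inserts and returns an empty one
        lexicon.setdefault(word, set()).add(tag)

print(lexicon['John'])   # {'NNP'}
```

If a word can carry several tags (say, appearing in both NNP.txt and VBZ.txt), its set simply accumulates them.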
If your text files are formatted differently, you'll need to take a different approach.

EDIT: To produce a grammar line in the format you mentioned, you could follow that first approach above with something like:
with open('path/NNP.txt', 'r') as f:
    # str.join() takes an iterable, so a generator over the file works here;
    # strip each line so the rule doesn't contain embedded newlines.
    NNP_terminal_rule = 'NNP -> ' + ' | '.join(word.strip() for word in f)
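If you eventually hand the result to a parser such as NLTK's `CFG.fromstring` (an assumption about your toolchain), terminal symbols need to be quoted. A sketch that assembles the whole grammar string, again using hypothetical in-memory word lists in place of the files:

```python
# Hypothetical word lists standing in for NNP.txt and VBZ.txt.
pos_words = {
    'NNP': ['John', 'Mary'],
    'VBZ': ['sees', 'loves'],
}

rules = ['S -> NNP VBZ NNP']
for tag in sorted(pos_words):
    # quote each word, as CFG formats like NLTK's expect for terminals
    rules.append(tag + ' -> ' + ' | '.join("'%s'" % w for w in pos_words[tag]))

grammar_string = '\n'.join(rules)
print(grammar_string)
```

This prints a complete grammar, one rule per line, that a CFG reader can consume; to use the real files, replace the lists with the stripped lines read from each POS file.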