NLP: using a custom grammar with nltk

Question

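Hi, let's imagine I have a grammar such as S -> NNP VBZ NNP. However, there is a huge number of NNPs, and they are stored in a file. How can I load them directly into the grammar, or how can I make sure the grammar takes its words from a corpus instead of me specifying every single one?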

Answer 1

Assuming each POS has its own text file containing every possible word with that tag, one word per line, you can simply build a dictionary by reading the lines:

lexicon = {}
with open('path/to/the/files/NNP.txt', 'r', encoding='utf-8') as NNP_File:
    # 'with' automatically closes the file once you're done
    # now update the 'NNP' key in your lexicon with every word in the file.
    # strip() removes the trailing newline that readlines() would keep;
    # a set seems like a good idea but it depends on your purposes
    lexicon['NNP'] = {line.strip() for line in NNP_File if line.strip()}
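If there are many tags, the same idea can be looped over a directory. The sketch below assumes, hypothetically, that every file is named after its tag (`NNP.txt`, `VBZ.txt`, ...) and holds one word per line; `load_lexicon` is an illustrative name, not an nltk function.

```python
from pathlib import Path


def load_lexicon(directory):
    """Build {tag: set_of_words} from one '<TAG>.txt' file per tag.

    Hypothetical layout: each file is named after its POS tag
    (e.g. NNP.txt, VBZ.txt) and contains one word per line.
    """
    lexicon = {}
    for path in sorted(Path(directory).glob('*.txt')):
        with path.open(encoding='utf-8') as f:
            # strip() drops newlines; blank lines are skipped
            words = {line.strip() for line in f if line.strip()}
        # the file name without '.txt' becomes the tag, e.g. 'NNP'
        lexicon[path.stem] = words
    return lexicon
```

Calling `load_lexicon('path/to/the/files')` would then give one entry per tag file found in that directory.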

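This setup is great for checking whether a word can belong to a given part of speech. You could also flip it and use the words as keys, if that suits what you are building better: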

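# alternative layout: word -> set of tags (dict.has_key() is gone in Python 3)
lexicon = {}
with open('path/to/the/files/NNP.txt', 'r', encoding='utf-8') as NNP_File:
    for line in NNP_File:
        word = line.strip()
        if word:
            lexicon.setdefault(word, set()).add('NNP')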
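If you already have the tag-keyed dictionary, you don't need to re-read the files to get the flipped layout. A minimal sketch (`invert_lexicon` is a hypothetical helper) that inverts it, so that ambiguous words end up with several tags:

```python
def invert_lexicon(lexicon):
    """Turn {tag: words} into {word: tags}; ambiguous words get several tags."""
    word_to_tags = {}
    for tag, words in lexicon.items():
        for word in words:
            # setdefault creates the empty set the first time a word is seen
            word_to_tags.setdefault(word, set()).add(tag)
    return word_to_tags


word_to_tags = invert_lexicon({'NN': {'book', 'dog'}, 'VB': {'book', 'run'}})
```

Here `word_to_tags['book']` holds both `'NN'` and `'VB'`, while `'dog'` and `'run'` each keep a single tag.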

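If your text files are formatted differently, you will need a different approach.

Edit: to produce a grammar line in the format you mentioned, you can start from the first approach above: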

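with open('path/NNP.txt', 'r', encoding='utf-8') as f:
    # nltk's CFG syntax expects quoted terminals, e.g. NNP -> 'John' | 'Mary';
    # repr() adds the quotes and strip() drops each line's trailing newline
    NNP_terminal_rule = 'NNP -> ' + ' | '.join(
        repr(line.strip()) for line in f if line.strip())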
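Extending that to every tag, the whole grammar text can be assembled from your hand-written rules plus one terminal rule per tag. This is a sketch assuming a `{tag: set_of_words}` dictionary like the one built above; `build_grammar` is a hypothetical helper, not part of nltk.

```python
def build_grammar(rules, lexicon):
    """Combine hand-written rules with one terminal rule per POS tag.

    rules:   list of strings such as 'S -> NNP VBZ NNP'
    lexicon: {tag: set_of_words}, e.g. loaded from the per-tag files
    """
    lines = list(rules)
    for tag in sorted(lexicon):
        words = sorted(lexicon[tag])
        if not words:
            continue  # 'TAG -> ' with no alternatives would be invalid
        # terminals must be quoted; repr() does this (words containing
        # both ' and " would need extra handling)
        alternatives = ' | '.join(repr(word) for word in words)
        lines.append(f'{tag} -> {alternatives}')
    return '\n'.join(lines)


grammar_text = build_grammar(
    ['S -> NNP VBZ NNP'],
    {'NNP': {'John', 'Mary'}, 'VBZ': {'sees'}},
)
print(grammar_text)
```

The resulting string can be passed to `nltk.CFG.fromstring(grammar_text)`, and the grammar handed to a parser such as `nltk.ChartParser` to parse a token list like `['John', 'sees', 'Mary']`.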


Answer 1
1 2016-07-07 17:27:55