How can I get pairs of words from a sentence with NLTK?
I want to take in a sentence:

sentence = "How many people are here?"

and return a list of phrases:

pairs = ["How many", "many people", "people are", "are here"]
I tried

tokens = nltk.word_tokenize(sentence)
pairs = nltk.bigrams(tokens)

and instead got

<generator object bigrams at 0x103697820>

I'm pretty new to nltk, so sorry if this is off :) Help appreciated!
As you mentioned, the nltk.bigrams() function returns a generator object. Generators need to be iterated through in order to get the values out. This can be done with list(), or by looping over the generator.
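To see why the original attempt printed `<generator object bigrams at 0x...>`, here is a minimal sketch. It uses a hand-rolled stand-in for nltk.bigrams() (an assumption, so the example runs without NLTK installed); NLTK's real function behaves the same way from the caller's point of view:

```python
def bigrams(tokens):
    # Minimal stand-in for nltk.bigrams: lazily yields consecutive pairs.
    for pair in zip(tokens, tokens[1:]):
        yield pair

tokens = ["How", "many", "people", "are", "here"]
gen = bigrams(tokens)
print(gen)        # prints something like <generator object bigrams at 0x...>
print(list(gen))  # [('How', 'many'), ('many', 'people'), ('people', 'are'), ('are', 'here')]
```

The first print shows the generator object itself; only iterating it (here via list()) pulls the pairs out, and a generator can be consumed only once.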
Below, I'm looping/iterating over the generator object (the result of nltk.bigrams()) in a list comprehension, while at the same time using " ".join() to combine each pair (tuple) of words yielded by the generator into a single string, as desired.
import nltk

tokens = nltk.word_tokenize(sentence)
pairs = [" ".join(pair) for pair in nltk.bigrams(tokens)]

['How many', ...]
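The same result can also be produced with a plain for loop over the generator. The sketch below substitutes str.split() for nltk.word_tokenize and zip() for nltk.bigrams (assumptions, so it runs without NLTK data; the tokenization is cruder):

```python
# Whitespace tokenization as a stand-in for nltk.word_tokenize
tokens = "How many people are here".split()

# zip(seq, seq[1:]) yields consecutive pairs, like nltk.bigrams
pairs = []
for pair in zip(tokens, tokens[1:]):
    pairs.append(" ".join(pair))

print(pairs)  # ['How many', 'many people', 'people are', 'are here']
```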
This should solve your problem:

import re

# Raw string so the backslashes in the Windows path are not treated as escapes
with open(r'D:\Jupyter notebook\SNPQ.txt', 'r') as f:
    text = f.read()

# Strip a leading/trailing newline
text = re.sub(r'^\n|\n$', '', text)

for line in text.splitlines():
    fields = [i.replace('"', '\\"').strip()
              for i in re.split(r'(?<=^[0-9]{2})([0-9]{13}| {13})| +', line.strip())
              if i is not None]
    print('"' + '","'.join(fields) + '"')
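To see what that re.split() pattern does without the input file, here is a sketch on a hypothetical line (an assumption about the format: two leading digits, a 13-digit code or 13 spaces, then space-separated fields):

```python
import re

# Hypothetical sample line -- the real file format is only guessed at here
line = '011234567890123alpha beta'

parts = re.split(r'(?<=^[0-9]{2})([0-9]{13}| {13})| +', line.strip())
# re.split includes captured groups; when the split happens on " +",
# the group did not participate, so None appears and must be filtered out
fields = [i.replace('"', '\\"').strip() for i in parts if i is not None]
print('"' + '","'.join(fields) + '"')  # "01","1234567890123","alpha","beta"
```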
Thank you :)