
How to connect consecutive words of a text in a dictionary?

I have to build a dictionary that connects each word of a text to the words next to it.

The text is:

text = "Hello world I am Josh"

The dictionary would be:

expected = {"Hello": ["world"], "world": ["Hello", "I"], "I": ["am", "world"], "am": ["I", "Josh"], "Josh": ["am"]}

The keys are all the words in the text, and the values are the words that appear directly before or after each key. Does anyone have an idea how to obtain this?

  1. I would split the text to obtain all the words in a list.
  2. I would use the words as the keys of the dictionary.
  3. ?

Using the pairwise recipe from itertools (available directly as itertools.pairwise since Python 3.10):

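from itertools import tee

def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)  # advance the second iterator by one item
    return zip(a, b)  # Python 3: zip replaces Python 2's izip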

import collections

# Record every pair of neighbours in both directions
adjacent = collections.defaultdict(list)
for left, right in pairwise(text.split()):
    adjacent[right].append(left)
    adjacent[left].append(right)
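
To see what that loop produces, here is a self-contained Python 3 sketch run on the text from the question. The function name build_adjacency is only for illustration, and zip over two slices stands in for pairwise; the logic is otherwise the same.

```python
from collections import defaultdict

def build_adjacency(text):
    """Map each word to the list of words that appear directly next to it."""
    words = text.split()
    adjacent = defaultdict(list)
    # zip(words, words[1:]) yields each pair of neighbouring words
    for left, right in zip(words, words[1:]):
        adjacent[right].append(left)
        adjacent[left].append(right)
    return dict(adjacent)

result = build_adjacency("Hello world I am Josh")
print(result["world"])  # ['Hello', 'I']
print(result["I"])      # ['world', 'am']
```

The lists fill up in the order the neighbours are encountered, so "I" maps to ['world', 'am'] rather than ['am', 'world'] as written in the question.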

Your question doesn't consider the possibility that a word appears in the sentence more than once. You might want a set rather than a list of adjacent words. Punctuation in the sentence could also ruin your day, so depending on your requirements you might need to do more than just split().
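
A minimal sketch of both suggestions, assuming that "word" means a run of letters, digits, or underscores (so r"\w+" is an assumption that would split "don't" into "don" and "t"). The name build_adjacency_sets is hypothetical, and zip over two slices is used in place of pairwise to keep the example self-contained.

```python
import re
from collections import defaultdict

def build_adjacency_sets(text):
    """Map each word to the set of words that appear directly next to it."""
    # Assumption: \w+ is a good enough tokenizer; it drops punctuation
    words = re.findall(r"\w+", text)
    adjacent = defaultdict(set)
    for left, right in zip(words, words[1:]):
        # Sets ignore repeats, so a neighbour is stored only once
        adjacent[left].add(right)
        adjacent[right].add(left)
    return dict(adjacent)

neighbours = build_adjacency_sets("Hello, world! Hello Josh.")
print(sorted(neighbours["Hello"]))  # ['Josh', 'world']
```

With a list, "world" would have recorded "Hello" twice here; with a set it appears once. Whether that is what you want depends on whether you need to count repeated neighbours.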
