
How to build a dictionary of words in text

How would I return a dictionary in which each key is a word in the given text and each value is the list of words that precede it in the text?

For example:

text = "hi my name is"
my_dict = get_previous_words_dict(text)

should return a dictionary like this:

>>> my_dict['hi']
[]
>>> my_dict['my']
['hi']
>>> my_dict['name']
['hi', 'my']

This only makes sense if the words in the sentence are unique, as @cjds points out. Also, the value for the first word should surely be an empty list, not a list containing the empty string. The following fits that specification:

def get_previous_words_dict(text):
    words = []
    dictionary = {}
    for word in text.split():
        dictionary[word] = words[:]
        words.append(word)
    return dictionary

The most important thing to understand is the assignment:

dictionary[word] = words[:]

The slice words[:] makes a copy of the words list. If it were a plain assignment:

dictionary[word] = words

then every dictionary entry would refer to the same words list, so at the end of the loop every entry in the dictionary would contain all of the words.
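To make the aliasing problem concrete, here is a minimal sketch of the plain-assignment version. The function name `aliased_previous_words` is made up for this illustration and is not part of the answer above.

```python
def aliased_previous_words(text):
    words = []
    dictionary = {}
    for word in text.split():
        # No copy here: every value is a reference to the same list object.
        dictionary[word] = words
        words.append(word)
    return dictionary


aliased = aliased_previous_words("hi my name is")
print(aliased["hi"])                   # ['hi', 'my', 'name', 'is']
print(aliased["hi"] is aliased["is"])  # True
```

Because all values are the same object, even the entry for the first word ends up holding every word in the text.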

>>> t="hi my name is"
>>> li=t.split()

You can use a dict comprehension (this one lists the previous words nearest first):

>>> {w:[li[si] for si in range(i-1,-1,-1)] for i, w in enumerate(li)}
{'is': ['name', 'my', 'hi'], 'hi': [], 'my': ['hi'], 'name': ['my', 'hi']}

Or, counting up, so the previous words stay in their original order:

>>> {w:[li[si] for si in range(0,i)] for i, w in enumerate(li)}
{'is': ['hi', 'my', 'name'], 'hi': [], 'my': ['hi'], 'name': ['hi', 'my']}

Or use a slice instead of the nested list comprehension:

>>> {w:li[0:i] for i, w in enumerate(li)}
{'is': ['hi', 'my', 'name'], 'hi': [], 'my': ['hi'], 'name': ['hi', 'my']}
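A dict holds one value per key, so if a word repeats, a comprehension like the ones above keeps only the entry for its last occurrence. The sketch below shows that behaviour and one possible alternative that keeps the first occurrence instead. The helper name `first_occurrence_previous_words` is hypothetical, and whether first or last should win is an assumption the question does not settle.

```python
li = "a b a c".split()

# Same slice-based comprehension: the second 'a' overwrites the first.
last_wins = {w: li[0:i] for i, w in enumerate(li)}
print(last_wins["a"])  # ['a', 'b']


def first_occurrence_previous_words(words):
    """Keep the previous-words list from the first time each word appears."""
    result = {}
    for i, w in enumerate(words):
        # setdefault only stores the value if the key is not present yet.
        result.setdefault(w, words[:i])
    return result


first_wins = first_occurrence_previous_words(li)
print(first_wins["a"])  # []
```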

If I were to implement this from scratch:

Use a hash table (a dict in Python) to store the words. When inserting a word, store it as key => [keys already in the hash].
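One way to read the hash-based idea is sketched below. It assumes Python 3.7 or later, where a dict preserves insertion order, and it skips repeated words so that each key is inserted once. The function name `previous_keys_dict` is hypothetical.

```python
def previous_keys_dict(text):
    result = {}
    for word in text.split():
        if word not in result:
            # list(result) is evaluated before the assignment, so it holds
            # only the keys inserted so far, in insertion order.
            result[word] = list(result)
    return result


d = previous_keys_dict("hi my name is")
print(d["name"])  # ['hi', 'my']
```

Note that this relies on the dict's own key order rather than on a separate list, so it cannot represent repeated words among the previous words.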

  1. Split the sentence into words:

     sentence_words = sentence.split(' ') 
  2. Create a dictionary where the key is the word and the value is a slice of sentence_words from the beginning up to, but not including, the position of that word:

     d = {w: sentence_words[:i] for i, w in enumerate(sentence_words)} 

Sample code

sentence = "Hi my name is John"
sentence_words = sentence.split(' ')
d = {w: sentence_words[:i] for i, w in enumerate(sentence_words)}
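As a usage sketch, the sample can be run and inspected as follows. The second half illustrates a detail worth knowing: `split(' ')` produces empty strings when the text contains consecutive spaces, while `split()` with no argument collapses runs of whitespace. The `spaced` string is an invented input for illustration.

```python
sentence = "Hi my name is John"
sentence_words = sentence.split(' ')
d = {w: sentence_words[:i] for i, w in enumerate(sentence_words)}
print(d["Hi"])    # []
print(d["John"])  # ['Hi', 'my', 'name', 'is']

# With a double space, split(' ') yields an empty-string "word".
spaced = "Hi  my name"
print(spaced.split(' '))  # ['Hi', '', 'my', 'name']
print(spaced.split())     # ['Hi', 'my', 'name']
```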
