Creating a dictionary of words and their context in a sentence

I have a Python list containing hundreds of thousands of words, in the order in which they appear in the text.

I'm looking to create a dictionary that maps each word to a string containing that word together with the 2 (say) words that appear before and after it.

For example, the list ["This", "is", "an", "example", "sentence"] should become the dictionary:

"This" = "This is an"
"is" = "This is an example"
"an" = "This is an example sentence"
"example" = "is an example sentence"
"sentence" = "an example sentence"

Something like:

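WordsInContext = {}  # Python's dictionary type is dict, not Dict
ContextSize = 2
for wIndex, w in enumerate(Words):  # Words is the list of words in text order
    # Clamp the start at 0: a negative start index would count from the end of the list
    start = max(wIndex - ContextSize, 0)
    # Slice ends are exclusive, so + 1 keeps the ContextSize words after w;
    # assigning with [w] (not update(w=...)) uses the word itself as the key
    WordsInContext[w] = ' '.join(Words[start:wIndex + ContextSize + 1])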

Even with any syntax errors corrected, I'm sure this would be a hideously inefficient way of doing it.

Can someone suggest a more efficient method, please?
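A straightforward way to build the string-valued dictionary described in the question is a single pass with `enumerate` and a clamped slice. This is a minimal sketch, assuming the words are already in a Python list; the function name `words_in_context` and its `context_size` parameter are hypothetical, chosen for illustration. Each word costs one slice and one join of at most 2 * context_size + 1 items, so the whole pass is linear in the number of words for a fixed context size.

```python
def words_in_context(words, context_size=2):
    """Map each word to a string of itself plus up to context_size words on each side.

    Hypothetical helper for illustration. If a word occurs more than once,
    the context of its last occurrence wins, because later dict assignments
    overwrite earlier ones.
    """
    result = {}
    for i, word in enumerate(words):
        # A negative slice start would count from the end of the list, so clamp it at 0;
        # an end index past the list length is simply truncated by Python slicing.
        start = max(i - context_size, 0)
        result[word] = " ".join(words[start:i + context_size + 1])
    return result


words = ["This", "is", "an", "example", "sentence"]
for word, context in words_in_context(words).items():
    print(f'"{word}" = "{context}"')
```

Run on the example list, this prints the five `"word" = "context"` lines shown in the question, in the same order.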

My suggestion:

words = ["This", "is", "an", "example", "sentence"]

context = {}  # avoid naming this dict, which would shadow the built-in

# Pad with 2 None items at the front and back to avoid
# additional boundary conditions in the for loop
padded = [None, None] + words + [None, None]

for i in range(len(words)):
    # padded[i + 2] is words[i]; padded[i:i+5] holds it plus 2 words either side
    context[padded[i + 2]] = [w for w in padded[i:i + 5] if w is not None]
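The padding trick above hard-codes a context of 2 words and stores lists rather than the strings the question asks for. One possible generalization is sketched below; it assumes `words` is a list and that `None` never occurs as a real word, and the name `padded_context` is hypothetical.

```python
def padded_context(words, context_size=2):
    """Map each word to a string of its surrounding window, using None padding.

    Hypothetical generalization of the padding answer; assumes `words` is a
    list and that None never occurs as a real word.
    """
    pad = [None] * context_size
    # Pad both ends so every window has the same width and needs no bounds checks
    padded = pad + words + pad
    width = 2 * context_size + 1
    result = {}
    for i, word in enumerate(words):
        # padded[i:i + width] is centered on words[i], which sits at padded[i + context_size]
        window = padded[i:i + width]
        # Drop the padding sentinels before joining into a string
        result[word] = " ".join(w for w in window if w is not None)
    return result


print(padded_context(["This", "is", "an", "example", "sentence"], context_size=1))
# {'This': 'This is', 'is': 'This is an', 'an': 'is an example', 'example': 'an example sentence', 'sentence': 'example sentence'}
```

Compared with clamping the slice start, padding trades a copy of the list for a loop body with no special cases at either end; both give the same result.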

>>> from itertools import count
>>> words = ["This", "is", "an", "example", "sentence"]
>>> context_size = 2
>>> dict((word,words[max(i-context_size,0):j]) for word,i,j in zip(words,count(0),count(context_size+1)))
{'This': ['This', 'is', 'an'], 'is': ['This', 'is', 'an', 'example'], 'an': ['This', 'is', 'an', 'example', 'sentence'], 'example': ['is', 'an', 'example', 'sentence'], 'sentence': ['an', 'example', 'sentence']}

In Python 2.7+ or 3.x, the same thing can be written as a dict comprehension:

{word:words[max(i-context_size,0):j] for word,i,j in zip(words,count(0),count(context_size+1))}
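With hundreds of thousands of words, many will occur more than once, and in every approach above a later occurrence silently overwrites the dictionary entry for an earlier one. If every occurrence matters (an assumption about the asker's needs, not part of either answer), one option is to map each word to a list of context strings with `collections.defaultdict`. The name `all_contexts` is hypothetical.

```python
from collections import defaultdict


def all_contexts(words, context_size=2):
    """Map each word to a list of context strings, one per occurrence, in text order."""
    contexts = defaultdict(list)
    for i, word in enumerate(words):
        # Same clamped slice as in the answers, but appended rather than assigned,
        # so repeated words keep every context instead of only the last one.
        start = max(i - context_size, 0)
        contexts[word].append(" ".join(words[start:i + context_size + 1]))
    return dict(contexts)


text = "the cat sat on the mat".split()
print(all_contexts(text, context_size=1)["the"])  # ['the cat', 'on the mat']
```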

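Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.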

Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM