
How to get the context of a sequence in Python?

Given a text such as

The green ball on the table is mine

I want to extract the context words, 2 to the left and 2 to the right, of any sequence in the sentence above.

For example, given the sequence green ball, I expect to have "The", "on", "the", "table" as the contexts.

Also, given The, I want to have "green", "ball", "on", "the" as the contexts. Given mine, I want to have "on", "the", "table", "is" as the contexts.

Since The has no context on the left, all 4 context words are taken from the right. The same applies in reverse to mine, which has no context on the right, so all 4 come from the left.

I tried something out, but my method was based on splitting, as shown below:

query = "green ball"
context_window=2
texts="The green ball  on the table is mine"

tokens = texts.split()
index = tokens.index(query)
begin = max(0, index - context_window)
end = min(index + 1 + context_window, len(tokens))
context_words = tokens[begin:end]

I discovered it cannot work in this case. Any way out?
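For reference, the failure is easy to reproduce. The sketch below reuses the variable names from the attempt above. list.index looks for a single element equal to the query, and no single token equals the two-word string, so it raises ValueError (caught here only so that the snippet runs to the end).

```python
texts = "The green ball  on the table is mine"
query = "green ball"

tokens = texts.split()
# tokens == ['The', 'green', 'ball', 'on', 'the', 'table', 'is', 'mine']

try:
    index = tokens.index(query)
except ValueError:
    # No single token equals the two-word string "green ball".
    index = None

print(index)  # None
```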

Split the input string into 2 parts: the part to the left of the query and the part to the right of it.

Then split the left and right parts into words. Get the last context_window words of the left part and the first context_window words of the right part. If either part has fewer than context_window words, add the shortfall to the number of words you take from the other part.

Finally, take that many words from the end of the left part and from the start of the right part.

query = "green ball"
context_window = 2
texts = "The green ball  on the table is mine"

# Locate the query once; str.find returns -1 when the query is absent.
start = texts.find(query)
if start == -1:
    raise ValueError(f"{query!r} not found in text")

# Words before and after the query (splitting an empty string gives []).
left_tokens = texts[:start].split()
right_tokens = texts[start + len(query):].split()

# Take context_window words per side; a side that is too short
# hands its unused quota to the other side.
if len(left_tokens) < context_window:
    left_context = len(left_tokens)
    right_context = 2 * context_window - left_context
elif len(right_tokens) < context_window:
    right_context = len(right_tokens)
    left_context = 2 * context_window - right_context
else:
    left_context = right_context = context_window

# Slice from an explicit start index: left_tokens[-0:] would return the whole list.
left_start = max(0, len(left_tokens) - left_context)
context_words = left_tokens[left_start:] + right_tokens[:right_context]
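One caveat: str.find matches substrings, so a query such as "the" would also match inside a longer word like "other". A possible variant, sketched below, matches on whole tokens instead and wraps the logic in a function. The names find_sublist and extract_context are hypothetical helpers introduced for illustration; they are not part of the answer above or of any library.

```python
def find_sublist(tokens, query_tokens):
    """Return the index where query_tokens starts in tokens, or -1."""
    n = len(query_tokens)
    if n == 0:
        return -1
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == query_tokens:
            return i
    return -1


def extract_context(text, query, context_window=2):
    """Return up to 2 * context_window words surrounding query in text."""
    tokens = text.split()
    query_tokens = query.split()
    start = find_sublist(tokens, query_tokens)
    if start == -1:
        raise ValueError(f"{query!r} not found in text")

    left_tokens = tokens[:start]
    right_tokens = tokens[start + len(query_tokens):]

    # Each side gets context_window words; a side that falls short
    # passes its unused quota to the other side.
    left_n = min(context_window, len(left_tokens))
    right_n = min(context_window, len(right_tokens))
    left_extra = context_window - right_n   # quota unused on the right
    right_extra = context_window - left_n   # quota unused on the left
    left_n = min(left_n + left_extra, len(left_tokens))
    right_n = min(right_n + right_extra, len(right_tokens))

    # Explicit start index avoids the left_tokens[-0:] pitfall.
    return left_tokens[len(left_tokens) - left_n:] + right_tokens[:right_n]


text = "The green ball  on the table is mine"
print(extract_context(text, "green ball"))  # ['The', 'on', 'the', 'table']
print(extract_context(text, "The"))         # ['green', 'ball', 'on', 'the']
print(extract_context(text, "mine"))        # ['on', 'the', 'table', 'is']
```

Matching is case-sensitive and splits on whitespace only, so punctuation attached to a word would prevent a match; both choices are assumptions of this sketch.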
