Get n words before a character position in a string without cutting words

Question

I am given a string and a character position in that string. I want to get the n words before that position, excluding the last word if the character position falls in the middle of a word.

text = 'the house is big the house is big the house is big'
char_nr = 19
list_of_words_before = text[:char_nr-1].split()
print(list_of_words_before)  # the slice cuts through "the", so an unwanted 't' ends up in the list
nr_words = 3
if nr_words > len(list_of_words_before):
    nr_words = len(list_of_words_before)

print(list_of_words_before[-nr_words:])

This gives:

['the', 'house', 'is', 'big', 't']
['is', 'big', 't']

But what I actually want is ['house', 'is', 'big'], since t is only part of a word.

How can I make sure the string is only split at a space between words? Is there any other solution?

Answer 1

Using regex:

>>> import re
>>> text = 'the house is big the house is big the house is big'
>>> result = re.match(r".{0,18}\b", text).group(0).split()
>>> result
['the', 'house', 'is', 'big']
>>> result[-3:]
['house', 'is', 'big']

Explanation:

. matches any character
{0,18} matches the preceding token ( . ) 0 to 18 times, as many as possible
\b requires the match to end at a word boundary (the start or end of a word), so we don't get partial words
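
The same idea can be wrapped in a function so that the limit is not hard-coded. This is only a sketch: the name words_before_regex and its parameters are made up for illustration, and it assumes the limit is char_nr - 1 as in the question. Note that \b also matches between a letter and punctuation, so a word like "don't" could still be cut at the apostrophe.

```python
import re

def words_before_regex(text, char_nr, nr_words):
    """Sketch: up to nr_words whole words within the first char_nr - 1 characters."""
    limit = max(char_nr - 1, 0)
    # Build the pattern ".{0,limit}\b"; re.DOTALL lets "." match newlines too.
    pattern = r".{0,%d}\b" % limit
    match = re.match(pattern, text, re.DOTALL)
    if match is None:  # no word boundary found within the limit
        return []
    words = match.group(0).split()
    return words[-nr_words:] if nr_words > 0 else []

print(words_before_regex('the house is big the house is big', 19, 3))
print(words_before_regex('the house is big a house is big', 19, 3))
```

Returning an empty list when nothing matches (for example, a string of spaces) is a design choice of this sketch, not something the answer specifies.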

Answer 2

Maybe something like this:

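text = 'the house is big the house is big the house is big'
char_nr = 19
cut = char_nr - 1
prefix = text[:cut]
splitted = prefix.split()

# Drop the last word only if the cut falls inside it, i.e. there is a
# non-space character on both sides of the cut.
if prefix and prefix[-1] != ' ' and cut < len(text) and text[cut] != ' ':
    splitted = splitted[:-1]

nr_words = 3
print(splitted[-nr_words:])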

Output:

['house', 'is', 'big']

Answer 3

You can check the first character after the slice, text[char_nr - 1] (the slice text[:char_nr - 1] stops just before it). If it is a non-word character, the split was correct; otherwise you need to remove the last item from the list. Assuming that " " is the only character that can occur between words:

if char_nr - 1 < len(text) and text[char_nr - 1] != " ":
    list_of_words_before = list_of_words_before[:-1]
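
If other whitespace (tabs, newlines) can also separate words, the check can be relaxed with str.isspace(). The following is a minimal sketch under that assumption; the function name words_before is hypothetical, and it looks at the characters on both sides of the cut so that a cut landing exactly at the start of a word does not drop the previous word.

```python
def words_before(text, char_nr, nr_words):
    """Sketch: up to nr_words whole words from text[:char_nr - 1]."""
    cut = max(char_nr - 1, 0)    # index of the first excluded character
    words = text[:cut].split()   # split() with no argument splits on any whitespace
    # The last word is partial only if non-whitespace sits on both sides of the cut.
    inside_word = (
        0 < cut < len(text)
        and not text[cut - 1].isspace()
        and not text[cut].isspace()
    )
    if inside_word:
        words = words[:-1]
    return words[-nr_words:] if nr_words > 0 else []

print(words_before('the house is big the house is big', 19, 3))
print(words_before('the house is big a house\tis big', 19, 3))
```

Returning the list instead of printing it makes the function easier to reuse and to check with assertions.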

Answer 4

I think this is what you're looking for:

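def get_n_words(text, char_nr, nr_words):
    cut = char_nr - 1
    list_of_words_before = text[:cut].split()
    # Drop the last word only when the cut falls inside it, i.e. there is
    # a non-space character on both sides of the cut.
    if 0 < cut < len(text) and text[cut - 1] != " " and text[cut] != " ":
        list_of_words_before = list_of_words_before[:-1]
    print(list_of_words_before)
    if nr_words > len(list_of_words_before):
        nr_words = len(list_of_words_before)

    print(list_of_words_before[-nr_words:])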

text_1 = 'the house is big the house is big the house is big'
text_2 = 'the house is big a house is big the house is big'

print("Last word truncated:")
get_n_words(text_1, 19, 3)
print("\nLast word not truncated:")
get_n_words(text_2, 19, 3)

That has the following output:

Last word truncated:
['the', 'house', 'is', 'big']
['house', 'is', 'big']

Last word not truncated:
['the', 'house', 'is', 'big', 'a']
['is', 'big', 'a']

Answer 5

You could use a pattern that starts the match with a non-whitespace character using \S, then matches any character 0 to 18 times using .{0,18}, while asserting that no non-whitespace character follows, using a negative lookahead (?!\S):

\S.{0,18}(?!\S)

Regex demo | Python demo

import re

text = 'the house is big the house is big the house is big'
char_nr = 19
pattern = rf"\S.{{0,{char_nr - 1}}}(?!\S)"

strings = re.findall(pattern, text)

print(strings)

list_of_words_before = strings[0].split()  # the first match is the chunk at the start of the text
print(list_of_words_before)

nr_words = 3
lenOfWordsBefore = len(list_of_words_before)
if nr_words > lenOfWordsBefore:
    nr_words = lenOfWordsBefore

print(list_of_words_before[-nr_words:])

Output:

['the house is big', 'the house is big', 'the house is big']
['the', 'house', 'is', 'big']
['house', 'is', 'big']
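
A related variation, swapped in here as an alternative rather than taken from the answer, is to match each word with \S+ and keep only those that end at or before the cut. This is a sketch: the name words_before_spans is hypothetical, and it assumes the cut index is char_nr - 1 as in the question.

```python
import re

def words_before_spans(text, char_nr, nr_words):
    """Sketch: keep the words that end at or before index char_nr - 1."""
    cut = char_nr - 1
    # m.end() is the index just past the word, so a word is complete
    # within text[:cut] exactly when m.end() <= cut.
    words = [m.group() for m in re.finditer(r"\S+", text) if m.end() <= cut]
    return words[-nr_words:] if nr_words > 0 else []

print(words_before_spans('the house is big the house is big', 19, 3))
print(words_before_spans('the house is big a house is big', 19, 3))
```

Because the comparison uses match positions, no separate check for a truncated last word is needed. The comprehension scans the whole string, so for very long texts a loop that stops at the first word past the cut may be preferable.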


5 answers

Answer 1
1 2022-08-17 11:42:12

Answer 2
0 2022-08-17 11:36:33

Answer 3
0 2022-08-17 11:40:16

Answer 4
0 2022-08-17 11:48:21

Answer 5
0 2022-08-17 14:41:33
