Python: How to split a list based on a specific element
If we have the following list in Python:
sentence = ["I", "am", "good", ".", "I", "like", "you", ".", "we", "are", "not", "friends", "."]
How do I split this to get a list of sub-lists, each ending with the full stop? So I want to get the following elements in my new list:
["I","am","good","."]
["I","like","you","."]
["we","are","not","friends","."]
My attempts so far:
cleaned_sentence = []
a = 0
while a < len(sentence):
    current_word = sentence[a]
    if current_word == "." and len(cleaned_sentence) == 0:
        cleaned_sentence.append(sentence[0:sentence.index(".")+1])
        a += 1
    elif current_word == "." and len(cleaned_sentence) > 0:
        sub_list = sentence[sentence.index(".")+1:-1]
        sub_list.append(sentence[-1])
        cleaned_sentence.append(sub_list[0:sentence.index(".")+1])
        a += 1
    else:
        a += 1
for each in cleaned_sentence:
    print(each)
Running this on sentence produces:
['I', 'am', 'good', '.']
['I', 'like', 'you', '.']
['I', 'like', 'you', '.']
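The repeated last line appears because sentence.index(".") always returns the position of the first full stop. A minimal sketch of the same index-based idea, assuming we track the start position ourselves and pass it as the second argument to list.index (the names start and end are illustrative and not from the question):

```python
sentence = ["I", "am", "good", ".", "I", "like", "you", ".",
            "we", "are", "not", "friends", "."]

cleaned_sentence = []
start = 0  # illustrative name: index where the current sub-list begins
while "." in sentence[start:]:
    # list.index(value, start) searches from `start` onwards, so each
    # full stop is found once instead of the first one every time
    end = sentence.index(".", start)
    cleaned_sentence.append(sentence[start:end + 1])
    start = end + 1

for each in cleaned_sentence:
    print(each)
```

Any words after the final full stop are dropped by this sketch; whether that is desirable depends on the data.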
You can use itertools.groupby:
from itertools import groupby
i = (list(g) for _, g in groupby(sentence, key='.'.__ne__))
print([a + b for a, b in zip(i, i)])
This outputs:
[['I', 'am', 'good', '.'], ['I', 'like', 'you', '.'], ['we', 'are', 'not', 'friends', '.']]
If your list doesn't always end with '.' then you can use itertools.zip_longest instead:
from itertools import groupby, zip_longest
sentence = ["I", "am", "good", ".", "I", "like", "you", ".", "we", "are", "not", "friends"]
i = (list(g) for _, g in groupby(sentence, key='.'.__ne__))
print([a + b for a, b in zip_longest(i, i, fillvalue=[])])
This outputs:
[['I', 'am', 'good', '.'], ['I', 'like', 'you', '.'], ['we', 'are', 'not', 'friends']]
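To see why pairing the generator with itself works, it may help to look at the raw groups first. The sketch below uses a shortened list and the illustrative names groups and is_word, which are not part of the answer:

```python
from itertools import groupby

sentence = ["I", "am", "good", ".", "I", "like", "you", "."]

# '.'.__ne__ is True for ordinary words and False for '.', so groupby
# alternates between runs of words and runs of full stops
groups = [(is_word, list(g)) for is_word, g in groupby(sentence, key='.'.__ne__)]
print(groups)
# [(True, ['I', 'am', 'good']), (False, ['.']), (True, ['I', 'like', 'you']), (False, ['.'])]
```

Because zip(i, i) draws twice from the same generator on every step, each run of words is joined with the run of full stops that follows it. This pairing relies on the list starting with a word rather than with '.'.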
We can do this in two stages: first calculating the indices where the dots are located, and then making slices, like:
idxs = [i for i, v in enumerate(sentence, 1) if v == '.'] # calculating indices
result = [sentence[i:j] for i, j in zip([0]+idxs, idxs)] # splitting accordingly
This then yields:
>>> [sentence[i:j] for i, j in zip([0]+idxs, idxs)]
[['I', 'am', 'good', '.'], ['I', 'like', 'you', '.'], ['we', 'are', 'not', 'friends', '.']]
You can then, for example, print the individual elements with:
for sub in [sentence[i:j] for i, j in zip([0]+idxs, idxs)]:
    print(sub)
This will then print:
>>> idxs = [i for i, v in enumerate(sentence, 1) if v == '.']
>>> for sub in [sentence[i:j] for i, j in zip([0]+idxs, idxs)]:
...     print(sub)
...
['I', 'am', 'good', '.']
['I', 'like', 'you', '.']
['we', 'are', 'not', 'friends', '.']
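The slicing above drops any words that follow the last full stop. A possible variant, sketched here as a hypothetical helper named split_after with an assumed sep parameter, adds the list length as a final boundary when needed:

```python
def split_after(items, sep="."):
    """Split items into sub-lists, each ending with sep (hypothetical helper)."""
    # 1-based positions of the separators, i.e. the slice end points
    idxs = [i for i, v in enumerate(items, 1) if v == sep]
    # keep a trailing remainder when the list does not end with sep
    if items and (not idxs or idxs[-1] != len(items)):
        idxs.append(len(items))
    return [items[i:j] for i, j in zip([0] + idxs, idxs)]


print(split_after(["I", "am", "good", ".", "we", "are", "not", "friends"]))
# [['I', 'am', 'good', '.'], ['we', 'are', 'not', 'friends']]
```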
sentence = ["I", "am", "good", ".", "I", "like", "you", ".", "we", "are", "not", "friends", "."]
output = []
temp = []
for item in sentence:
    temp.append(item)
    if item == '.':
        output.append(temp)
        temp = []
if temp:
    output.append(temp)
print(output)
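If the input is large or only consumed once, the same accumulate-and-flush loop could be written as a generator. This is a sketch under that assumption; the name iter_sentences and the sep parameter are illustrative:

```python
def iter_sentences(items, sep="."):
    """Yield sub-lists of items, each ending with sep (illustrative helper)."""
    temp = []
    for item in items:
        temp.append(item)
        if item == sep:
            yield temp
            temp = []
    if temp:  # words left over after the last separator
        yield temp


sentence = ["I", "am", "good", ".", "I", "like", "you", ".",
            "we", "are", "not", "friends", "."]
for sub in iter_sentences(sentence):
    print(sub)
```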
Using a simple iteration.
Demo:
sentence = ["I", "am", "good", ".", "I", "like", "you", ".", "we", "are", "not", "friends", "."]
last = len(sentence) - 1
result = [[]]
for i, v in enumerate(sentence):
    if v == ".":
        result[-1].append(".")
        if i != last:
            result.append([])
    else:
        result[-1].append(v)
print(result)
Output:
[['I', 'am', 'good', '.'], ['I', 'like', 'you', '.'], ['we', 'are', 'not', 'friends', '.']]
This answer aims to be the simplest one...
The data
sentences = ["I", "am", "good", ".",
             "I", "like", "you", ".",
             "We", "are", "not", "friends", "."]
We initialize the output list and set a flag to show that we are starting a new sentence
l, start = [], 1
We loop on the data list, using w to address the current word. If we are at the end of a sentence (we have met a "."), we raise the flag again. Note the single comment…
for w in sentences:
    if start: start = l.append([])  # l.append() returns None, that is falsey...
    l[-1].append(w)
    if w == ".": start = 1
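For readers who find the one-line flag reset terse, the same loop can be spelled out with an explicit boolean. This restates the code above rather than changing the algorithm; the names result and starting are illustrative:

```python
sentences = ["I", "am", "good", ".",
             "I", "like", "you", ".",
             "We", "are", "not", "friends", "."]

result, starting = [], True
for w in sentences:
    if starting:
        result.append([])   # open a new sentence
        starting = False    # lower the flag explicitly
    result[-1].append(w)
    if w == ".":
        starting = True     # the next word starts a new sentence

print(result)
# [['I', 'am', 'good', '.'], ['I', 'like', 'you', '.'], ['We', 'are', 'not', 'friends', '.']]
```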
You could do this by joining the elements together into a string and then splitting the string back again using a regex:
import re
sentence = ["I", "am", "good", ".", "I", "like", "you", ".", "we", "are", "not", "friends", "."]
result = [m.split('\0') for m in re.findall(r'(?<=\0).*?\.(?=\0|$)', '\0'.join(['.']+sentence))]
Output:
[
['I', 'am', 'good', '.'],
['I', 'like', 'you', '.'],
['we', 'are', 'not', 'friends', '.']
]