Splitting strings inside a list of lists by spaces
Suppose I have the following structure:
t = [['I will','take','care'],['I know','what','to','do']]
As you can see, the first list contains 'I will', and I want it split into two elements, 'I' and 'will', so that the result is:
[['I', 'will', 'take', 'care'], ['I', 'know', 'what', 'to', 'do']]
A quick and dirty algorithm is the following:
train_text_new = []
for sent in t:
    new = []
    for word in sent:
        temp = word.split(' ')
        for item2 in temp:
            new.append(item2)
    train_text_new.append(new)
But I would like to know if there is a more readable and maybe more efficient algorithm for solving this problem.
You could make a simple generator that yields the splits, then use that in a list comprehension:
t = [['I will','take','care'],['I know','what','to','do']]
def splitWords(l):
    for words in l:
        yield from words.split()
[list(splitWords(sublist)) for sublist in t]
# [['I', 'will', 'take', 'care'], ['I', 'know', 'what', 'to', 'do']]
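One detail worth noting: the loop in the question calls split(' '), while this answer calls split() with no argument. The two agree on the sample data but differ when a string has repeated or trailing spaces. A minimal sketch, using a made-up sample string for illustration:

```python
# Hypothetical input with a double space and a trailing space.
s = 'I  will '

# Splitting on a literal space keeps empty strings between adjacent spaces.
by_space = s.split(' ')

# Splitting with no argument collapses runs of whitespace and drops empties.
by_whitespace = s.split()

print(by_space)       # ['I', '', 'will', '']
print(by_whitespace)  # ['I', 'will']
```

If the data may contain stray spaces, split() is probably the safer choice.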
You can try this, assuming the split always applies to the first element of each sublist:
t = [['I will','take','care'],['I know','what','to','do']]
[start.split()+rest for start,*rest in t]
# [['I', 'will', 'take', 'care'], ['I', 'know', 'what', 'to', 'do']]
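The starred unpacking only touches the first element, so a multi-word string later in a sublist is left alone, and an empty sublist cannot be unpacked at all. A small sketch with made-up inputs, showing where that assumption matters (split_first is a hypothetical helper name, not part of the answer):

```python
def split_first(rows):
    # Same expression as above, wrapped in a hypothetical helper for reuse.
    return [start.split() + rest for start, *rest in rows]

# Only the first element of each sublist is split.
print(split_first([['I will', 'take care']]))  # [['I', 'will', 'take care']]

# An empty sublist has no first element, so unpacking raises ValueError.
try:
    split_first([[]])
except ValueError as exc:
    print('cannot unpack empty sublist:', exc)
```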
If the split should apply to any word in the sublist, try this:
[[j for i in lst for j in i.split()] for lst in t]
# [['I', 'will', 'take', 'care'], ['I', 'know', 'what', 'to', 'do']]
Joining every inner list into a single string with join, and then splitting that string back into a list with split, will do the trick:
t = [['I will','take','care'],['I know','what','to','do']]
res = [' '.join(i).split() for i in t]
print(res)
# output [['I', 'will', 'take', 'care'], ['I', 'know', 'what', 'to', 'do']]
You could use itertools.chain.from_iterable to do the flattening after splitting:
from itertools import chain
t = [['I will','take','care'],['I know','what','to','do']]
print([list(chain.from_iterable(x.split() for x in y)) for y in t])
Output:
[['I', 'will', 'take', 'care'], ['I', 'know', 'what', 'to', 'do']]
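Since the question also asks about efficiency, the approaches above can be compared with timeit. This is only a rough sketch: the function names are made up here, the input is tiny, and the numbers will vary by machine and Python version, so no timings are shown.

```python
from itertools import chain
from timeit import timeit

t = [['I will', 'take', 'care'], ['I know', 'what', 'to', 'do']]

# Hypothetical wrappers around three of the approaches shown above.
def nested_comprehension(rows):
    return [[w for item in row for w in item.split()] for row in rows]

def join_then_split(rows):
    return [' '.join(row).split() for row in rows]

def chain_flatten(rows):
    return [list(chain.from_iterable(item.split() for item in row)) for row in rows]

candidates = (nested_comprehension, join_then_split, chain_flatten)

# All variants should agree before their speed is compared.
expected = nested_comprehension(t)
assert all(f(t) == expected for f in candidates)

for f in candidates:
    seconds = timeit(lambda: f(t), number=10_000)
    print(f'{f.__name__}: {seconds:.4f} s for 10,000 calls')
```

For a realistic comparison, t would need to be replaced with data of the size actually being processed.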