
Concatenate elements of a tuple in a list in Python

I have a list of tuples that has strings in it. For instance:

[('this', 'is', 'a', 'foo', 'bar', 'sentences'),
 ('is', 'a', 'foo', 'bar', 'sentences', 'and'),
 ('a', 'foo', 'bar', 'sentences', 'and', 'i'),
 ('foo', 'bar', 'sentences', 'and', 'i', 'want'),
 ('bar', 'sentences', 'and', 'i', 'want', 'to'),
 ('sentences', 'and', 'i', 'want', 'to', 'ngramize'),
 ('and', 'i', 'want', 'to', 'ngramize', 'it')]

Now I wish to concatenate each string in a tuple to create a list of space-separated strings. I used the following method:

NewData = []
for grams in sixgrams:
    # give every word a trailing space, join, then strip the last space
    NewData.append((''.join([w + ' ' for w in grams])).strip())

which is working perfectly fine.

However, the list that I have has over a million tuples. So my question is: is this method efficient enough, or is there a better way to do it? Thanks.

For a lot of data, you should consider whether you need to keep it all in a list. If you are processing the items one at a time, you can create a generator that will yield each joined string but won't keep them all around taking up memory:

new_data = (' '.join(w) for w in sixgrams)

If you can get the original tuples from a generator as well, then you can avoid having the sixgrams list in memory too.
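
A minimal sketch of that full pipeline (the tokens list and the ngrams helper here are illustrative, not from the question): neither the n-gram tuples nor the joined strings are ever all in memory at once.

def ngrams(tokens, n=6):
    # yield one n-gram tuple at a time instead of building a list
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])

tokens = ['this', 'is', 'a', 'foo', 'bar', 'sentences',
          'and', 'i', 'want', 'to', 'ngramize', 'it']

for sentence in (' '.join(gram) for gram in ngrams(tokens)):
    print(sentence)  # process one joined string at a time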

The list comprehension creates temporary strings (w + ' ' for every word). Just use ' '.join instead:

>>> words_list = [('this', 'is', 'a', 'foo', 'bar', 'sentences'),
...               ('is', 'a', 'foo', 'bar', 'sentences', 'and'),
...               ('a', 'foo', 'bar', 'sentences', 'and', 'i'),
...               ('foo', 'bar', 'sentences', 'and', 'i', 'want'),
...               ('bar', 'sentences', 'and', 'i', 'want', 'to'),
...               ('sentences', 'and', 'i', 'want', 'to', 'ngramize'),
...               ('and', 'i', 'want', 'to', 'ngramize', 'it')]
>>> new_list = []
>>> for words in words_list:
...     new_list.append(' '.join(words)) # <---------------
... 
>>> new_list
['this is a foo bar sentences', 
 'is a foo bar sentences and', 
 'a foo bar sentences and i', 
 'foo bar sentences and i want', 
 'bar sentences and i want to', 
 'sentences and i want to ngramize', 
 'and i want to ngramize it']

The above for loop can be expressed as the following list comprehension:

new_list = [' '.join(words) for words in words_list] 

You can do this efficiently like this:

joiner = " ".join
print map(joiner, sixgrams)

We can still improve the performance using a list comprehension, like this:

joiner = " ".join
print [joiner(words) for words in sixgrams]

The performance comparison shows that the list comprehension solution above is slightly faster than the other two solutions:

from timeit import timeit

joiner = " ".join

def mapSolution():
    return map(joiner, sixgrams)

def comprehensionSolution1():
    return ["".join(words) for words in sixgrams]

def comprehensionSolution2():
    return [joiner(words) for words in sixgrams]

print timeit("mapSolution()", "from __main__ import joiner, mapSolution, sixgrams")
print timeit("comprehensionSolution1()", "from __main__ import sixgrams, comprehensionSolution1, joiner")
print timeit("comprehensionSolution2()", "from __main__ import sixgrams, comprehensionSolution2, joiner")

Output on my machine:

1.5691678524
1.66710209846
1.47555398941

The performance gain is most likely because we don't have to create the bound join method from the string " " every time.
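
A quick way to sanity-check that claim is a micro-benchmark like the one below (a sketch, not part of the original answer; absolute numbers will vary by machine). The first statement creates a bound join method from the string literal on every call, while the second reuses the pre-bound joiner.

from timeit import timeit

words = ('this', 'is', 'a', 'foo', 'bar', 'sentences')
joiner = " ".join

print(timeit('" ".join(words)', 'from __main__ import words'))
print(timeit('joiner(words)', 'from __main__ import joiner, words'))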

Edit: Though we can improve the performance like this, the most Pythonic way is to go with generators, as in lvc's answer.
