
Concatenate tuples using sum()

From this post I learned that you can concatenate tuples with sum():

>>> tuples = (('hello',), ('these', 'are'), ('my', 'tuples!'))
>>> sum(tuples, ())
('hello', 'these', 'are', 'my', 'tuples!')

Which looks pretty nice. But why does this work? And is this optimal, or is there something from itertools that would be preferable to this construct?

The addition operator concatenates tuples in Python:

('a', 'b')+('c', 'd')
Out[34]: ('a', 'b', 'c', 'd')

From the docstring of sum:

Return the sum of a 'start' value (default: 0) plus an iterable of numbers

This means sum doesn't start with the first element of your iterable, but rather with an initial value that is passed through the start= argument.

By default sum is used with numbers, so the default start value is 0. Summing an iterable of tuples therefore requires starting with an empty tuple, and () is an empty tuple:

type(())
Out[36]: tuple

Hence the concatenation works.
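In other words, sum(tuples, ()) behaves like a left fold with + starting from the empty tuple, which you could spell out yourself with functools.reduce (a sketch of the equivalent behavior, not what CPython actually runs internally):

```python
import functools

tuples = (('hello',), ('these', 'are'), ('my', 'tuples!'))

# sum(tuples, ()) is equivalent to repeatedly applying + starting from ():
result = functools.reduce(lambda acc, t: acc + t, tuples, ())
print(result)  # ('hello', 'these', 'are', 'my', 'tuples!')
```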

As for performance, here is a comparison:

%timeit sum(tuples, ())
The slowest run took 9.40 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 285 ns per loop


%timeit tuple(it.chain.from_iterable(tuples))
The slowest run took 5.00 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 625 ns per loop

Now with t2 of size 10000:

%timeit sum(t2, ())
10 loops, best of 3: 188 ms per loop

%timeit tuple(it.chain.from_iterable(t2))
1000 loops, best of 3: 526 µs per loop
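(The answer never shows how t2 was built; presumably it is a tuple of 10000 small tuples, something along these lines. This construction is a guess, not the original setup.)

```python
# Hypothetical reconstruction of t2: 10000 one-element tuples.
t2 = tuple(('x',) for _ in range(10000))
print(len(t2))  # 10000
```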

So if your list of tuples is small, don't bother. If it's medium-sized or larger, you should use itertools.

That's clever, and I had to laugh, because the help text expressly forbids strings, yet it works:

sum(...)
    sum(iterable[, start]) -> value

    Return the sum of an iterable of numbers (NOT strings) plus the value
    of parameter 'start' (which defaults to 0).  When the iterable is
    empty, return start.

You can add tuples to get a new, bigger tuple. And since you gave a tuple as the start value, the addition works.
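The "NOT strings" restriction really is enforced only for str; any other type that supports + sails through (a quick demonstration):

```python
# sum() special-cases strings and raises, but tuples and lists are accepted:
try:
    sum(['a', 'b'], '')
except TypeError as e:
    print(e)  # sum() can't sum strings [use ''.join(seq) instead]

print(sum([('a',), ('b',)], ()))  # ('a', 'b')
print(sum([['a'], ['b']], []))    # ['a', 'b']
```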

Just to complement the accepted answer with some more benchmarks:

import functools, operator, itertools
import numpy as np
N = 10000
M = 2

ll = tuple(tuple(x) for x in np.random.random((N, M)).tolist())

%timeit functools.reduce(operator.add, ll)
# 407 ms ± 5.63 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit functools.reduce(lambda x, y: x + y, ll)
# 425 ms ± 7.16 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit sum(ll, ())
# 426 ms ± 14.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit tuple(itertools.chain(*ll))
# 601 µs ± 5.43 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit tuple(itertools.chain.from_iterable(ll))
# 546 µs ± 25.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

EDIT: the code has been updated to actually use tuples. And, as per the comments, the last two options are now wrapped in tuple() constructors, and all timings have been updated (for consistency). The itertools.chain* options are still the fastest, but the margin is now smaller.

It works because addition is overloaded (on tuples) to return the concatenated tuple:

>>> () + ('hello',) + ('these', 'are') + ('my', 'tuples!')
('hello', 'these', 'are', 'my', 'tuples!')

That's basically what sum is doing: you give an initial value of an empty tuple and then add the tuples to that.

However, this is generally a bad idea, because tuple addition creates a new tuple each time, so you create several intermediate tuples just to copy them into the concatenated tuple:

()
('hello',)
('hello', 'these', 'are')
('hello', 'these', 'are', 'my', 'tuples!')

That's an implementation with quadratic runtime behavior, and the quadratic behavior can be avoided by avoiding the intermediate tuples.
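To see where the quadratic cost comes from, count the element copies: each + builds a fresh tuple, copying the entire accumulated result plus the next tuple (a back-of-the-envelope model, not a profiler measurement):

```python
def copies_for_sum(sizes):
    """Total elements copied when concatenating tuples of the given
    sizes one at a time with +, as sum() does."""
    total, acc = 0, 0
    for n in sizes:
        acc += n       # length of the accumulated result after this step
        total += acc   # each concatenation copies the whole new result
    return total

print(copies_for_sum([1] * 10))   # 55   (about n**2 / 2)
print(copies_for_sum([1] * 100))  # 5050
```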

>>> tuples = (('hello',), ('these', 'are'), ('my', 'tuples!'))

Using a nested generator expression:

>>> tuple(tuple_item for tup in tuples for tuple_item in tup)
('hello', 'these', 'are', 'my', 'tuples!')

Or using a generator function:

def flatten(it):
    for seq in it:
        for item in seq:
            yield item


>>> tuple(flatten(tuples))
('hello', 'these', 'are', 'my', 'tuples!')
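Note that flatten (like chain.from_iterable below) isn't limited to tuples of tuples; it lazily walks any iterable of iterables, and tuple() drains it in one pass:

```python
def flatten(it):
    for seq in it:
        for item in seq:
            yield item

# Mixed inner iterables (list, tuple, range) are all fine:
print(tuple(flatten([[1, 2], (3,), range(4, 6)])))  # (1, 2, 3, 4, 5)
```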

Or using itertools.chain.from_iterable:

>>> import itertools
>>> tuple(itertools.chain.from_iterable(tuples))
('hello', 'these', 'are', 'my', 'tuples!')

And if you're interested in how these perform (using my simple_benchmark package):

import itertools
import simple_benchmark

def flatten(it):
    for seq in it:
        for item in seq:
            yield item

def sum_approach(tuples):
    return sum(tuples, ())

def generator_expression_approach(tuples):
    return tuple(tuple_item for tup in tuples for tuple_item in tup)

def generator_function_approach(tuples):
    return tuple(flatten(tuples))

def itertools_approach(tuples):
    return tuple(itertools.chain.from_iterable(tuples))

funcs = [sum_approach, generator_expression_approach, generator_function_approach, itertools_approach]
arguments = {2**i: tuple((1,) for _ in range(1, 2**i)) for i in range(1, 13)}
b = simple_benchmark.benchmark(funcs, arguments, argument_name='number of tuples to concatenate')

b.plot()

[benchmark plot: runtime vs. number of tuples to concatenate]

(Python 3.7.2 64-bit, Windows 10 64-bit)

So while the sum approach is very fast if you concatenate only a few tuples, it will be really slow if you try to concatenate lots of them. The fastest of the tested approaches for many tuples is itertools.chain.from_iterable.

The second argument, start, where you put (), is the starting object to add to; it defaults to 0 for number addition.

Here is a sample implementation of sum (roughly what I expect it does):

def sum(iterable, /, start=0):
    for element in iterable:
        start += element
    return start

Example:

>>> sum([1, 2, 3])
6
>>> tuples = (('hello',), ('these', 'are'), ('my', 'tuples!'))
>>> sum(tuples)
TypeError: unsupported operand type(s) for +=: 'int' and 'tuple'
>>> sum(tuples, ())
('hello', 'these', 'are', 'my', 'tuples!')
>>> 

It works because tuple concatenation with + is supported.

Effectively this gets translated to:

>>> () + ('hello',) + ('these', 'are') + ('my', 'tuples!')
('hello', 'these', 'are', 'my', 'tuples!')
>>> 
