
Split python tuple in subtuples with capacity limit in functional programming style

I have a tuple in Python and, for example, a capacity limit of 5. I want to split the tuple into subtuples bounded by the sum of their elements:

For example:

input: (3, 1, 4, 2, 2, 1, 1, 2) and capacity = 5
output: (3, 1) (4) (2, 2, 1) (1, 2)  # each subtuple sums to at most 5, order preserved

I'm looking for a nice, expressive solution; a functional programming style would be preferable for this task (e.g. using itertools.dropwhile or something similar).

You can encapsulate the non-functional part and call it from your functional code:

from itertools import groupby

class GroupBySum:
    """Stateful key function for groupby: returns a group index that
    advances whenever the running sum would exceed maxsum."""

    def __init__(self, maxsum):
        self.maxsum = maxsum
        self.index = 0
        self.sum = 0

    def __call__(self, value):
        self.sum += value
        if self.sum > self.maxsum:
            self.index += 1
            self.sum = value
        return self.index

# Example:

for _, l in groupby((3, 1, 4, 2, 2, 1, 1, 2), GroupBySum(5)):
    print(list(l))

I couldn't help writing something close to what I would do in Haskell (still somewhat pythonic, I think):

def take_summed(xs, cap):
    if len(xs) <= 1:
        return xs, ()
    else:
        x, *rest = xs

        if x > cap:
            return (), xs
        else:
            init, tail = take_summed(rest, cap - x)
            return (x,) + tuple(init), tail

def split(xs, cap=5):
    if len(xs) <= 1:
        yield xs
    else:
        chunk, rest = take_summed(xs, cap)
        yield chunk

        if rest != ():
            yield from split(rest, cap)

Don't hesitate to decompose a function into subproblems. The result:

In [45]: list(split((3, 1, 4, 2, 2, 1, 1, 2), 5))
Out[45]: [(3, 1), (4,), (2, 2, 1), (1, 2)]

The problem with making this shorter is not that it's infeasible without side effects, but that you have to carry the accumulated state along, so even with reduce you would need to invent something fairly convoluted to pass the running sum between applications.
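As a sketch of what that looks like (the function and accumulator names are my own), a reduce can thread both the finished groups and the running sum through a single accumulator tuple:

```python
from functools import reduce

def split_reduce(xs, cap):
    # Accumulator: (finished groups, current group, current sum).
    def step(acc, x):
        groups, cur, s = acc
        if cur and s + x <= cap:
            return (groups, cur + (x,), s + x)
        # Start a new group; the previous one (if any) is finished.
        return (groups + (cur,) if cur else groups, (x,), x)

    groups, cur, _ = reduce(step, xs, ((), (), 0))
    return groups + (cur,) if cur else groups

print(split_reduce((3, 1, 4, 2, 2, 1, 1, 2), 5))
# ((3, 1), (4,), (2, 2, 1), (1, 2))
```

It works, but as said above, the three-field accumulator is exactly the state-threading overhead that the explicit recursion avoids.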

Here's a slightly different approach from @Jean's: it slices the input tuple instead of building smaller lists with append, which gives a small performance boost:

def group_by_capacity(tup, capacity=5):
    t = iter(tup)
    curr, s = 0, next(t)

    for i, v in enumerate(t, 1):
        if s + v > capacity:
            yield tup[curr:i]
            curr = i
            s = v
        else:
            s += v
    yield tup[curr:]

>>> list(group_by_capacity((3, 1, 4, 2, 2, 1, 1, 2)))
[(3, 1), (4,), (2, 2, 1), (1, 2)]

Some timings:

In [35]: from random import randrange

In [36]: start = tuple((randrange(1,5) for _ in range(100000)))

In [37]: %%timeit
   ....: list(group_by_capacity(start))
   ....:
10 loops, best of 3: 47.4 ms per loop

In [38]: %%timeit
   ....: list(generate_tuple(start))
   ....:
10 loops, best of 3: 61.1 ms per loop

I'm a bit surprised no one has used itertools.accumulate with a custom function. Anyway, my entry:

from itertools import groupby, accumulate

def sumgroup(seq, capacity):
    divided = accumulate(enumerate(seq),
                         lambda x,y: (x[0],x[1]+y[1])
                                     if x[1]+y[1] <= capacity else (x[0]+1,y[1]))
    seq_iter = iter(seq)
    grouped = groupby(divided, key=lambda x: x[0])
    return [[next(seq_iter) for _ in g] for _,g in grouped]

There are plenty of variants; for example, you could use zip(seq, divided) to avoid seq_iter, and so on, but this was the first way that came to mind. It gives me

In [105]: seq = [3, 1, 4, 2, 2, 1, 1, 2]

In [106]: sumgroup(seq, 5)
Out[106]: [[3, 1], [4], [2, 2, 1], [1, 2]]

and it agrees with the GroupBySum results:

In [108]: all(sumgroup(p, 5) == [list(l) for _, l in groupby(p, GroupBySum(5))]
     ...:     for width in range(1,8) for p in product(range(1,6), repeat=width))
     ...:     
     ...: 
Out[108]: True
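The zip(seq, divided) variant mentioned above might look like this (a sketch; same accumulate trick, but each element is paired with its group index via zip instead of a second iterator over seq, so seq must be a sequence rather than a one-shot iterator):

```python
from itertools import accumulate, groupby

def sumgroup_zip(seq, capacity):
    # divided yields (group index, running sum) pairs; the index is
    # bumped whenever adding the next element would exceed capacity.
    divided = accumulate(enumerate(seq),
                         lambda x, y: (x[0], x[1] + y[1])
                                      if x[1] + y[1] <= capacity
                                      else (x[0] + 1, y[1]))
    # Pair each element with its (index, sum) tag and group on the index.
    grouped = groupby(zip(seq, divided), key=lambda pair: pair[1][0])
    return [[v for v, _ in g] for _, g in grouped]

print(sumgroup_zip([3, 1, 4, 2, 2, 1, 1, 2], 5))
# [[3, 1], [4], [2, 2, 1], [1, 2]]
```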

While waiting for a truly functional answer, here's a first, mildly functional approach:

start = (3, 1, 4, 2, 2, 1, 1, 2)

def generate_tuple(inp):
    current_sum = 0
    current_list = []
    for e in inp:
        if current_sum + e <= 5:
            current_list.append(e)
            current_sum += e
        else:
            if current_list:  # fixes "6" in first position empty tuple bug
                yield tuple(current_list)
            current_list = [e]
            current_sum = e
    yield tuple(current_list)

print([i for i in generate_tuple(start)])

The result:

[(3, 1), (4,), (2, 2, 1), (1, 2)]

Edit: I found a fully functional approach using memoization; it isn't feasible otherwise. It's ugly, and it hurts me just to think about how I would explain it clearly. I've spiced up the input data set a bit, because otherwise it was too easy:

start = (6, 7, 3, 1, 4, 2, 2, 1, 1, 2, 3, 1 ,3, 1, 1)

Now the code. Three lines. Get some aspirin; you'll need it, as I did:

mem=[0,0]
start = start + (5,)
print([start[mem[-2]:n] for i in range(0,len(start)) for n in range(i+1,len(start)) if ((n==i+1 and start[i]>=5) or (sum(start[mem[-1]:n])<=5 and sum(start[mem[-1]:n+1])>5)) and not mem.append(n)])

I'll try to explain:

  • I use memoization because it's impossible without it. The state is stored in mem, initialized to [0, 0].
  • Since the expression ignores the last item, I append the threshold to the input data so that the trailing values are not dropped.
  • The only straightforward part is computing the two sums and detecting the index at which the threshold is exceeded. When that happens, the first two conditions are met and the third one kicks in: storing the index in mem. Since append returns None, this last condition is always true.
  • ((n==i+1 and start[i]>=5) detects single values greater than or equal to 5.
  • The rest is fine-tuning until the output matched the procedural approach. It doesn't look so bad now :)

Not sure why you need them all as tuples, but if you don't, you can drop the tuple(...) conversions:

def chunkit(tpl, capacity):
    ret = []
    cur = []
    for x in tpl:
        if sum(cur) + x > capacity:
            ret.append(tuple(cur))
            cur = [x]
        else:
            cur.append(x)
    if cur != []:
        ret.append(tuple(cur))

    return tuple(ret)

A few examples:

In [24]: chunkit((3, 1, 4, 2, 2, 1, 1), 5)
Out[24]: ((3, 1), (4,), (2, 2, 1), (1,))

In [25]: chunkit((3, 1, 4, 2, 2, 1, ), 5)
Out[25]: ((3, 1), (4,), (2, 2, 1))

In [26]: chunkit((3, 1, 4, 2, 2, 1, 5), 5)
Out[26]: ((3, 1), (4,), (2, 2, 1), (5,))

In [27]: chunkit((3, 1, 4, 2, 2, 1, 5, 6), 5)
Out[27]: ((3, 1), (4,), (2, 2, 1), (5,), (6,))

In [28]: chunkit((3, 1, 4, 2, 2, 1, 5, 6, 1, 6), 5)
Out[28]: ((3, 1), (4,), (2, 2, 1), (5,), (6,), (1,), (6,))

Not sure whether this has any practical value, but it's the closest I could come up with:

import itertools

def groupLimit(iterable, limit):
    i, cSum = 0, 0
    def pred(x):
        nonlocal i, cSum, limit
        i, cSum = (i + 1, x) if (x + cSum) > limit else (i, cSum + x)
        return i if x <= limit else -1
    return (tuple(g) for k, g in itertools.groupby(iterable, pred) if k != -1)

This also singles out individual values greater than the limit. If that's not wanted, the last two lines can be changed to:

        return i
    return (tuple(g) for k, g in itertools.groupby(iterable, pred))

Example:

t = (3, 1, 6, 2, 2, 1, 1, 2)
a = groupLimit(t,5)
print(tuple(a))
# version 1 -> ((3, 1), (2, 2, 1), (1, 2))
# version 2 -> ((3, 1), (6,), (2, 2, 1), (1, 2))

Let's define a powerset with itertools:

from itertools import chain, combinations

def powerset(lst):
    for subset in chain.from_iterable(combinations(lst, r) for r in range(len(lst)+1)):
        yield subset

Then we can do it with a one-liner:

[subset for subset in powerset(input) if sum(subset)<=capacity]
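Note that on a small input (repeating the powerset definition from above so this runs standalone) this yields every subsequence whose sum fits the capacity, not a partition of the original tuple:

```python
from itertools import chain, combinations

def powerset(lst):
    # All subsequences of lst, in increasing length order.
    for subset in chain.from_iterable(
            combinations(lst, r) for r in range(len(lst) + 1)):
        yield subset

print([s for s in powerset((3, 1, 4)) if sum(s) <= 5])
# [(), (3,), (1,), (4,), (3, 1), (1, 4)]
```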

A more generic solution:

def groupwhile(iterable,predicate,accumulator_function):
    continue_group = False
    iterator = iter(iterable)
    try:
        accumulated = next(iterator)
    except StopIteration:
        return
    current_group = [accumulated]
    for item in iterator:
        continue_group = predicate(accumulated,item)
        if continue_group:
            current_group.append(item)
            accumulated = accumulator_function(accumulated,item)
        else:
            yield current_group
            accumulated = item
            current_group = [item]

    yield current_group

#your case
assert (list(groupwhile(
    (3, 1, 4, 2, 2, 1, 1, 2),
    lambda previous_sum,item: previous_sum + item <= 5,
    lambda previous_sum,item: previous_sum + item,
))) == [[3, 1], [4], [2, 2, 1], [1, 2]]

#equivalent to groupby with key not set
assert (list(groupwhile(
    (3, 1, 4, 2, 2, 1, 1, 2),
    lambda previous_item,item: previous_item == item,
    lambda _,item: item,
))) == [[3], [1], [4], [2, 2], [1, 1], [2]]

#break on duplicates
assert (list(groupwhile(
    (3, 1, 4, 2, 2, 1, 1, 2),
    lambda previous_item,item: previous_item != item,
    lambda _,item: item,
))) == [[3, 1, 4, 2], [2, 1], [1, 2]]

#start new group when the number is one
assert (list(groupwhile(
    (3, 1, 4, 2, 2, 1, 1, 2),
    lambda _,item: item != 1,
    lambda _1,_2: None,
))) == [[3], [1, 4, 2, 2], [1], [1, 2]]

My solution. Not very clean, but it uses only reduce:

from functools import reduce

# int, (int, int, ...) -> ((int, ...), ...)
def grupBySum(capacity, _tuple):

    def _grupBySum(prev, number):
        counter = prev['counter']
        result = prev['result']
        counter = counter + (number,)
        if sum(counter) > capacity:
            result = result + (counter[:-1],)
            return {'counter': (number,), 'result': result}
        else:
            return {'counter': counter, 'result': result}

    acc = reduce(_grupBySum, _tuple, {'counter': (), 'result': ()})
    return acc['result'] + (acc['counter'],)

f = (3, 1, 4, 2, 2, 1, 1, 2)
h = grupBySum(5, f)
print(h) # -> ((3, 1), (4,), (2, 2, 1), (1, 2))
