Split Python tuple in subtuples with capacity limit in functional programming style
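I have a tuple in Python and a capacity limit, for example 5. I want to split the tuple into subtuples, each limited by the sum of its elements:
For example:
input: (3, 1, 4, 2, 2, 1, 1, 2) and capacity = 5
output: (3, 1) (4) (2, 2, 1) (1, 2)  # each subtuple sums to at most 5; order is preserved
I am looking for a nice, expressive solution to this task, preferably in a functional programming style (for example using itertools.dropwhile or something similar).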
You can encapsulate the non-functional part and call it from functional code:
from itertools import groupby

class GroupBySum:
    def __init__(self, maxsum):
        self.maxsum = maxsum
        self.index = 0
        self.sum = 0

    def __call__(self, value):
        self.sum += value
        if self.sum > self.maxsum:
            self.index += 1
            self.sum = value
        return self.index

# Example:
for _, l in groupby((3, 1, 4, 2, 2, 1, 1, 2), GroupBySum(5)):
    print(list(l))
I couldn't help it and wrote something close to what I'd do in Haskell (while still being somewhat Pythonic, I think):
def take_summed(xs, cap):
    # Return (longest prefix of xs whose sum is <= cap, remaining elements).
    if not xs:
        return (), ()
    x, *rest = xs
    if x > cap:
        return (), tuple(xs)
    init, tail = take_summed(rest, cap - x)
    return (x,) + init, tail

def split(xs, cap=5):
    if not xs:
        return
    chunk, rest = take_summed(xs, cap)
    if not chunk:
        # A single element larger than cap gets a chunk of its own.
        chunk, rest = tuple(xs[:1]), tuple(xs[1:])
    yield chunk
    yield from split(rest, cap)
I did not hesitate to decompose the function into subproblems. Result:
In [45]: list(split((3, 1, 4, 2, 2, 1, 1, 2), 5))
Out[45]: [(3, 1), (4,), (2, 2, 1), (1, 2)]
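The problem with making this shorter is not that it is infeasible without side effects, but that you have to carry extra accumulated state around, so even with reduce you would need to invent something fairly complex to pass the running sum between applications.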
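To make the point about carried state concrete, here is a minimal sketch of what a reduce-based version might look like. The names split_with_reduce and step are hypothetical, and the accumulator layout (finished chunks, current chunk, current sum) is just one possible choice, not the approach of any answer here:

```python
from functools import reduce

def split_with_reduce(xs, cap=5):
    """Greedy split; the accumulator is (finished chunks, current chunk, current sum)."""
    def step(state, x):
        done, cur, total = state
        if cur and total + x > cap:
            # x does not fit: close the current chunk and start a new one with x.
            return done + (cur,), (x,), x
        # x fits (or the current chunk is empty): extend the current chunk.
        return done, cur + (x,), total + x

    done, cur, _ = reduce(step, xs, ((), (), 0))
    # Flush the last open chunk, if any.
    return done + ((cur,) if cur else ())

print(split_with_reduce((3, 1, 4, 2, 2, 1, 1, 2)))
# ((3, 1), (4,), (2, 2, 1), (1, 2))
```

The three-element state is exactly the "extra accumulated state" mentioned above: nothing is mutated, but every application of step has to thread the running sum through explicitly.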
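Here is a slightly different approach from @Jean's: it slices the input tuple instead of building smaller lists with append, which gives a small performance boost: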
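def group_by_capacity(tup, capacity=5):
    if not tup:
        return  # nothing to yield; avoids calling next() on an empty iterator
    t = iter(tup)
    # curr is the start index of the current slice, s its running sum
    curr, s = 0, next(t)
    for i, v in enumerate(t, 1):
        if s + v > capacity:
            yield tup[curr:i]
            curr = i
            s = v
        else:
            s += v
    yield tup[curr:]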
>>> list(group_by_capacity((3, 1, 4, 2, 2, 1, 1, 2)))
[(3, 1), (4,), (2, 2, 1), (1, 2)]
Some timings:
In [35]: from random import randrange
In [36]: start = tuple((randrange(1,5) for _ in range(100000)))
In [37]: %%timeit
....: list(group_by_capacity(start))
....:
10 loops, best of 3: 47.4 ms per loop
In [38]: %%timeit
....: list(generate_tuple(start))
....:
10 loops, best of 3: 61.1 ms per loop
I'm a little surprised that no one has used itertools.accumulate with a key function yet. Anyway, my entry:
from itertools import groupby, accumulate

def sumgroup(seq, capacity):
    divided = accumulate(enumerate(seq),
                         lambda x,y: (x[0],x[1]+y[1])
                         if x[1]+y[1] <= capacity else (x[0]+1,y[1]))
    seq_iter = iter(seq)
    grouped = groupby(divided, key=lambda x: x[0])
    return [[next(seq_iter) for _ in g] for _,g in grouped]
There are lots of variants, e.g. you could use zip(seq, divided) to avoid the seq_iter, etc., but this was the first way that came to mind. It gives me
In [105]: seq = [3, 1, 4, 2, 2, 1, 1, 2]
In [106]: sumgroup(seq, 5)
Out[106]: [[3, 1], [4], [2, 2, 1], [1, 2]]
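The zip(seq, divided) variant mentioned above could look roughly like the following. This is only a sketch: sumgroup_zip is a hypothetical name, and it assumes seq is a sequence that can be iterated twice (once by enumerate, once by zip), as in sumgroup:

```python
from itertools import accumulate, groupby

def sumgroup_zip(seq, capacity):
    # Running (group index, running sum) pairs; the first pair is (0, seq[0]).
    divided = accumulate(
        enumerate(seq),
        lambda acc, item: (acc[0], acc[1] + item[1])
        if acc[1] + item[1] <= capacity
        else (acc[0] + 1, item[1]),
    )
    # Pair each value with its state and group on the group index.
    paired = zip(seq, divided)
    grouped = groupby(paired, key=lambda pair: pair[1][0])
    return [[value for value, _ in group] for _, group in grouped]

print(sumgroup_zip([3, 1, 4, 2, 2, 1, 1, 2], 5))
# [[3, 1], [4], [2, 2, 1], [1, 2]]
```

Here the values travel alongside their group index, so no separate iterator has to be advanced in lockstep with groupby.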
The sumgroup output also agrees with the GroupBySum result:
In [107]: from itertools import product
In [108]: all(sumgroup(p, 5) == [list(l) for _, l in groupby(p, GroupBySum(5))]
...: for width in range(1,8) for p in product(range(1,6), repeat=width))
...:
Out[108]: True
I was waiting for a first answer before providing a slightly more functional approach:
start = (3, 1, 4, 2, 2, 1, 1, 2)

def generate_tuple(inp):
    current_sum = 0
    current_list = []
    for e in inp:
        if current_sum + e <= 5:
            current_list.append(e)
            current_sum += e
        else:
            if current_list:  # fixes "6" in first position empty tuple bug
                yield tuple(current_list)
            current_list = [e]
            current_sum = e
    yield tuple(current_list)

print(list(generate_tuple(start)))
Result:
[(3, 1), (4,), (2, 2, 1), (1, 2)]
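Edit: I found a fully functional approach that relies on a memory effect; without it, this is not doable. It is ugly, and it only hurts when I think about how I am going to explain it clearly. I have spiced up the input data set a bit, otherwise it would be too easy:
start = (6, 7, 3, 1, 4, 2, 2, 1, 1, 2, 3, 1, 3, 1, 1)
Now the code. Three lines; get some aspirin, you will need it as much as I did: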
mem=[0,0]
start = start + (5,)
print([start[mem[-2]:n] for i in range(0,len(start)) for n in range(i+1,len(start)) if ((n==i+1 and start[i]>=5) or (sum(start[mem[-1]:n])<=5 and sum(start[mem[-1]:n+1])>5)) and not mem.append(n)])
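I'll try to explain. mem stores the indexes and is set to 0,0 at the start. Since append returns None, the last condition (not mem.append(n)) is always true. (n==i+1 and start[i]>=5) detects single values greater than or equal to 5.
Not sure why you need them all in tuples, but if you don't, you can just remove the tuple(...) casts: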
def chunkit(tpl, capacity):
    ret = []
    cur = []
    for x in tpl:
        # "cur and" avoids emitting an empty tuple when the first value exceeds capacity
        if cur and sum(cur) + x > capacity:
            ret.append(tuple(cur))
            cur = [x]
        else:
            cur.append(x)
    if cur:
        ret.append(tuple(cur))
    return tuple(ret)
A few examples:
In [24]: chunkit((3, 1, 4, 2, 2, 1, 1), 5)
Out[24]: ((3, 1), (4,), (2, 2, 1), (1,))
In [25]: chunkit((3, 1, 4, 2, 2, 1, ), 5)
Out[25]: ((3, 1), (4,), (2, 2, 1))
In [26]: chunkit((3, 1, 4, 2, 2, 1, 5), 5)
Out[26]: ((3, 1), (4,), (2, 2, 1), (5,))
In [27]: chunkit((3, 1, 4, 2, 2, 1, 5, 6), 5)
Out[27]: ((3, 1), (4,), (2, 2, 1), (5,), (6,))
In [28]: chunkit((3, 1, 4, 2, 2, 1, 5, 6, 1, 6), 5)
Out[28]: ((3, 1), (4,), (2, 2, 1), (5,), (6,), (1,), (6,))
Not sure whether this is of practical use, but it is the closest I could come up with:
import itertools

def groupLimit(iterable, limit):
    i, cSum = 0, 0
    def pred(x):
        nonlocal i, cSum, limit
        # Start a new group (i + 1) when x no longer fits into the current sum.
        i, cSum = (i + 1, x) if (x + cSum) > limit else (i, cSum + x)
        return i if x <= limit else -1
    return (tuple(g) for k, g in itertools.groupby(iterable, pred) if k != -1)
This also filters out single values greater than the limit. If that is not wanted, the last two lines can be changed to:
        return i
    return (tuple(g) for k, g in itertools.groupby(iterable, pred))
Example:
t = (3, 1, 6, 2, 2, 1, 1, 2)
a = groupLimit(t,5)
print(tuple(a))
# version 1 -> ((3, 1), (2, 2, 1), (1, 2))
# version 2 -> ((3, 1), (6,), (2, 2, 1), (1, 2))
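Let's define powerset with itertools:
from itertools import chain, combinations

def powerset(lst):
    for subset in chain.from_iterable(combinations(lst, r) for r in range(len(lst)+1)):
        yield subset

Then a one-liner selects every subset whose sum is within the capacity (input and capacity being the tuple and the limit from the question):
[subset for subset in powerset(input) if sum(subset)<=capacity]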
A more general solution:
def groupwhile(iterable, predicate, accumulator_function):
    continue_group = False
    iterator = iter(iterable)
    try:
        accumulated = next(iterator)
    except StopIteration:
        return
    current_group = [accumulated]
    for item in iterator:
        continue_group = predicate(accumulated, item)
        if continue_group:
            current_group.append(item)
            accumulated = accumulator_function(accumulated, item)
        else:
            yield current_group
            accumulated = item
            current_group = [item]
    yield current_group

# your case
assert (list(groupwhile(
    (3, 1, 4, 2, 2, 1, 1, 2),
    lambda previous_sum, item: previous_sum + item <= 5,
    lambda previous_sum, item: previous_sum + item,
))) == [[3, 1], [4], [2, 2, 1], [1, 2]]

# equivalent to groupby with key not set
assert (list(groupwhile(
    (3, 1, 4, 2, 2, 1, 1, 2),
    lambda previous_item, item: previous_item == item,
    lambda _, item: item,
))) == [[3], [1], [4], [2, 2], [1, 1], [2]]

# break on duplicates
assert (list(groupwhile(
    (3, 1, 4, 2, 2, 1, 1, 2),
    lambda previous_item, item: previous_item != item,
    lambda _, item: item,
))) == [[3, 1, 4, 2], [2, 1], [1, 2]]

# start new group when the number is one
assert (list(groupwhile(
    (3, 1, 4, 2, 2, 1, 1, 2),
    lambda _, item: item != 1,
    lambda _1, _2: None,
))) == [[3], [1, 4, 2, 2], [1], [1, 2]]
My solution, not very clean, but it only uses reduce:
from functools import reduce  # reduce is no longer a builtin in Python 3

# int, (int, int, ...) -> ((int, ...), ...)
def grupBySum(capacity, _tuple):
    def _grupBySum(prev, number):
        counter = prev['counter']
        result = prev['result']
        counter = counter + (number,)
        if sum(counter) > capacity:
            result = result + (counter[:-1],)
            return {'counter': (number,), 'result': result}
        else:
            return {'counter': counter, 'result': result}
    # Look the two entries up by key; dict.values() is not indexable in Python 3.
    final = reduce(_grupBySum, _tuple, {'counter': (), 'result': ()})
    return final['result'] + (final['counter'],)

f = (3, 1, 4, 2, 2, 1, 1, 2)
h = grupBySum(5, f)
print(h) # -> ((3, 1), (4,), (2, 2, 1), (1, 2))