Split long list of tuples into nested lists

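I have a long list of tuples that I need to convert into a nested list structure.

The long list of tuples is a list of data structured like this: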

[(0.8, A), (-0.4, B), (1.0, C), (0.5, D), (-0.7, E)]

I also have a list of lengths like this:

[2, 2, 1]

My goal is to end up with a nested list like this:

[[(0.8, A), (-0.4, B)], [(1.0, C), (0.5, D)], [(-0.7, E)]]

Essentially, the list of lengths says how many tuples from the tuple list go into each nested list, but I can't figure out how to do this.

Not that Pythonic, but a for loop will do:

x = [(0.8, 'A'), (-0.4, 'B'), (1.0, 'C'), (0.5, 'D'), (-0.7, 'E')]
lns = [2, 2, 1]

res = []
start = 0
for ln in lns:
    res.append(x[start:start+ln])
    start += ln
    
print(res)

Output:

[[(0.8, 'A'), (-0.4, 'B')], [(1.0, 'C'), (0.5, 'D')], [(-0.7, 'E')]]
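
For comparison, the same slicing idea can be written without a mutable `start` counter. This is only a sketch of an alternative, not part of the answer above: it swaps the explicit loop for `itertools.accumulate`, and the function name `split_by_lengths` is made up for illustration.

```python
from itertools import accumulate


def split_by_lengths(items, lengths):
    """Split items into consecutive chunks whose sizes are given by lengths."""
    # Running totals give the end index of every chunk: [2, 2, 1] -> [2, 4, 5].
    ends = list(accumulate(lengths))
    # Each chunk starts where the previous one ended; the first starts at 0.
    starts = [0] + ends[:-1]
    return [items[s:e] for s, e in zip(starts, ends)]


x = [(0.8, 'A'), (-0.4, 'B'), (1.0, 'C'), (0.5, 'D'), (-0.7, 'E')]
lns = [2, 2, 1]

print(split_by_lengths(x, lns))
```

Like the loop version, this leaves `x` untouched, because slicing creates new lists.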

I tried to compare the performance of the two approaches. The first one uses an iterator (with next):

import timeit

tuples = [(0.8, "a"), (-0.4, "b"), (1.0, "c"), (0.5, "d"), (-0.7, "e")]
nums = [2, 2, 1]

def collect(tuples, nums):
    tuples_iter = iter(tuples)
    res = []
    for num in nums:
        batch = [next(tuples_iter) for i in range(num)]
        res.append(batch)
    return res

times = 1000000
res = timeit.timeit('collect(tuples, nums)', globals=globals(), number=times)
print(res, res/times)

It reports the following time:

$ py3 test.py
1.6941756070009433 1.6941756070009433e-06

The other one accesses the list directly by slicing:

import timeit

tuples = [(0.8, "a"), (-0.4, "b"), (1.0, "c"), (0.5, "d"), (-0.7, "e")]
nums = [2, 2, 1]

def collect(x, lns):
    res = []
    start = 0
    for ln in lns:
        res.append(x[start:start+ln])
        start += ln
    return res

times = 1000000
res = timeit.timeit('collect(tuples, nums)', globals=globals(), number=times)
print(res, res/times)

And here is its runtime:

$ py3 test2.py
0.6738436879822984 6.738436879822985e-07

Even with a small list of tuples we can see a clear difference in performance. Let's scale it up by replacing the initial conditions with:

tuples = [(0.8, "a") for i in range(10000)]
nums = [2 for i in range(5000)]

(It does not matter whether the tuples are identical or different; for simplicity I generated a uniform pattern of equal-sized chunks.)

I also reduced the number of runs to 10000, otherwise it would take a while to finish. Here are the results:

$ py3 test.py
45.54160467500333 0.0045541604675003325
$ py3 test2.py
18.69937555701472 0.001869937555701472

It looks like the "direct access" solution is roughly 2.5 times faster.
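
The speed-up factor can be read straight off the timings reported above. The following is just that arithmetic written out; the dictionary names `small` and `large` are made up for illustration, and the numbers are copied from the runs shown earlier.

```python
# Total seconds copied from the two benchmark runs reported above.
small = {"iter": 1.6941756070009433, "direct": 0.6738436879822984}
large = {"iter": 45.54160467500333, "direct": 18.69937555701472}

for label, timings in (("5 tuples", small), ("10000 tuples", large)):
    ratio = timings["iter"] / timings["direct"]
    print(f"{label}: iterator time / direct time = {ratio:.2f}")
```

Both ratios land between 2.4 and 2.6 for the chunk size of 2 used in these runs.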

Update

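Well, to test the "pop" approach we have to re-create the state on every test run, because it consumes the list. For consistency I implemented all 3 approaches within the same framework: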

from timeit import default_timer as timer


def prepare():
    tuples = [(0.8, "a") for i in range(10000)]
    nums = [2 for i in range(5000)]
    return tuples, nums


def collect_direct(x, lns):
    res = []
    start = 0
    for ln in lns:
        res.append(x[start:start+ln])
        start += ln
    return res


def collect_iter(tuples, nums):
    tuples_iter = iter(tuples)
    res = []
    for num in nums:
        batch = [next(tuples_iter) for i in range(num)]
        res.append(batch)
    return res


def collect_pop(source, pattern):
    res = []
    for size in pattern:
        # pop(0) takes the first element out of the list
        res.append([source.pop(0) for x in range(size)])
    return res


def test(func, times):
    total = 0
    for i in range(times):
        tuples, nums = prepare()
        start = timer()
        func(tuples, nums)
        total += timer() - start
    return total


times = 1000

print("Times", times)

print("Iter")
res = test(collect_iter, times)
print(res, res/times)

print("Direct")
res = test(collect_direct, times)
print(res, res/times)

print("Pop")
res = test(collect_pop, times)
print(res, res/times)

Here is the output:

$ py3 test4.py
Times 1000
Iter
2.9237239362555556 0.0029237239362555558
Direct
1.5656779609271325 0.0015656779609271325
Pop
22.700828287401237 0.022700828287401238

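This is expected: pop is not only similar in complexity to the "iter" approach (we visit every element of the initial list), it also has to remove the first element from the list each time.

The experiment shows that Python lists are not linked lists: they are array-backed, which is why direct access by index is fast. The flip side is that pop(0) has to shift every remaining element one position to the left, so each call costs O(n) instead of O(1).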
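
If consuming from the front is really what you want, the shifting cost can be avoided with `collections.deque`, whose `popleft()` is O(1). This is a sketch of a variation that was not benchmarked in this answer; the name `collect_popleft` is made up for illustration.

```python
from collections import deque


def collect_popleft(source, pattern):
    """Same result as the pop(0) approach, but consuming from a deque."""
    # deque.popleft() removes from the left end in O(1); list.pop(0) is O(n)
    # because every remaining element has to move down by one slot.
    queue = deque(source)  # copies the references, so source stays intact
    return [[queue.popleft() for _ in range(size)] for size in pattern]


source = [(0.8, "A"), (-0.4, "B"), (1.0, "C"), (0.5, "D"), (-0.7, "E")]
pattern = [2, 2, 1]

print(collect_popleft(source, pattern))
print(len(source))  # the original list still has all 5 tuples
```

Building the deque costs one pass over the list, so slicing may well still be faster; the point is only that the slowdown of `pop(0)` comes from the list's memory layout, not from popping as such.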

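Conclusion: direct access is the winner. Moreover, for patterns with larger chunks it should win by an even bigger margin. For example, for the pattern [3, 3, 3, ...] I would expect "direct access" to run 3 times faster than "iter", and for [5, 5, 5, ...] 5 times faster (although I did not check this).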
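
Since the chunk-size claim was not checked, here is a sketch of how it could be measured. The helper `compare` and its parameters `chunk`, `total` and `runs` are made up for illustration, and no results are quoted because they depend on the machine and Python version.

```python
import timeit


def collect_direct(x, lns):
    res = []
    start = 0
    for ln in lns:
        res.append(x[start:start + ln])
        start += ln
    return res


def collect_iter(tuples, nums):
    tuples_iter = iter(tuples)
    return [[next(tuples_iter) for _ in range(num)] for num in nums]


def compare(chunk, total=9000, runs=20):
    """Return how many times slower collect_iter is than collect_direct."""
    tuples = [(0.8, "a")] * total
    nums = [chunk] * (total // chunk)
    # Both approaches must agree before their timings are worth comparing.
    assert collect_direct(tuples, nums) == collect_iter(tuples, nums)
    t_direct = timeit.timeit(lambda: collect_direct(tuples, nums), number=runs)
    t_iter = timeit.timeit(lambda: collect_iter(tuples, nums), number=runs)
    return t_iter / t_direct


for chunk in (2, 3, 5):
    print(f"chunk size {chunk}: iter / direct = {compare(chunk):.2f}")
```

Neither function mutates its input, so the same lists can be reused across runs without a `prepare()` step.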

Iterate over your "pattern" and consume your "long list":

source = [(0.8, "A"), (-0.4, "B"), (1.0, "C"), (0.5, "D"), (-0.7, "E")]
pattern = [2, 2, 1]
res = []

for size in pattern:
    # pop(0) takes the first element out of the list
    res.append([source.pop(0) for x in range(size)])

print(res)

As a one-liner:

res = [[source.pop(0) for x in range(size)] for size in pattern]

Output:

[[(0.8, 'A'), (-0.4, 'B')], [(1.0, 'C'), (0.5, 'D')], [(-0.7, 'E')]]

Just a list comprehension:

source =[(0.8, 'A'), (-0.4, 'B'), (1.0, 'C'), (0.5, 'D'), (-0.7, 'E')]
lns = [2,2,1]

res = [[source.pop(0) for _ in range(ln)] for ln in lns]
print(res)

Output:

[[(0.8, 'A'), (-0.4, 'B')], [(1.0, 'C'), (0.5, 'D')], [(-0.7, 'E')]]

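Warning: this solution empties source, which means that after res has been created, source will be an empty list. If you do not want that to happen, create a copy of the list first.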
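
A minimal sketch of the copy-first variant, assuming a shallow copy is enough (the tuples themselves are immutable here); the name `work` is made up for illustration.

```python
source = [(0.8, 'A'), (-0.4, 'B'), (1.0, 'C'), (0.5, 'D'), (-0.7, 'E')]
lns = [2, 2, 1]

# Shallow copy: a new list object that holds the same tuples.
work = source.copy()
res = [[work.pop(0) for _ in range(ln)] for ln in lns]

print(res)
print(source)  # still holds all five tuples
print(work)    # the copy is what got consumed
```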

Create an iterator over the list and pull the indicated number of items from it each time:

l = [(0.8, 'A'), (-0.4, 'B'), (1.0, 'C'), (0.5, 'D'), (-0.7, 'E')]
sizes = [2, 2, 1]

it = iter(l)
res = [[next(it) for _ in range(size)] for size in sizes]
print(res)

This gives:

[[(0.8, 'A'), (-0.4, 'B')], 
 [(1.0, 'C'), (0.5, 'D')], 
 [(-0.7, 'E')]]
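
One thing to be aware of: if the sizes add up to more than the list holds, `next(it)` raises `StopIteration`. A possible variation, sketched here with `itertools.islice` (which is not what the answer above uses), simply returns a shorter final chunk instead.

```python
from itertools import islice

l = [(0.8, 'A'), (-0.4, 'B'), (1.0, 'C'), (0.5, 'D'), (-0.7, 'E')]
sizes = [2, 2, 1]

it = iter(l)
# islice(it, size) takes up to size items from the shared iterator.
res = [list(islice(it, size)) for size in sizes]
print(res)

# Asking for more items than are left just yields a shorter last chunk.
it = iter(l)
short = [list(islice(it, size)) for size in [2, 2, 3]]
print(short)
```

Whether a silent short chunk or an exception is preferable depends on whether a length mismatch should count as an error in your data.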


Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce them, please credit this site's URL or the original source.