extend vs. list comprehension for building a flat list from sublists

Question

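Given old_list = list(range(1000)), there are two ways I'm familiar with to create a new flat list from multiple sublists:

Use extend, so that the new list is flat automatically. MWE: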

new_list = []
for i in range(10,20):
    new_list.extend(old_list[:i])

Use a list comprehension, then go back and flatten. MWE:

new_list = [old_list[:i] for i in range(10,20)]
new_list = [item for sublist in new_list for item in sublist]

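I'm unsure which of these is more efficient for longer lists, and whether either one uses less memory than the other. The latter seems more Pythonic, but I don't like having to go back and flatten, and I haven't found much discussion of the overhead of extend (whereas there is plenty for append).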

Answer 1

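Practical answer

@juanpa.arrivillaga's diagnosis of the problem is exactly right: Python does not use additional memory for the elements in this kind of operation, because the new list is simply filled with references to the same objects already in memory. I checked this with the memory-profiler package from PyPI, using the case that is more relevant to me, where the list elements are ndarray objects: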

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    26     27.8 MiB     27.8 MiB           1   @profile
    27                                         def double_listcomp_test():
    28   1904.3 MiB   1876.6 MiB       10003       old_list = [a for a in
    29   1904.3 MiB      0.0 MiB           1                   np.random.randint(0,255,(10000,256,256,3),dtype=np.uint8)]
    30   1904.4 MiB      0.1 MiB        5008       old_list = [a for _ in range(5) for a in random.sample(old_list,k=1000)]
    31
    32   1904.4 MiB      0.0 MiB           1       print(len(old_list))
    33   1904.4 MiB      0.0 MiB           1       return old_list
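
The flat memory usage in the profile above is consistent with how Python lists behave: a list stores references, so flattening builds a new list object without copying the elements. The following is a minimal sketch of that idea, not the profiled code itself. It assumes small bytearray objects as a stand-in for the ndarray elements, and the names elements, sublists, and flat are illustrative only.

```python
import random

# Stand-ins for the large ndarray elements (assumption: any mutable object
# behaves the same way as far as references are concerned).
elements = [bytearray(16) for _ in range(100)]

# Sample sublists from the originals, then flatten them.
sublists = [random.sample(elements, k=10) for _ in range(5)]
flat = [item for sublist in sublists for item in sublist]

# Every entry in the flat list is one of the original objects, not a copy.
original_ids = {id(e) for e in elements}
shares_objects = all(id(item) in original_ids for item in flat)

# A change made through the flat list is visible through the original list,
# because both lists point at the same object.
flat[0][0] = 255
mutated_original = next(e for e in elements if e is flat[0])
```

Because only references are duplicated, the cost of the new list is roughly one pointer per entry, regardless of how large each element is.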

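This approach works very well for me, and I think that for Python programmers who are used to generator expressions it is at least as readable as the other two approaches. We are essentially just treating the collection of sublists as an iterator rather than as a list of its own. My "accepted" approach for the example above is: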

my_list = list(range(1000))
my_list = [item for i in range(10,20) for item in my_list[:i]]
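
As a sanity check, the sketch below puts the two approaches from the question next to the single comprehension and confirms they build the same list. The function names and the time_variants helper are hypothetical, and with_chain uses itertools.chain.from_iterable, which is not mentioned in the original post and is included only as another standard way to flatten.

```python
from itertools import chain
import timeit

old_list = list(range(1000))

def with_extend():
    # Approach 1 from the question: extend keeps the list flat as it grows.
    new_list = []
    for i in range(10, 20):
        new_list.extend(old_list[:i])
    return new_list

def with_two_listcomps():
    # Approach 2 from the question: build nested sublists, then flatten.
    nested = [old_list[:i] for i in range(10, 20)]
    return [item for sublist in nested for item in sublist]

def with_single_listcomp():
    # The single nested comprehension: no intermediate list of sublists.
    return [item for i in range(10, 20) for item in old_list[:i]]

def with_chain():
    # Not from the original post: lazy flattening with itertools.
    return list(chain.from_iterable(old_list[:i] for i in range(10, 20)))

variants = [with_extend, with_two_listcomps, with_single_listcomp, with_chain]
results = [f() for f in variants]
all_equal = all(r == results[0] for r in results)

def time_variants(number=10_000):
    # Returns total seconds per variant for `number` calls each.
    return {f.__name__: timeit.timeit(f, number=number) for f in variants}
```

Calling time_variants() gives a rough speed comparison; the numbers depend on the machine and Python version, so none are quoted here.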

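For memory management

What I actually found I needed to do was put copies of the arrays into the new list, so that I could delete the larger old list. That is strictly outside the scope of the question, since copying obviously uses more memory at creation time, but I wanted to include it because this problem is what brought me here. In this case I think the suggestion from the comments is especially apt, because it avoids the un-Pythonic del old_list (see below).

Two listcomps, with flattening (as in the question):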

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    16     27.7 MiB     27.7 MiB           1   @profile
    17                                         def listcomp_test():
    18   1904.3 MiB   1876.6 MiB       10003       old_list = [a for a in
    19   1904.3 MiB      0.0 MiB           1                   np.random.randint(0,255,(10000,256,256,3),dtype=np.uint8)]
    20   1904.4 MiB      0.1 MiB           8       new_list = [random.sample(old_list,k=1000) for _ in range(5)]
    21   2842.6 MiB    938.3 MiB        5008       new_list = [item.copy() for sublist in new_list for item in sublist]
    22
    23   2842.7 MiB      0.0 MiB           1       print(len(new_list))
    24   2842.7 MiB      0.0 MiB           1       return new_list

Two listcomps with del old_list:

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    16     27.7 MiB     27.7 MiB           1   @profile
    17                                         def listcomp_test():
    18   1904.3 MiB   1876.6 MiB       10003       old_list = [a for a in
    19   1904.3 MiB      0.0 MiB           1                   np.random.randint(0,255,(10000,256,256,3),dtype=np.uint8)]
    20   1904.4 MiB      0.1 MiB           8       new_list = [random.sample(old_list,k=1000) for _ in range(5)]
    21   2842.7 MiB    938.3 MiB        5008       new_list = [item.copy() for sublist in new_list for item in sublist]
    22    967.1 MiB  -1875.5 MiB           1       del old_list
    23
    24    967.2 MiB      0.0 MiB           1       print(len(new_list))
    25    967.2 MiB      0.0 MiB           1       return new_list

Single listcomp (as suggested in the comments):

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    26     27.8 MiB     27.8 MiB           1   @profile
    27                                         def double_listcomp_test():
    28   1904.3 MiB   1876.6 MiB       10003       my_list = [a for a in
    29   1904.3 MiB      0.0 MiB           1                   np.random.randint(0,255,(10000,256,256,3),dtype=np.uint8)]
    30   2842.6 MiB   -937.2 MiB        5008       my_list = [a.copy() for _ in range(5) for a in random.sample(my_list,k=1000)]
    31
    32    967.1 MiB  -1875.5 MiB           1       print(len(my_list))
    33    967.1 MiB      0.0 MiB           1       return my_list

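Note: if you care about memory management, Python is certainly not the best language, but in data science applications we often have no choice, so I thought it prudent to include this in the answer. In every case Python has to use extra memory while the new list is being created, but with the suggested listcomp approach we avoid needlessly binding things to new variables, so we hold on to that memory for as short a time as possible.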
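
The same effect can be observed without third-party packages. The sketch below is an assumption-laden stand-in for the profiles above: it uses the standard-library tracemalloc module instead of memory-profiler, bytearray objects instead of ndarrays, and much smaller sizes. The helper names make_elements and measure are hypothetical.

```python
import random
import tracemalloc

def make_elements(n=200, size=10_000):
    # bytearray stands in for the ndarray elements (assumption for this sketch).
    return [bytearray(size) for _ in range(n)]

def measure(single_comprehension):
    """Return (peak_bytes, final_bytes, length) for one way of copying."""
    tracemalloc.start()
    data = make_elements()
    if single_comprehension:
        # Rebinding the same name drops the last reference to the originals,
        # so they can be freed as soon as the new list is complete.
        data = [a.copy() for _ in range(5) for a in random.sample(data, k=20)]
    else:
        new_list = [random.sample(data, k=20) for _ in range(5)]
        new_list = [item.copy() for sublist in new_list for item in sublist]
        del data  # explicit cleanup is needed to release the originals
        data = new_list
    current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak, current, len(data)
```

In both branches the peak covers the originals plus the copies, while the final figure covers roughly the copies alone; the single comprehension simply reaches that state without an explicit del.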

Answer 1
1 Accepted 2020-11-23 22:02:08
