
Does casting a list into a tuple result in the tuple having memory overhead?

Let's create a list of lists:

l = []
for i in range(1000000):
    l.append(['abcdefghijklmnopqrstuvwxyz', '1.8', '5', 'john@email.com', 'ffffffffff'])

l.__sizeof__() reports 8697440 and the process occupies 111 MB.

Now let's try a list of tuples instead of a list of lists:

l = []
for i in range(1000000):
    l.append(('abcdefghijklmnopqrstuvwxyz', '1.8', '5', 'john@email.com', 'ffffffffff'))

l.__sizeof__() reports 8697440 and the process occupies 12 MB.
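(Side note on the identical 8697440 readings: __sizeof__ counts only the outer list's own allocation, its header plus one pointer per slot, never the row objects those pointers refer to, so all three versions of l report the same number. A small sketch, assuming 64-bit CPython:)

```python
import sys

row = ['abcdefghijklmnopqrstuvwxyz', '1.8', '5', 'john@email.com', 'ffffffffff']
as_lists  = [list(row)  for _ in range(1000)]
as_tuples = [tuple(row) for _ in range(1000)]

# Same length, same pointer array: the outer lists report the same size
# regardless of what kind of rows they hold.
assert as_lists.__sizeof__() == as_tuples.__sizeof__()

# The per-row difference lives in the rows themselves: a tuple keeps no
# spare append capacity, so it is a little smaller than the same-length list.
assert sys.getsizeof(tuple(row)) < sys.getsizeof(row)
```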

As expected, a huge improvement. Now let's cast each list into a tuple just before insertion instead:

l = []
for i in range(1000000):
    l.append(tuple(['abcdefghijklmnopqrstuvwxyz', '1.8', '5', 'john@email.com', 'ffffffffff']))

l.__sizeof__() reports 8697440 and the process occupies 97 MB.

Why 97 MB, and not 12 MB?

Is it garbage? gc.collect() reports 0.

Doing del l releases all the memory, down to 4 MB, which is just a little over what a blank interpreter occupies. So l really was bloated.


When we cast a list into a tuple, is the resulting tuple "impure" compared to a tuple created from scratch?


And if so, is there a workaround to convert a list into a "pure" tuple?

Short answer

No. Your line of reasoning is flawed because you are not realising that tuples benefit from constant folding.
(Hint: there is no way tuples can hold in 12 MB what lists are holding in 111 MB!)
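The hint can be sanity-checked with a little arithmetic (sizes below are for 64-bit CPython; exact numbers vary by version):

```python
import sys

row = ['abcdefghijklmnopqrstuvwxyz', '1.8', '5', 'john@email.com', 'ffffffffff']

# One 5-element list object alone is roughly 100 bytes (header plus five
# pointers), so a million DISTINCT rows need on the order of 100 MB before
# even counting the strings. A 12 MB footprint is only possible if every
# slot of l points at one shared object.
per_row = sys.getsizeof(row)
print(per_row, 'bytes/row ->', per_row * 1_000_000 // 1024**2, 'MB minimum')
```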

Long answer

Your first example creates a million and one lists. Your third example creates one list and a million tuples. The slight memory difference (111 MB vs. 97 MB) is due to the fact that tuples are stored more compactly (they don't keep any spare capacity to handle appends, for example).

But in your second example, you have one list and ONE tuple. The tuple is a constant value, so it gets created once during compilation of the code, and a reference to that one object is reused for every element of the list. If it weren't constant (for example, if you used i as one of the elements of the tuple), a new tuple would have to be created each time, and you'd get memory usage comparable to example #3.
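This constant reuse is easy to observe with object identity (a small sketch; the behaviour below is a CPython implementation detail):

```python
l = []
for i in range(3):
    l.append(('a', 'b'))          # constant tuple: folded at compile time
assert l[0] is l[1] is l[2]       # every slot holds the SAME object

m = []
for i in range(3):
    m.append(('a', i))            # depends on i: a fresh tuple each time
assert m[0] is not m[1]

n = []
for i in range(3):
    n.append(tuple(['a', 'b']))   # built at runtime, as in example #3
assert n[0] == n[1] and n[0] is not n[1]
```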

The third example could theoretically have the same memory usage as #2, but this would require that Python compare each newly-created tuple to all existing tuples to see if they happen to be identical. This would be slow, and of extremely limited benefit in typical real-world programs that don't create huge numbers of identical tuples.
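As for the workaround asked about above: if your data really does contain many identical rows, you can deduplicate the tuples yourself with a small cache, so equal rows share one object. A sketch (the interned helper is a hypothetical name, not a stdlib facility):

```python
cache = {}

def interned(row):
    """Return a shared tuple for equal rows, caching the first one seen."""
    t = tuple(row)
    return cache.setdefault(t, t)

l = []
for i in range(1000):
    l.append(interned(['abcdefghijklmnopqrstuvwxyz', '1.8', '5',
                       'john@email.com', 'ffffffffff']))

# Every slot now points at one shared tuple, as in example #2.
assert all(x is l[0] for x in l)
assert len(cache) == 1
```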
