Converting a list of int to a list of string takes too much memory in Python

I need to store integers as a string. E.g., [1,2,3] will be stored as '1;2;3'. To do this, I first need to convert the list of integers to a list of strings, but the memory usage for this conversion is huge.

Sample code that shows the problem:

from sys import getsizeof
import tracemalloc

tracemalloc.start()

curr, peak = tracemalloc.get_traced_memory()
print((f'Current: {round(curr/1e6)} MB\nPeak: {round(peak/1e6)} MB'))

print()

list_int = [1]*int(1e6)

curr, peak = tracemalloc.get_traced_memory()
print((f'Current: {round(curr/1e6)} MB\nPeak: {round(peak/1e6)} MB'))
print(f'Size of list_int: {getsizeof(list_int)/1e6} MB')

print()

list_str = [str(i) for i in list_int]

curr, peak = tracemalloc.get_traced_memory()
print((f'Current: {round(curr/1e6)} MB\nPeak: {round(peak/1e6)} MB'))
print(f'Size of list_str: {getsizeof(list_str)/1e6} MB')

Output:

Current: 0 MB
Peak: 0 MB

Current: 8 MB
Peak: 8 MB
Size of list_int: 8.000056 MB

Current: 66 MB
Peak: 66 MB
Size of list_str: 8.448728 MB

The memory taken by both lists is similar (8 MB), but the memory used by the program during conversion is huge (66 MB).

How can I solve this memory issue?

Edit: I need to convert it to a string, so I will run ';'.join(list_str) at the end. So even if I use a generator/iterable, say list_str = map(str, list_int), the memory usage comes out the same.

Use NumPy instead. Try this:

from sys import getsizeof
import tracemalloc
import numpy as np

tracemalloc.start()

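# np.str was removed in NumPy 1.24; the builtin str gives the same dtype.
# np.ones with dtype=str yields '<U1', so only one-character strings fit.
arr = np.ones((1000000,), dtype=str)
# enumerate supplies the index; the original used each value as the index
for idx, value in enumerate([1]*int(1e6)):
    arr[idx] = str(value)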

curr, peak = tracemalloc.get_traced_memory()
print((f'Current: {round(curr/1e6)} MB\nPeak: {round(peak/1e6)} MB'))
print(f'Size of list_str: {getsizeof(list(arr))/1e6} MB')

Output, with a bit of improvement I think:

Current: 4 MB
Peak: 12 MB
Size of list_str: 9.000112 MB

I think that the result Size of list_str: 8.448728 MB is misleading; the true size of list_str is actually larger: 58.70 MB.

If you read the documentation for getsizeof carefully, you will find the following:

Only the memory consumption directly attributed to the object is accounted for, not the memory consumption of objects it refers to. [emphasis by me]

That is, getsizeof does not count the size of the contents, i.e., the strings in the list.

Using the method proposed there, you can find that the true total size of the list [str(i) for i in [1] * int(1e6)] is about 58.70 MB.
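For illustration, here is a minimal sketch of such a recursive size calculation. It is a simplified stand-in for the recipe the documentation points to, not that recipe itself; the name total_size is my own, and it only descends into the built-in containers listed in the code.

```python
from sys import getsizeof

def total_size(obj, seen=None):
    """Approximate deep size: the object plus every distinct object it holds.

    Hypothetical helper for illustration; only list, tuple, set,
    frozenset and dict are traversed.
    """
    if seen is None:
        seen = set()
    if id(obj) in seen:          # count an object shared by many slots only once
        return 0
    seen.add(id(obj))
    size = getsizeof(obj)        # shallow size of the object itself
    if isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(total_size(item, seen) for item in obj)
    elif isinstance(obj, dict):
        size += sum(total_size(k, seen) + total_size(v, seen)
                    for k, v in obj.items())
    return size

# Usage: compare the shallow and the deep size of a list of strings.
list_str = [str(i) for i in range(1000, 2000)]
print(getsizeof(list_str), total_size(list_str))
```

Because objects are tracked by id, a list such as [1] * int(1e6), whose slots all point to the same int object, comes out at roughly its shallow size. A list of separately created strings adds the size of every string on top.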

Now add this to the total size (8 MB) of the other list you are holding, [1] * int(1e6), and you get the 66 MB that you observe.

Therefore my answer is that as long as you want to have the list of strings, there is no better way to do it, since no excess memory is actually being used along the way.
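If the list of strings is only a stepping stone to the final '1;2;3' string, as the edit to the question suggests, one option is to never build the list at all. In CPython, str.join first collects a non-list iterable into a sequence, which would explain why passing map(str, list_int) did not help. The sketch below writes each piece into an io.StringIO buffer instead. The helper name join_ints is my own, and I have not measured it against the numbers above.

```python
import io

def join_ints(values, sep=';'):
    """Build 'a;b;c' from an iterable of ints without a list of strings.

    Hypothetical helper for illustration: each str(value) is written to
    the buffer and can be freed straight away.
    """
    buf = io.StringIO()
    first = True
    for value in values:
        if not first:
            buf.write(sep)       # separator goes between items only
        buf.write(str(value))
        first = False
    return buf.getvalue()

# Usage
print(join_ints([1, 2, 3]))
```

The buffer and the final string still have to hold the whole result, so memory use should scale with the length of the output string, not with a million separate string objects.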

