Memory used by numpy arrays larger than RAM?

I have read very large TDMS files containing sensor data into lists of numpy arrays. The structure is as follows: the data from each file is stored in an instance of an object called file_data. The object has one property per sensor type, and each property is basically a list of numpy arrays (one array for each individual sensor of that type).

I wanted to know how much data I am storing here, since the size of the TDMS files generated by LabVIEW did not seem very meaningful with all the metadata they contain.

This is the code:

# Check memory
total = 0
file_data = [file_data1, file_data2, ...] # list of data objects read from six files
for no, f in enumerate(file_data):
    sensor_types = [f.sensortype1, f.sensortype2, ...] # list of sensor types
    file_bytes = 0 # renamed from `sum` to avoid shadowing the built-in
    for sensor_type in sensor_types: # list of numpy arrays
        for data in sensor_type: # np.array
            file_bytes += data.nbytes # same as data.size * data.itemsize
    total += file_bytes
    print('Data from file {}, size: {:.2f} GB'.format(no + 1, file_bytes / 1024**3))
print('Total memory: {:.2f} GB'.format(total / 1024**3))

This gives me the following output:

Data from file 1, size: 2.21 GB
Data from file 2, size: 1.88 GB
Data from file 3, size: 2.27 GB
Data from file 4, size: 1.53 GB
Data from file 5, size: 1.01 GB
Data from file 6, size: 0.66 GB
Total memory: 9.56 GB

But I am working on a Mac with 8 GB of RAM, so this number really surprised me: the program didn't crash and I can work with the data. Where am I mistaken?

I guess you are using npTDMS.

The numpy array type it uses is not just a simple array whose elements are all held in memory at all times. The data type and the number of elements are known (in this case from the metadata in the TDMS file), but the elements themselves are not read until they are requested. So size and itemsize describe the full data set even when only part of it is actually in RAM.
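One way to check whether this applies to your arrays is to separate file-backed arrays from ordinary in-memory ones before adding up the bytes. The sketch below is only an illustration: it uses NumPy's own np.memmap as a stand-in for a lazily read array, the helper names is_file_backed and memory_report are made up for this example, and whether npTDMS actually hands you memory-mapped arrays depends on how the file was opened.

```python
import mmap
import os
import tempfile

import numpy as np


def is_file_backed(arr):
    """Return True if arr, or an array it is a view of, is a memory map."""
    obj = arr
    while obj is not None:
        if isinstance(obj, (np.memmap, mmap.mmap)):
            return True
        obj = getattr(obj, "base", None)  # follow the chain of views
    return False


def memory_report(arrays):
    """Return (bytes_in_ram, bytes_file_backed) for an iterable of arrays."""
    in_ram = 0
    mapped = 0
    for arr in arrays:
        if is_file_backed(arr):
            mapped += arr.nbytes
        else:
            in_ram += arr.nbytes
    return in_ram, mapped


# Small demo with a temporary file (left on disk; delete it yourself if needed).
path = os.path.join(tempfile.mkdtemp(), "channel.dat")
np.arange(1000, dtype=np.float64).tofile(path)

on_disk = np.memmap(path, dtype=np.float64, mode="r")  # 1000 float64 values on disk
in_memory = np.zeros(500, dtype=np.float64)            # 500 float64 values in RAM

in_ram, mapped = memory_report([on_disk, in_memory, on_disk[10:20]])
print("in RAM: {} bytes, file-backed: {} bytes".format(in_ram, mapped))
```

In the question's loop you could pass each sensor's list of arrays to memory_report and print both totals; only the first number has to fit into RAM.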

That is: if you want the last element of a 20 GB record, npTDMS knows where it is stored in the file, reads it and returns it, without reading the 20 GB that come before it.
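The same effect can be reproduced with plain NumPy. The following is a minimal sketch that again assumes np.memmap as a stand-in (it does not show npTDMS internals), with a made-up file name and a record far smaller than 20 GB so that it runs quickly. The size computed from size and itemsize matches the file on disk, yet indexing the last element only touches the end of the file.

```python
import os
import tempfile

import numpy as np

n = 1_000_000  # number of float64 samples in the demo record

# Write a record to disk (temporary file, left in place after the run).
path = os.path.join(tempfile.mkdtemp(), "record.dat")
np.arange(n, dtype=np.float64).tofile(path)

# Opening the memory map reads no sample data; shape and dtype are metadata.
record = np.memmap(path, dtype=np.float64, mode="r")

reported = record.size * record.itemsize  # the calculation from the question
on_disk = os.path.getsize(path)

# Indexing pulls in only the part of the file that holds this element.
last = float(record[-1])

print("reported: {} bytes, file size: {} bytes, last value: {}".format(
    reported, on_disk, last))
```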
