简体   繁体   English

有效地存储和读取大数据文件

[英]Storing and reading large data files efficiently

I am working on a project where I have large input files which come from numerical solutions of pdes. 我正在开发一个项目,我有大量的输入文件来自pdes的数值解决方案。 The format of the data is as follows. 数据格式如下。

x \t y \t f(x,y)

For each value of y, we have several values of x, and the function value evaluated at each point. 对于y的每个值,我们有几个x值,并在每个点评估函数值。 The size of the data I'm dealing with is about [-3, 5]x[-3, 5] in steps of 0.01 in each dimension, so the raw data file is pretty big (about 640,000 entries). 我正在处理的数据大小约为[-3, 5]x[-3, 5] ,每个维度的步长为0.01 ,因此原始数据文件非常大(大约640,000个条目)。 Reading it into memory is also pretty time-taking because the tools I'm working on have to read multiple raw data files of this type at the same time. 将其读入内存也非常耗时,因为我正在使用的工具必须同时读取这种类型的多个原始数据文件。

I'm using Python. 我正在使用Python。

Is there any way to store and read data like this efficiently in Python? 有没有办法在Python中有效地存储和读取这样的数据? The idea is to include a tool that massages these raw data files into something that can be read more efficiently. 我们的想法是包含一个工具,可以将这些原始数据文件按摩成可以更有效地读取的内容。 I'm currently working on interpolating the data and storing some coefficients (essentially replacing memory by computing time), but I'm sure there must be an easier way that helps both memory and time. 我正在研究内插数据和存储一些系数(基本上通过计算时间替换内存),但我确信必须有一种更简单的方法来帮助记忆和时间。

Thanks SOCommunity! 谢谢SOCommunity!

PS: I saw related questions in Java. PS:我在Java中看到了相关的问题。 I'm working entirely on Python here. 我在这里完全使用Python。

If you're using numpy (and you probably should be), numpy.save / numpy.savez and numpy.load should be able to handle this pretty easily. 如果你正在使用numpy(你可能应该), numpy.save / numpy.saveznumpy.load应该能够很容易地处理这个问题。

For example: 例如:

import numpy as np
xs = np.linspace(-3, 5, 800)
ys = np.linspace(-3, 5, 800)
f_vals = np.random.normal(size=(xs.size, ys.size))
np.savez('the_file.npz', xs=xs, ys=ys, f=f_vals)

is quite quick, and the resulting file is less than 5mb. 非常快,结果文件小于5mb。

Is there any way to store and read data like this efficiently in Python? 有没有办法在Python中有效地存储和读取这样的数据?

If you don't need to keep it in memory all the time, I would suggest migrating the data to an Sqlite database. 如果您不需要一直将其保留在内存中,我建议将数据迁移到Sqlite数据库。 This would also allow you making SQL queries on the data. 这也允许您对数据进行SQL查询。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM