Is it better to store temp data in arrays or save it to file for access later?
This is a broad question. I am running a very long simulation (in Python) that generates a sizeable amount of data (about 10,000 matrices of size 729*729). I only need the data to plot a couple of graphs, and then I'm done with it. At the moment I keep the data in NumPy arrays and plot it when the simulation is complete.
One alternative would be to write the data to a file and then read the file after the simulation to plot graphs, etc.
In general, is there a consensus on the best (i.e. quickest) way to manage large temporary data sets? Is either of these "best practice"?
Try to make the data obsolete as quickly as possible by processing or accumulating it further as you go, e.g. by plotting immediately.
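As a rough illustration of this idea, the sketch below reduces each matrix to the quantities needed for plotting as soon as it is produced, so only one full matrix is alive at a time. The function `simulate_step` and the choice of statistics (a running mean and a per-step trace) are hypothetical stand-ins, since the question does not say what the simulation computes or what is plotted.

```python
import numpy as np

N = 729  # matrix side length from the question


def simulate_step(step, rng):
    """Hypothetical stand-in for one simulation step; returns one N x N matrix."""
    return rng.standard_normal((N, N))


def run(num_steps, seed=0):
    rng = np.random.default_rng(seed)
    running_sum = np.zeros((N, N))         # accumulates matrices for a mean at the end
    trace_per_step = np.empty(num_steps)   # one scalar per step is cheap to keep
    for step in range(num_steps):
        m = simulate_step(step, rng)
        running_sum += m                   # fold the matrix into the accumulator
        trace_per_step[step] = np.trace(m)
        # m is rebound on the next iteration, so the old matrix can be freed
    return running_sum / num_steps, trace_per_step


# Small step count for illustration; the question mentions about 10,000
mean_matrix, traces = run(num_steps=20)
```

With this pattern the memory use stays at a few matrices' worth regardless of how many steps are run, provided the plots really only need such reduced quantities.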
You did not give details about the memory/storage needed. For sparse matrices there are efficient representations. If your matrices are not sparse, there are roughly 500k entries per matrix (729 * 729 = 531,441) and therefore about 5 billion entries altogether. Without knowing your data type I can only estimate, but at 8 bytes per entry this would typically be around 40 GB of memory.
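The estimate can be checked with a few lines of arithmetic. The data types below are assumptions (the question does not state one); `float64` is NumPy's default for floating-point arrays.

```python
import numpy as np

n_matrices = 10_000
side = 729

entries_per_matrix = side * side                  # 531,441
total_entries = n_matrices * entries_per_matrix   # about 5.3 billion


def total_bytes(dtype):
    """Bytes needed to hold all dense matrices at the given dtype."""
    return total_entries * np.dtype(dtype).itemsize


for dtype in (np.float64, np.float32):
    print(f"{np.dtype(dtype).name}: {total_bytes(dtype) / 1e9:.1f} GB")
# float64: 42.5 GB
# float32: 21.3 GB
```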
I strongly suggest reviewing your algorithms to achieve a smaller memory footprint.
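If every matrix really has to be kept until the end, one possible way to keep the in-memory footprint small is a disk-backed array via `np.memmap`. This is a minimal sketch and not something the answer itself proposes; the file name, the `float32` dtype, and the random data standing in for simulation results are all assumptions.

```python
import os
import tempfile

import numpy as np

N = 729
num_steps = 5  # small for illustration; the question mentions about 10,000

# Hypothetical file location in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "sim_data.dat")

# Create a disk-backed array; only the pages being touched need to be in RAM
store = np.memmap(path, dtype=np.float32, mode="w+", shape=(num_steps, N, N))
rng = np.random.default_rng(0)
for step in range(num_steps):
    store[step] = rng.standard_normal((N, N))  # stand-in for a simulation result
store.flush()  # make sure everything is written to disk
del store

# Later: reopen read-only. The raw file stores no metadata, so the same
# dtype and shape must be supplied again.
data = np.memmap(path, dtype=np.float32, mode="r", shape=(num_steps, N, N))
first_mean = float(data[0].mean())
```

Slicing `data` reads only the requested matrices from disk, so plotting can proceed one matrix at a time.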