How to handle a large amount of float data?
We have a binary file which contains a large amount of float data (about 80MB), and we need to process it in our Java application. The data comes from a medical scanner. One file contains the data from one Rotation. One Rotation contains 960 Views. One View contains 16 Rows, and one Row contains 1344 Cells. Those numbers (and their relationship) are fixed.
We need to read ALL the floats into our application, with a code structure that reflects the Rotation-View-Row-Cell structure described above.
What we are doing now is using float[] to hold the data for the Cells, and then using ArrayList for Rotation, View and Row to hold their data.
I have two questions:
Assuming you don't make changes to the data (adding more views, etc.), why not put everything in one big array? The point of ArrayLists is that you can grow and shrink them, which you don't need here. You can write access methods to get the right cell for a given view, rotation, etc.
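The "one big array plus access methods" idea could look roughly like the sketch below. The class name FlatRotation and its methods are hypothetical, and the sketch assumes the floats are laid out view by view, then row by row, then cell by cell; the question does not state the file's actual ordering.

```java
/** Hypothetical wrapper: one rotation held in a single flat float[]. */
final class FlatRotation {
    // Fixed dimensions taken from the question.
    static final int VIEWS = 960;
    static final int ROWS = 16;
    static final int CELLS = 1344;

    // 960 * 16 * 1344 = 20,643,840 floats, roughly 80 MB at 4 bytes each.
    private final float[] data = new float[VIEWS * ROWS * CELLS];

    /** Maps (view, row, cell) to a position in the flat array (assumed view-major order). */
    static int index(int view, int row, int cell) {
        if (view < 0 || view >= VIEWS || row < 0 || row >= ROWS || cell < 0 || cell >= CELLS) {
            throw new IndexOutOfBoundsException(
                "view=" + view + ", row=" + row + ", cell=" + cell);
        }
        return (view * ROWS + row) * CELLS + cell;
    }

    float get(int view, int row, int cell) {
        return data[index(view, row, cell)];
    }

    void set(int view, int row, int cell, float value) {
        data[index(view, row, cell)] = value;
    }
}
```

The explicit bounds check is there because a flat array would otherwise silently accept, say, a cell index of 1344 and return a value from the next row.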
Using arrays of arrays is a better idea: that way the language works out the indexing for you, and it is just as fast as a single array.
Michael is right, you need to buffer the input; otherwise you will be doing a file access operation for every byte and your performance will be awful.
If you want to stick with the current approach as much as possible, you can minimize the memory used by your ArrayLists by setting their capacity to the number of elements they hold. Otherwise they keep a number of slots in reserve, expecting you to add more.
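As a rough illustration of the capacity advice, the sketch below builds one View as an ArrayList of float[] rows with an exact initial capacity, and shows trimToSize() for lists that already exist. The class and method names are hypothetical, and modelling a View as ArrayList<float[]> is only one guess at the structure used in the question.

```java
import java.util.ArrayList;

/** Hypothetical helpers for keeping ArrayList capacity equal to its size. */
final class ExactCapacityLists {
    static final int ROWS = 16;    // rows per view, from the question
    static final int CELLS = 1344; // cells per row, from the question

    /** Creates one view whose backing array has room for exactly ROWS entries. */
    static ArrayList<float[]> newView() {
        ArrayList<float[]> rows = new ArrayList<>(ROWS); // initial capacity = final size
        for (int r = 0; r < ROWS; r++) {
            rows.add(new float[CELLS]);
        }
        return rows;
    }

    /** For a list that was filled without a known size: release the spare slots. */
    static void shrink(ArrayList<?> list) {
        list.trimToSize();
    }
}
```

The saving is small here, since the lists hold only 960 or 16 references each; the float[] cell arrays account for almost all of the memory.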
Use a DataInputStream (and its readFloat() method) wrapping a FileInputStream, possibly with a BufferedInputStream in between (test whether the buffer helps performance or not).
Are you having any particular performance/usage problems with your current approach?
The only thing I can suggest based on the information that you provide is to try representing a View as a float[][] of rows and cells.
For the data loading:
DataInputStream should work well. But make sure you wrap the underlying FileInputStream in a BufferedInputStream, otherwise you run the risk of doing an I/O operation for every float, which can kill performance.
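A minimal sketch of that buffered loading might look like this. FloatLoader and its method names are made up for illustration, and the 64 KB buffer size is an arbitrary choice. Note one assumption that matters: DataInputStream.readFloat() decodes big-endian IEEE 754 values, and the question does not say which byte order the scanner writes.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical loader: reads a known number of floats through a buffer. */
final class FloatLoader {

    /** Reads exactly count big-endian floats; throws EOFException if the stream is too short. */
    static float[] readFloats(InputStream raw, int count) throws IOException {
        // The BufferedInputStream turns many 4-byte reads into a few large ones.
        DataInputStream in = new DataInputStream(new BufferedInputStream(raw, 1 << 16));
        float[] out = new float[count];
        for (int i = 0; i < count; i++) {
            out[i] = in.readFloat();
        }
        return out;
    }

    /** Convenience overload that opens and closes the file itself. */
    static float[] readFloats(String path, int count) throws IOException {
        try (InputStream raw = new FileInputStream(path)) {
            return readFloats(raw, count);
        }
    }
}
```

If the file turns out to be little-endian, readFloat() will return garbage rather than fail, so it is worth checking a few known values first.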
Several options for holding the data:
I also think that you can put your whole data structure into a float[][][] (same as Nathan Hughes suggests). You could have a method that reads your file and returns a float[][][], where the first dimension is that of views (960), the second is that of rows (16), and the third is that of cells (1344). If those numbers are fixed, you'd better use this approach: you save memory, and it's faster.
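Such a method could be sketched as follows. RotationReader is a hypothetical name, the dimensions are passed in as parameters so the method can be tried on small inputs, and the view, row, cell nesting order of the file is an assumption.

```java
import java.io.DataInput;
import java.io.IOException;

/** Hypothetical reader that fills a float[views][rows][cells] array. */
final class RotationReader {

    /** Reads views * rows * cells floats, assuming view-major, then row, then cell order. */
    static float[][][] read(DataInput in, int views, int rows, int cells) throws IOException {
        float[][][] rotation = new float[views][rows][cells];
        for (int v = 0; v < views; v++) {
            for (int r = 0; r < rows; r++) {
                float[] row = rotation[v][r]; // hoist the row lookup out of the inner loop
                for (int c = 0; c < cells; c++) {
                    row[c] = in.readFloat();
                }
            }
        }
        return rotation;
    }
}
```

For the file in the question, the call would be something like `RotationReader.read(new DataInputStream(new BufferedInputStream(new FileInputStream(path))), 960, 16, 1344)`, where `path` stands for the file location.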
80 MB shouldn't be so much data that you need to worry about it that much. I would really suggest:
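I know you are looking for an efficient way to store the data described above. Although the sizes you mention are not very large, I suggest you take a look at Huge Collections.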