
Can FlatBuffers take advantage of 0's in vectors? Or are other wavelets better than the Haar transform?

I'm serializing some data and want to make the file size as small as possible without losing the essential details of the data. The first step for me was to save the data in a binary format instead of ASCII, and I decided to try FlatBuffers. Previously, when the data were stored as text files, they were about 400 MB. Using the schema shown below, the file is about 200 MB. So that's a nice decrease in size, but smaller would of course be better. The data consist of 1 of the ControlParams and 82 of the ControlData, and the intensities vector takes up most of the space, being a matrix with a size of about 128x5000. We are already around the theoretical binary size of 128 * 5000 * 82 * 4 bytes per float ≈ 200 MB. The matrices are pretty dense in general, but here and there I can see rows that are all zero. Can FlatBuffers take advantage of these zeros to reduce the file size further? Perhaps there are other inefficiencies someone can spot in the schema, since I'm just getting started with FlatBuffers?

Another way to reduce the file size might be to investigate different wavelets for compressing the original intensities. I'm using the Haar transform now because I was able to write a C++ function for it, and found that a compression of 2x, or possibly 4x, was possible. I'd like to investigate other wavelets, but would first like to know whether others have compared different wavelets to Haar and found they could use fewer coefficients with them.

namespace RTSerialization;

table ControlParams{
    extractStepSizeDa:float = 1.0005;
    smooth:bool = false;
    haarLevel:int = 10;
    deltaTimeSec:float;
}

table ControlData{
    mzAxis:[float];
    timeSec:[float];
    intensities:[float];
    scanFilter:string;
}

table ControlParamsAndData{
    params:ControlParams;
    dataSet:[ControlData];
}

root_type ControlParamsAndData;
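For reference, a single level of the Haar transform mentioned in the question can be sketched roughly as follows. This is a minimal illustrative version, not the asker's actual C++ function; the function name and the orthonormal scaling are assumptions:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One level of the orthonormal Haar transform: the first half of the output
// holds pairwise sums (approximation coefficients), the second half holds
// pairwise differences (detail coefficients). Small detail coefficients are
// what a compression step would then threshold away.
std::vector<float> haarLevel(const std::vector<float>& x) {
    assert(x.size() % 2 == 0);
    const std::size_t half = x.size() / 2;
    const float s = std::sqrt(0.5f); // 1/sqrt(2) keeps the transform orthonormal
    std::vector<float> out(x.size());
    for (std::size_t i = 0; i < half; ++i) {
        out[i]        = s * (x[2 * i] + x[2 * i + 1]); // approximation
        out[half + i] = s * (x[2 * i] - x[2 * i + 1]); // detail
    }
    return out;
}
```

Applying this recursively to the approximation half gives the multi-level transform implied by `haarLevel:int = 10` in the schema above.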

Yes, your size is entirely determined by a single float array; the rest of the FlatBuffer format is entirely irrelevant to the question of how to make this smaller.

And no, FlatBuffers doesn't do any form of automatic compression, since the design is all about random access: any access to your float array should be O(1).

So optimizing this data comes entirely down to you. You say the data is matrices: floats in matrices are often in a limited range like -1 to 1, so they could be quantized into a `short`?
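A minimal sketch of that quantization idea, halving the per-value cost from 4 bytes to 2. The function names and the assumption that the data range `[lo, hi]` is known up front are mine, not from the answer:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Map a float in [lo, hi] onto the full int16_t range.
int16_t quantize(float v, float lo, float hi) {
    float t = (std::clamp(v, lo, hi) - lo) / (hi - lo); // normalize to [0, 1]
    return static_cast<int16_t>(std::lround(t * 65535.0f) - 32768);
}

// Inverse mapping; the round-trip error is at most (hi - lo) / 65535.
float dequantize(int16_t q, float lo, float hi) {
    float t = (static_cast<float>(q) + 32768.0f) / 65535.0f;
    return lo + t * (hi - lo);
}
```

In the schema this would mean storing `intensities:[short]` (plus the `lo`/`hi` scale factors per matrix) instead of `intensities:[float]`.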

Other forms of compression of course mean you'd have to do your own packing/unpacking.当然,其他形式的压缩意味着您必须自己打包/解包。
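As one example of such custom packing, the all-zero rows the question mentions could be dropped before serializing and restored on load. A rough sketch under that assumption; the struct and function names are hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// A rows x cols row-major matrix with its all-zero rows removed:
// one presence flag per original row, plus only the non-zero rows' values.
struct PackedMatrix {
    std::vector<bool> rowPresent;
    std::vector<float> values;
};

PackedMatrix packZeroRows(const std::vector<float>& m,
                          std::size_t rows, std::size_t cols) {
    PackedMatrix p;
    p.rowPresent.resize(rows);
    for (std::size_t r = 0; r < rows; ++r) {
        auto begin = m.begin() + r * cols;
        auto end = begin + cols;
        bool nonzero = std::any_of(begin, end, [](float v) { return v != 0.0f; });
        p.rowPresent[r] = nonzero;
        if (nonzero) p.values.insert(p.values.end(), begin, end);
    }
    return p;
}

std::vector<float> unpackZeroRows(const PackedMatrix& p, std::size_t cols) {
    std::vector<float> m(p.rowPresent.size() * cols, 0.0f); // zero-filled
    std::size_t src = 0;
    for (std::size_t r = 0; r < p.rowPresent.size(); ++r) {
        if (p.rowPresent[r]) {
            std::copy(p.values.begin() + src, p.values.begin() + src + cols,
                      m.begin() + r * cols);
            src += cols;
        }
    }
    return m;
}
```

The packed form could be stored in the schema as a `[bool]` (or bitmask) plus a shorter `[float]`. Note this trades away FlatBuffers' O(1) random access into the matrix: you'd unpack before indexing.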
