Best way to encode data

I have a huge amount of data in my database in the following format:

lat;lon;speed;sec:lat;lon;speed;sec......

For example:

 53.284534;50.227268;67;0:53.285481;50.226627;68;6:53.286429;50.226042;66;12:.......

The format is latitude, longitude, speed, and number of seconds from the beginning. The length of each string is from 1000 to 100000 characters. I try to compress it before putting it in the database via gzcompress() and base64_encode(). For an initial string of 7607 characters, the result after gzcompress() and base64_encode() is 3444 characters, so the compression is roughly 50%. Is there a more effective way to compress strings like this?
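For reference, the approach described in the question can be sketched as follows. This is a minimal Python sketch, not the asker's PHP code: it assumes `zlib.compress` as the counterpart of gzcompress() (both produce the zlib format) and `base64.b64encode` as the counterpart of base64_encode(). The sample string is the three-point excerpt from the question, so the sizes it prints are not representative of a full track.

```python
import base64
import zlib

# Three-point excerpt from the question; real tracks are 1000-100000 characters.
track = "53.284534;50.227268;67;0:53.285481;50.226627;68;6:53.286429;50.226042;66;12"

# Compress the text, then make it text-safe for storage.
compressed = zlib.compress(track.encode("ascii"), 9)
encoded = base64.b64encode(compressed)

# base64 emits 4 output bytes for every 3 input bytes (with padding).
print(len(track), len(compressed), len(encoded))

# Round trip back to the original string.
restored = zlib.decompress(base64.b64decode(encoded)).decode("ascii")
assert restored == track
```

Note that the base64 step always adds about a third to the compressed size; if the column can hold binary data (an assumption about the schema), that step could be skipped.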

Try just storing them as binary floats. This is very simple and very fast. Each number would use 4 bytes, which would make it possible to use them directly from within your code.
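A minimal sketch of the binary-float idea, written in Python with the standard `struct` module (PHP's pack() and unpack() play the same role). The helper names `parse_track`, `pack_floats`, and `unpack_floats` are made up for illustration, and little-endian byte order is an assumption.

```python
import struct

def parse_track(text):
    """Split 'lat;lon;speed;sec:...' into a list of 4-tuples of floats."""
    return [tuple(float(v) for v in point.split(";")) for point in text.split(":")]

def pack_floats(points):
    """Pack each point as four little-endian 32-bit floats (16 bytes per point)."""
    return b"".join(struct.pack("<4f", *p) for p in points)

def unpack_floats(blob):
    """Inverse of pack_floats: read 16 bytes at a time."""
    return [struct.unpack_from("<4f", blob, off) for off in range(0, len(blob), 16)]

track = "53.284534;50.227268;67;0:53.285481;50.226627;68;6:53.286429;50.226042;66;12"
points = parse_track(track)
blob = pack_floats(points)
decoded = unpack_floats(blob)

print(len(track), "characters ->", len(blob), "bytes")
# A 32-bit float carries about 7 significant digits, so the sixth decimal
# place of a coordinate may not survive the round trip exactly.
print(points[0][0], "->", decoded[0][0])
```

The fixed 16 bytes per point make random access trivial, at the cost of a small rounding error in the coordinates.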

Or, if you need them to be more precise, multiply each component by a predefined value (which may differ for each component) and store the results as 32-bit integer words.
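The fixed-point variant could look like the sketch below. The scale factor of 1,000,000 for coordinates is an assumption (it fits the six decimal places in the sample data), speed and seconds are assumed to be whole numbers, and the helper names are hypothetical.

```python
import struct

COORD_SCALE = 1_000_000  # assumption: at most six digits after the decimal point

def to_fixed(point_text):
    """Convert one 'lat;lon;speed;sec' record to four integers."""
    lat, lon, speed, sec = point_text.split(";")
    return (round(float(lat) * COORD_SCALE),
            round(float(lon) * COORD_SCALE),
            int(speed),
            int(sec))

def pack_fixed(values):
    """Store the four integers as little-endian signed 32-bit words (16 bytes)."""
    return struct.pack("<4i", *values)

def unpack_fixed(blob):
    """Read the integers back and format the record as the original text."""
    lat, lon, speed, sec = struct.unpack("<4i", blob)
    return f"{lat / COORD_SCALE:.6f};{lon / COORD_SCALE:.6f};{speed};{sec}"

record = "53.284534;50.227268;67;0"
fixed = to_fixed(record)
blob = pack_fixed(fixed)
restored = unpack_fixed(blob)
print(fixed, len(blob), restored)
```

Scaled by 1,000,000, a coordinate stays below 180,000,000 in magnitude, which fits comfortably in a signed 32-bit integer, and the six decimal places round-trip exactly.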

There is clearly a strong correlation from sample to sample. I would subtract the previous sample from each sample, except of course for the first one. I would encode each difference as a variable-length integer (not as text but in binary). For lat and lon I would multiply by 1,000,000 on the assumption (which you need to verify) that there are never more than six digits after the decimal point. The second and third samples would then each require only six bytes.

Then I would compress with gzip.
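The delta-plus-variable-length scheme can be sketched as below. The answer does not name a specific integer encoding, so this sketch assumes a zigzag mapping followed by a 7-bits-per-byte varint (the style used by Protocol Buffers); Python's `zlib` stands in for gzip, and all function names are hypothetical.

```python
import zlib

COORD_SCALE = 1_000_000  # assumption: at most six digits after the decimal point

def parse_track(text):
    """Turn 'lat;lon;speed;sec:...' into a list of integer 4-tuples."""
    points = []
    for point in text.split(":"):
        lat, lon, speed, sec = point.split(";")
        points.append((round(float(lat) * COORD_SCALE),
                       round(float(lon) * COORD_SCALE),
                       int(speed), int(sec)))
    return points

def write_varint(value, out):
    """Zigzag-map a signed int to unsigned, then emit 7 bits per byte."""
    n = (value << 1) if value >= 0 else ((-value << 1) - 1)
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)  # high bit set: more bytes follow
        n >>= 7
    out.append(n)

def read_varint(buf, pos):
    """Inverse of write_varint; returns (value, next position)."""
    n = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        n |= (byte & 0x7F) << shift
        if byte < 0x80:
            break
        shift += 7
    value = (n >> 1) if n % 2 == 0 else -((n + 1) >> 1)
    return value, pos

def encode_track(points):
    """Store each sample as its difference from the previous one."""
    out = bytearray()
    prev = (0, 0, 0, 0)  # the first sample is stored as a delta from zero
    for p in points:
        for cur, old in zip(p, prev):
            write_varint(cur - old, out)
        prev = p
    return bytes(out)

def decode_track(blob):
    """Rebuild the absolute samples by summing the deltas."""
    points, prev, pos = [], [0, 0, 0, 0], 0
    while pos < len(blob):
        cur = []
        for old in prev:
            delta, pos = read_varint(blob, pos)
            cur.append(old + delta)
        points.append(tuple(cur))
        prev = cur
    return points

track = "53.284534;50.227268;67;0:53.285481;50.226627;68;6:53.286429;50.226042;66;12"
points = parse_track(track)
raw = encode_track(points)
packed = zlib.compress(raw, 9)  # final general-purpose compression pass
assert decode_track(zlib.decompress(packed)) == points
print(len(track), "characters ->", len(raw), "bytes before the zlib pass")
```

On this three-point excerpt the zlib pass has almost nothing to work with; the answer's expectation is that it pays off on full-length tracks, which is worth measuring on real data.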
