
Is there a more efficient way of compressing strings of digits when we know something about the structure of the data?

We have loyalty cards (like credit/debit cards, but processed by our bespoke code rather than by interfacing with the banks). We need to store transaction data on the cards, as many transactions will be made using offline devices and only uploaded when the card is next tapped on an online terminal.

Card storage space is limited (typically max 8Kb unless you pay silly prices for very smart cards), so I need to compress the data as much as possible.

Our transaction data is made up of three parts, all of which involve digits only (i.e. no alphabetic or special characters)...

  • Date/time - in the format yyMMddhhmmssfff
  • Device serial number - 17 digits
  • Amount - in pennies, max £999.99, so five digits

Representing this as a string of digits gives 37 digits per transaction.

I tried using the algorithms in System.IO.Compression (following the code in this blog post, and the accompanying GitHub repo, not included here as it's bog-standard usage of the classes).

This gave some quite impressive results, with around 72% reduction using the optimal Gzip algorithm.

However, I was wondering if it would be possible to improve on this, given that we know something about the shape of the transaction data. For example, the date/time part of the data breaks down as follows...

  • year - not that much restriction here
  • month - can only be 1-12
  • day - can only be 1-31
  • hour - can only be 0-23
  • minutes and seconds - can only be 0-59
  • milliseconds - no restriction

Does anyone have any comment on whether these restrictions would help me improve on this compression? Thanks

We can compress the data into 118 bits (15 bytes). We have the following ranges:

  • Date and time: 1 Jan 2000 0:0:0.000 up to 1 Jan 2100 0:0:0.000, which is 3_155_760_000_000 milliseconds
  • Serial number: 1_000_000_000_000_000_000 possible numbers (enough for 18 digits, one more than the 17 required)
  • Amount: 1_000_00 possible values in pennies

So we have in total:

double dt = (new DateTime(2100, 1, 1) - new DateTime(2000, 1, 1)).TotalMilliseconds;
double sn = 1_000_000_000_000_000_000L;
double amount = 1_000_00;

Console.Write(Math.Log2(dt * sn * amount));

The result is 117.925470... bits, so we need 118 bits, since we can't use a fraction of a bit.
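The same count can be cross-checked with exact integer arithmetic instead of floating-point logarithms. This is only a sketch in Python, not part of the original answer; the variable names are illustrative, and the ranges are the ones listed above.

```python
from datetime import datetime

# Number of distinct values in each field, using the ranges listed above.
ms_range = (datetime(2100, 1, 1) - datetime(2000, 1, 1)).days * 86_400_000
serial_range = 10**18
amount_range = 100_000

# Total number of distinct (time, serial, amount) combinations.
combos = ms_range * serial_range * amount_range

# n distinct values need (n - 1).bit_length() bits; round up to whole bytes.
bits = (combos - 1).bit_length()
size_bytes = (bits + 7) // 8

print(ms_range, bits, size_bytes)
```

Working with integers avoids any rounding doubt when the logarithm lands close to a whole number.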

Edit: Compress and decompress routines:

// Requires: using System; using System.Linq; using System.Numerics;
private static byte[] MyCompress(DateTime date, long serial, decimal amount) {
  BigInteger ms = (long)(date - new DateTime(2000, 1, 1)).TotalMilliseconds;

  BigInteger value = 
    ms * 1_000_000_000_000_000_000L * 1_000_00 +
    (BigInteger)serial * 1_000_00 +
    (BigInteger)(amount * 100);

  byte[] result = new byte[15];

  for (int i = result.Length - 1; i >= 0; --i, value /= 256) 
    result[i] = (byte)(value % 256);

  return result;
}

private static (DateTime date, long serial, decimal amount) MyDecomress(byte[] data) {
  BigInteger value = data.Aggregate(BigInteger.Zero, (s, a) => s * 256 + a);

  BigInteger amount = value % 1_000_00;
  BigInteger serial = (value / 1_000_00) % 1_000_000_000_000_000_000L;
  BigInteger dt = value / 1_000_00 / 1_000_000_000_000_000_000L;

  return (
    new DateTime(2000, 1, 1).AddMilliseconds((double)dt),
    (long)serial,
    (decimal)amount / 100M
  );
}

Demo:

var data = MyCompress(new DateTime(2023, 1, 25, 21, 06, 45), 12345, 345.87m);

Console.WriteLine(string.Join(" ", data.Select(b => b.ToString("X2"))));

var back = MyDecomress(data);

Console.Write(back);

Output:

0E 05 4C 23 D7 34 A8 BD E8 F7 CC 3D 95 80 BB
(25.01.2023 21:06:45, 12345, 345.87)
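For readers more comfortable with Python, the packing idea above (treat the three fields as digits of one mixed-radix number, then write that number in base 256) might be sketched as follows. The names `pack` and `unpack` are hypothetical, the amount is passed in pennies rather than as a decimal, and the ranges are assumed to match the C# code.

```python
from datetime import datetime, timedelta

EPOCH = datetime(2000, 1, 1)
SERIAL_RANGE = 10**18    # same serial range as the C# code above
AMOUNT_RANGE = 100_000   # amounts in pennies, 0..99999

def pack(when: datetime, serial: int, pennies: int) -> bytes:
    """Combine the three fields into one integer and emit 15 big-endian bytes."""
    ms = (when - EPOCH) // timedelta(milliseconds=1)
    if not (0 <= serial < SERIAL_RANGE and 0 <= pennies < AMOUNT_RANGE):
        raise ValueError("field out of range")
    value = (ms * SERIAL_RANGE + serial) * AMOUNT_RANGE + pennies
    # to_bytes raises OverflowError if the value is negative or too large.
    return value.to_bytes(15, "big")

def unpack(data: bytes):
    """Reverse pack(): peel the fields off from least to most significant."""
    value = int.from_bytes(data, "big")
    value, pennies = divmod(value, AMOUNT_RANGE)
    ms, serial = divmod(value, SERIAL_RANGE)
    return EPOCH + timedelta(milliseconds=ms), serial, pennies

record = (datetime(2023, 1, 25, 21, 6, 45), 12345, 34587)
packed = pack(*record)
print(packed.hex(" ").upper())
print(unpack(packed))
```

The explicit range check matters here: without it, an out-of-range serial or amount would silently spill into the neighbouring field.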


Edit: If we can store the date and time to a precision of 1/10 second (rather than a millisecond), we need only 14 bytes:

private static byte[] MyCompress(DateTime date, long serial, decimal amount) {
  BigInteger ms = (long)(date - new DateTime(2000, 1, 1)).TotalMilliseconds / 100;

  BigInteger value = 
    ms * 1_000_000_000_000_000_000L * 1_000_00 +
    (BigInteger)serial * 1_000_00 +
    (BigInteger)(amount * 100);

  byte[] result = new byte[14];

  for (int i = result.Length - 1; i >= 0; --i, value /= 256) 
    result[i] = (byte)(value % 256);

  return result;
}

private static (DateTime date, long serial, decimal amount) MyDecomress(byte[] data) {
  BigInteger value = data.Aggregate(BigInteger.Zero, (s, a) => s * 256 + a);

  BigInteger amount = value % 1_000_00;
  BigInteger serial = (value / 1_000_00) % 1_000_000_000_000_000_000L;
  BigInteger dt = value / 1_000_00 / 1_000_000_000_000_000_000L;

  return (
    new DateTime(2000, 1, 1).AddMilliseconds((double)dt * 100),
    (long)serial,
    (decimal)amount / 100M
  );
}
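To see how the time resolution affects the record size, here is a small Python check under the same assumptions as above (100 years from 1 Jan 2000, a serial range of 10^18, amounts below 100000 pennies). The function name `record_bytes` is made up for this sketch.

```python
SECONDS_PER_CENTURY = 36_525 * 86_400  # 1 Jan 2000 up to 1 Jan 2100
SERIAL_RANGE = 10**18
AMOUNT_RANGE = 100_000

def record_bytes(ticks_per_second: int) -> int:
    """Bytes needed for one record at the given time resolution."""
    combos = (SECONDS_PER_CENTURY * ticks_per_second
              * SERIAL_RANGE * AMOUNT_RANGE)
    bits = (combos - 1).bit_length()
    return (bits + 7) // 8

for label, ticks in [("1 ms", 1000), ("1/10 s", 10), ("1 s", 1)]:
    print(label, record_bytes(ticks), "bytes")
```

Under these assumptions, dropping from tenths of a second to whole seconds does not save a further byte.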

Solution #1 (old, 16 bytes):

You can save two digits (bytes) by using the restrictions mentioned:

  1. Combine month+day into dayOfYear (000-365) (for consistency, assume February always has 29 days);
  2. Combine hours+minutes+seconds into timeInSeconds (00000-86399).

Note that there may be other techniques you could use to reduce the length of the string.

After this you can convert the number in the string from base 10 to base 256. Thus you get 16 bytes instead of 37. No mathematical proof, just the practical result from the linked code (output at the bottom of the page): https://ideone.com/SMKb6S

Results:

initial: 39 999912312359599999999999999999999999999
base10: 37 9999365863999999999999999999999999999
base256: 16 [7, 133, 206, 204, 233, 237, 90, 213, 156, 154, 224, 34, 63, 255, 255, 255]
base62: 21 EC5zRr0FV71hggqe73b0J
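The two steps and the base conversion could look roughly like this in Python. This is a reconstruction under assumptions, not the code behind the link: the helper names are hypothetical, the year is written with four digits as in the results above, and the amount is given in pennies.

```python
from datetime import datetime

# Days before each month, with February always counted as 29 days.
DAYS_BEFORE_MONTH = [0, 31, 60, 91, 121, 152, 182, 213, 244, 274, 305, 335]

def encode(when: datetime, serial: int, pennies: int) -> bytes:
    day_of_year = DAYS_BEFORE_MONTH[when.month - 1] + when.day - 1  # 000-365
    seconds = when.hour * 3600 + when.minute * 60 + when.second     # 00000-86399
    digits = (f"{when.year:04d}{day_of_year:03d}{seconds:05d}"
              f"{when.microsecond // 1000:03d}{serial:017d}{pennies:05d}")
    # Base 10 -> base 256: parse the 37 digits and emit 16 big-endian bytes.
    return int(digits).to_bytes(16, "big")

def decode(data: bytes):
    digits = f"{int.from_bytes(data, 'big'):037d}"
    year, day_of_year = int(digits[0:4]), int(digits[4:7])
    seconds, ms = int(digits[7:12]), int(digits[12:15])
    serial, pennies = int(digits[15:32]), int(digits[32:37])
    # Last month whose starting offset does not exceed the day of year.
    month = max(m for m in range(12) if DAYS_BEFORE_MONTH[m] <= day_of_year)
    day = day_of_year - DAYS_BEFORE_MONTH[month] + 1
    when = datetime(year, month + 1, day, seconds // 3600,
                    seconds % 3600 // 60, seconds % 60, ms * 1000)
    return when, serial, pennies

largest = (datetime(9999, 12, 31, 23, 59, 59, 999000), 10**17 - 1, 99_999)
print(list(encode(*largest)))
```

Counting February as 29 days every year keeps the day-of-year mapping independent of the year, at the cost of one unused value in non-leap years.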

After this you can try some compression methods. However, as noted in the comments, they may not work well with such a small amount of data.

Solution #2 (15 bytes):

Actually, you can end up with 15 bytes. Dmitry Bychenko's answer originally used microseconds instead of milliseconds (I don't have enough reputation to point that out in a comment); this has since been fixed. So 128 years is 4_047_667_200_000 milliseconds (or something like that; this figure assumes 366 days per year).

All the data fits in 15 bytes, and some bits are even left free. You can use them to increase the maximum amount, for example. Here are the calculations in Python: https://ideone.com/37Bie3

Results:

Target bytes: 15 (120 bits)
Years: 64
  Total bits: 120
  Max amount: £41943.04 (22 bits, 5 free bits used)
Years: 128
  Total bits: 120
  Max amount: £20971.52 (21 bits, 4 free bits used)
Years: 256
  Total bits: 120
  Max amount: £10485.76 (20 bits, 3 free bits used)
Years: 512
  Total bits: 120
  Max amount: £5242.88 (19 bits, 2 free bits used)
Years: 1024
  Total bits: 120
  Max amount: £2621.44 (18 bits, 1 free bits used)
Years: 2048
  Total bits: 120
  Max amount: £1310.72 (17 bits, 0 free bits used)
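The figures above can be reproduced with a short bit-budget calculation. The sketch below is a reconstruction rather than the linked script. It assumes each field gets its own whole number of bits, a 17-digit serial, a baseline amount of up to 99999 pennies, and 366 days per year (which matches the 128-year figure quoted earlier). The name `budget` is hypothetical.

```python
TARGET_BITS = 15 * 8
SERIAL_BITS = (10**17 - 1).bit_length()  # 17-digit serial number
AMOUNT_BITS = (99_999).bit_length()      # up to £999.99 in pennies
MS_PER_YEAR = 366 * 86_400_000           # assumption: every year has 366 days

def budget(years: int):
    """Return (time bits, free bits, amount bits, max amount in pounds)."""
    time_bits = (years * MS_PER_YEAR - 1).bit_length()
    free = TARGET_BITS - time_bits - SERIAL_BITS - AMOUNT_BITS
    amount_bits = AMOUNT_BITS + free  # hand every spare bit to the amount
    return time_bits, free, amount_bits, 2**amount_bits / 100

for years in (64, 128, 256, 512, 1024, 2048):
    time_bits, free, amount_bits, max_amount = budget(years)
    print(f"Years: {years}")
    print(f"  Max amount: £{max_amount:.2f} "
          f"({amount_bits} bits, {free} free bits used)")
```

Each doubling of the time span costs exactly one bit, which is why the maximum amount halves from row to row.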

Edit: tidied up the formatting of solution #1 and added solution #2.

Instead of trying to compress the text version of the data, consider your data and store it in a more efficient format.

A date can be stored as the ticks of a DateTime object (EDIT: rather than seconds since the epoch), which takes 8 bytes (unsigned long).

Your device serial number can be stored in an unsigned long as well; any leading 0s can be assumed if it's always a fixed 17 digits.

Your amount can be stored in an unsigned int in the range 0 to 99999, with the last two digits assumed to be after the decimal point.

This gives you a total size of 8 + 8 + 4 = 20 bytes.
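As an illustration of this fixed-width layout, here is a minimal Python sketch using the standard `struct` module. The field order, the little-endian byte order and the helper names are assumptions made for the example; the tick count mirrors the .NET definition (100-nanosecond intervals since 0001-01-01).

```python
import struct
from datetime import datetime, timedelta

# Two unsigned 64-bit fields and one unsigned 32-bit field, no padding.
RECORD = struct.Struct("<QQI")

def to_ticks(when: datetime) -> int:
    """100-nanosecond intervals since 0001-01-01, like .NET DateTime.Ticks."""
    return (when - datetime(1, 1, 1)) // timedelta(microseconds=1) * 10

def from_ticks(ticks: int) -> datetime:
    return datetime(1, 1, 1) + timedelta(microseconds=ticks // 10)

when, serial, pennies = datetime(2023, 1, 25, 21, 6, 45), 12345, 34587
raw = RECORD.pack(to_ticks(when), serial, pennies)
ticks, serial_back, pennies_back = RECORD.unpack(raw)

print(len(raw), from_ticks(ticks), serial_back, pennies_back)
```

This layout is simpler to read and write than a single packed integer, at the cost of 5 extra bytes per transaction compared with the 15-byte encodings.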

