Compress a byte array in Java and decompress it in C

Question

I currently have the following array in a Java program,

byte[] data = new byte[800];

and I'd like to compress it before sending it to a microcontroller over serial (115200 baud). I would then like to decompress the array on the microcontroller in C. However, I'm not quite sure what the best way to do this is. Performance is an issue since the microcontroller is just an Arduino, so it can't be too memory- or CPU-intensive. I'd say the data is more or less random (edit: I guess it's not really that random, see the edit below), since it represents an RGB color value for every 16 bits.

What would be the best way to compress this data? Any idea how much compression I could possibly get?

Edit

Sorry about the lack of info. I need the compression to be lossless, and I only intend to send 800 bytes at a time. My issue is that 800 bytes won't transfer fast enough at the 115200 baud rate I am using. I was hoping I could shrink the size a little bit to improve speed.

Every two bytes look like:

0RRRRRGGGGGBBBBB

where the R, G, and B bits represent the values for the red, green, and blue color channels respectively. Every two bytes is then an individual LED on a 20x20 grid. I would imagine that many sets of two bytes would be identical, since I frequently assign the same color codes to multiple LEDs. It may also be the case that RGB values are often > 15, since I typically use bright colors when I can (however, this might be a moot point since they are typically not all > 15 at once).
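For readers who want to see the pixel layout above in code, here is a minimal sketch of packing and unpacking one 0RRRRRGGGGGBBBBB word. The function names are made up for illustration and are not from the original post.

```c
#include <stdint.h>

/* Pack three 5-bit channels into 0RRRRRGGGGGBBBBB.
   Channel values above 31 are masked down to 5 bits. */
static uint16_t rgb555_pack(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r & 0x1Fu) << 10) | ((g & 0x1Fu) << 5) | (b & 0x1Fu));
}

/* Extract the individual 5-bit channels again. */
static uint8_t rgb555_red(uint16_t px)   { return (uint8_t)((px >> 10) & 0x1Fu); }
static uint8_t rgb555_green(uint16_t px) { return (uint8_t)((px >> 5) & 0x1Fu); }
static uint8_t rgb555_blue(uint16_t px)  { return (uint8_t)(px & 0x1Fu); }
```

Note that the top bit of every packed word is always 0, which leaves one spare bit per pixel that a compression scheme can use as a flag.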

Answer 1

If the data is "more or less random" then you won't have much luck compressing it, I'm afraid.

UPDATE

Given the new information, I bet you don't need 32k colours on your LED display. I'd imagine that a 1024- or 256-colour palette might be sufficient. Hence you could get away with a trivial compression scheme (simply map each word through a lookup table, or possibly just discard the LSBs of each component) that would work even for completely uncorrelated pixel values.
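As a rough illustration of the "discard the LSBs" idea, the sketch below squeezes each 15-bit pixel into one byte (3 bits red, 3 bits green, 2 bits blue) and expands it again on the receiving side. The 3-3-2 split and the function names are assumptions made for this example, and the scheme is lossy, so it only applies if reduced colour depth is acceptable.

```c
#include <stdint.h>

/* Sender side: keep the top 3 bits of R and G and the top 2 bits of B. */
static uint8_t rgb555_to_rgb332(uint16_t px)
{
    uint8_t r = (uint8_t)((px >> 10) & 0x1F);
    uint8_t g = (uint8_t)((px >> 5) & 0x1F);
    uint8_t b = (uint8_t)(px & 0x1F);
    return (uint8_t)(((r >> 2) << 5) | ((g >> 2) << 2) | (b >> 3));
}

/* Receiver side: expand back to 5 bits per channel. The high bits are
   replicated into the low bits so that full scale maps to full scale. */
static uint16_t rgb332_to_rgb555(uint8_t c)
{
    uint8_t r3 = (uint8_t)((c >> 5) & 0x07);
    uint8_t g3 = (uint8_t)((c >> 2) & 0x07);
    uint8_t b2 = (uint8_t)(c & 0x03);
    uint8_t r = (uint8_t)((r3 << 2) | (r3 >> 1));
    uint8_t g = (uint8_t)((g3 << 2) | (g3 >> 1));
    uint8_t b = (uint8_t)((b2 << 3) | (b2 << 1) | (b2 >> 1));
    return (uint16_t)((r << 10) | (g << 5) | b);
}
```

With one byte per LED, a 20x20 frame shrinks from 800 to 400 bytes regardless of how random the pixel values are.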

Answer 2

Use miniLZO compression. There is a Java version and a C version.

Answer 3

A really simple compression/decompression algorithm that is practical in tiny embedded environments and is easy to "roll your own" is run-length encoding. Basically this means replacing a run of duplicate values with a (count, value) pair. Of course you need a sentinel (magic) value to introduce the pair, and then a mechanism to allow the magic value to appear in normal data (typically an escape sequence can be used for both jobs). In your example it might be best to use 16-bit values (2 bytes).

But naturally it all depends on the data. Data that is sufficiently random is incompressible by definition. You would do best to collect some example data first, then evaluate your compression options.

Edit after extra information posted

Just for fun, and to show how easy run-length encoding is, I have coded something up. I'm afraid I've used C for compression as well, since I'm not a Java guy. To keep things simple I've worked entirely with 16-bit data. Since the top bit of every pixel word is always 0 in your format, the code uses that bit as the marker for a (value, count) pair, so no escape sequence is needed. An optimization would be to use an 8-bit count in the (count, value) pair. I haven't tried to compile or test this code. See also my comment to your question about the possible benefits of mangling the LED addresses.

#include <stdint.h>  // provides uint16_t

#define NBR_16BIT_WORDS 400

// Return number of words written to dst (always
//  less than or equal to NBR_16BIT_WORDS)
uint16_t compress( uint16_t *src, uint16_t *dst )
{
    uint16_t *end = (src+NBR_16BIT_WORDS);
    uint16_t *dst_begin = dst;
    while( src < end )
    {
        uint16_t *temp;
        uint16_t count=1;
        for( temp=src+1; temp<end; temp++ )
        {
            if( *src == *temp )
                count++;
            else
                break;
        }
        if( count < 3 )
            *dst++ = *src++;
        else
        {
            *dst++ = (*src)|0x8000;  // top bit set marks a (value,count) pair
            *dst++ = count;
            src += count;            // advance the pointer past the whole run
        }
    }  
    return dst-dst_begin;
}

void decompress( uint16_t *src, uint16_t *dst )
{
    uint16_t *end_src = (src+NBR_16BIT_WORDS);
    uint16_t *end_dst = (dst+NBR_16BIT_WORDS);
    while( src<end_src && dst<end_dst )
    {
        uint16_t data = *src++;
        if( (data&0x8000) == 0 )
            *dst++ = data;
        else
        {
            data  &= 0x7fff;
            uint16_t count = *src++;
            while( dst<end_dst && count-- )
                *dst++ = data;
        }
    }
}
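The 8-bit count optimization mentioned above could look roughly like the following sketch. It is not the answerer's code: the byte-stream layout (two pixel bytes, high byte first, followed by one count byte when the top bit is set), the 255-pixel cap per run, and the names `rle_compress8` and `rle_decompress8` are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define NBR_PIXELS 400

/* Compress NBR_PIXELS 15-bit pixels into a byte stream.
   dst must hold 2*NBR_PIXELS bytes (the worst case, no runs at all).
   Returns the number of bytes written. */
static size_t rle_compress8(const uint16_t *src, uint8_t *dst)
{
    size_t i = 0, out = 0;
    while (i < NBR_PIXELS) {
        size_t run = 1;
        while (i + run < NBR_PIXELS && run < 255 && src[i + run] == src[i])
            run++;
        if (run < 3) {
            /* Literal pixel: two bytes, top bit clear. */
            dst[out++] = (uint8_t)(src[i] >> 8);
            dst[out++] = (uint8_t)(src[i] & 0xFF);
            i++;
        } else {
            /* Run: pixel with top bit set, then a one-byte count. */
            dst[out++] = (uint8_t)((src[i] >> 8) | 0x80);
            dst[out++] = (uint8_t)(src[i] & 0xFF);
            dst[out++] = (uint8_t)run;
            i += run;
        }
    }
    return out;
}

/* Expand a stream produced by rle_compress8 into dst (NBR_PIXELS words).
   Returns the number of pixels written, or 0 if a run is truncated. */
static size_t rle_decompress8(const uint8_t *src, size_t src_len, uint16_t *dst)
{
    size_t in = 0, out = 0;
    while (in + 1 < src_len && out < NBR_PIXELS) {
        uint16_t px = (uint16_t)(((src[in] & 0x7F) << 8) | src[in + 1]);
        size_t count = 1;
        if (src[in] & 0x80) {
            if (in + 2 >= src_len)
                return 0;  /* count byte is missing */
            count = src[in + 2];
            in += 3;
        } else {
            in += 2;
        }
        while (count-- && out < NBR_PIXELS)
            dst[out++] = px;
    }
    return out;
}
```

In this layout a run of three or more identical pixels costs 3 bytes instead of 4, and a frame of a single colour comes out as 6 bytes (two runs, because one count byte cannot cover all 400 pixels).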

Answer 4

One of the first things to do would be to convert from RGB to YUV, or YCrCb, or something on that order. Having done that, you can usually get away with sub-sampling the U and V (or Cr/Cb) channels to half resolution. This is quite common in most types of images (e.g., JPEG and MPEG both do it, and so do the sensors in most digital cameras).

Realistically, starting with only 800 bytes of data, most other forms of compression are going to be a waste of time and effort. You're going to have to put in quite a bit of work before you accomplish much (and keeping it reasonably fast on an Arduino won't be trivial either).

Edit: okay, if you're absolutely certain you can't modify the data at all, things get more difficult very quickly. The real question at that point is what kind of input you're dealing with. Others have already mentioned the possibility of something on the order of a predictive delta compression -- e.g., based on preceding pixels, predict what the next one is likely to be, and then encode only the difference between the prediction and the actual value. Getting the most out of that, however, generally requires running the result through some sort of entropy-based algorithm like Shannon-Fano or Huffman compression. Those, unfortunately, aren't usually the fastest to decompress.
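To make the predictive delta idea concrete, here is a minimal sketch that uses the simplest possible predictor, "the next pixel equals the previous one". It stores the XOR of the two rather than an arithmetic difference; that is a substitution chosen for this example because XOR is trivially reversible and keeps the top bit clear. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Replace each pixel by its XOR with the previous pixel.
   Unchanged pixels become 0, which a later RLE or entropy stage
   can pack well. The first pixel is predicted as 0. */
static void delta_encode(const uint16_t *src, uint16_t *dst, size_t n)
{
    uint16_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        dst[i] = (uint16_t)(src[i] ^ prev);
        prev = src[i];
    }
}

/* Undo delta_encode by accumulating the XORs. */
static void delta_decode(const uint16_t *src, uint16_t *dst, size_t n)
{
    uint16_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        prev = (uint16_t)(src[i] ^ prev);
        dst[i] = prev;
    }
}
```

On its own this saves nothing; it only reshapes the data so that a following stage sees many zeros and small values.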

If your data is mostly things like charts or graphs, where you can expect to have large areas of identical pixels, run-length (or run-end) encoding can work pretty well. This does have the advantage of being really trivial to decompress as well.

I doubt that LZ-based compression is going to work so well though. LZ-based compression works (in general) by building a dictionary of strings of bytes that have been seen, and when/if the same string of bytes is seen again, transmitting the code assigned to the previous instance instead of re-transmitting the entire string. The problem is that you can't transmit uncompressed bytes -- you start out by sending the code word that represents that byte in the dictionary. In your case, you could use (for example) a 10-bit code word. This means the first time you send any particular character, you need to send it as 10 bits, not just 8. You only start to get some compression when you can build up some longer (two-byte, three-byte, etc.) strings in your dictionary, and find a matching string later in the input.

This means LZ-based compression usually gets fairly poor compression for the first couple hundred bytes or so, then about breaks even for a while, and only after it's been running across some input for a while does it really start to compress well. Dealing with only 800 bytes at a time, I'm not at all sure you're ever going to see much compression -- in fact, working in such small blocks, it wouldn't be particularly surprising to see the data expand on a fairly regular basis (especially if it's very random).

Answer 5

The data is more or less random I'd say since it represents a rgb color value for every 16 bits.

What would be the best way to compress this data? Any idea how much compression I could possibly get?

Ideally you can compress 800 bytes of colour data to one byte if the whole image is the same colour. As Oli Charlesworth mentions, however, the more random the data, the less you can compress it. If your images look like static on a TV, then indeed, good luck getting any compression out of it.

Answer 6

Definitely consider Oli Charlesworth's answer. On a 20x20 grid, I don't know if you need a full 32k color palette.

Also, in your earlier question, you said you were trying to run this on a 20 ms period (50 Hz). Do you really need that much speed for this display? At 115200 bps, you can transmit ~11520 bytes/sec - call it 10 KB/s for a margin of safety (e.g. your micro might have a delay between bytes; you should do some experiments to see what the 'real' bandwidth is). At 50 Hz, this only allows you about 200 bytes per packet - you're looking for a size reduction of over 75%, which may not be attainable under any circumstances. You seem pretty married to your requirements, but it may be time for an awkward chat.
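The arithmetic above can be written out as a small sketch. It assumes 8N1 framing (one start bit, eight data bits, one stop bit, so 10 bits on the wire per byte), which is what the ~11520 bytes/sec figure implies; the helper names are made up for this example.

```c
#include <stdint.h>

/* Payload bytes per second for a given baud rate, assuming 8N1 framing
   (10 bits on the wire per byte). */
static uint32_t bytes_per_second(uint32_t baud)
{
    return baud / 10u;
}

/* Byte budget for one frame at the given frame rate. */
static uint32_t bytes_per_frame(uint32_t bytes_per_sec, uint32_t frames_per_sec)
{
    return bytes_per_sec / frames_per_sec;
}

/* Size reduction in percent (rounded up) needed to fit `payload` bytes
   into a budget of `budget` bytes. Returns 0 if it already fits. */
static uint32_t required_reduction_percent(uint32_t payload, uint32_t budget)
{
    if (budget >= payload)
        return 0;
    return ((payload - budget) * 100u + payload - 1u) / payload;
}
```

With these helpers, `bytes_per_second(115200)` is 11520, the budget at 50 Hz is 230 bytes with no margin or 200 bytes at the conservative 10000 bytes/sec figure, and fitting 800 bytes into 200 needs a 75% reduction.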

If you do want to go the compression route, you will probably just have to try several different algorithms with 'real' data, as others have said, and try different encodings. I bet you can find some extra processing time by doing matrix math, etc. in between receiving bytes over the serial link (you'll have about 87 microseconds between bytes) - if you use interrupts to read the serial data instead of polling, you can probably do pretty well by using a double buffer and processing/displaying the previous buffer while reading into the current buffer.
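A double buffer along those lines might be sketched as follows. This is plain C with hypothetical names: `on_serial_byte` stands in for whatever your UART receive interrupt calls, and `take_frame` is what the main loop would poll. Nothing here is a real Arduino API. Also note that two uncompressed 800-byte frames need 1600 bytes, which is most of the RAM on an ATmega328-based board with 2 KB of SRAM, so in practice you would likely buffer the compressed data instead.

```c
#include <stddef.h>
#include <stdint.h>

#define FRAME_BYTES 800

static uint8_t frame_buf[2][FRAME_BYTES];
static volatile uint8_t fill_idx = 0;     /* buffer currently being filled */
static volatile uint16_t fill_pos = 0;    /* next write position in it */
static volatile uint8_t frame_ready = 0;  /* set when a full frame arrived */

/* Hypothetical hook: call this from the UART receive interrupt
   with each incoming byte. */
static void on_serial_byte(uint8_t b)
{
    frame_buf[fill_idx][fill_pos++] = b;
    if (fill_pos == FRAME_BYTES) {
        fill_pos = 0;
        fill_idx ^= 1u;   /* start filling the other buffer */
        frame_ready = 1;
    }
}

/* Call from the main loop. Returns the most recently completed frame,
   or NULL if no new frame has arrived since the last call.
   On real hardware, interrupts should be disabled briefly around the
   flag and index accesses to avoid a race with the ISR. */
static const uint8_t *take_frame(void)
{
    if (!frame_ready)
        return NULL;
    frame_ready = 0;
    return frame_buf[fill_idx ^ 1u];
}
```

The main loop has to finish with the returned frame before the next one completes, otherwise the interrupt handler starts overwriting it.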

EDIT: Is it possible to increase the serial port speed beyond 115200? This USB-serial adapter at Amazon says it goes up to 1 Mbps (probably actually 921600 bps). Depending on your hardware and environment, you may have to worry about bad data, but if you increase the speed enough, you could probably add a checksum, and maybe even limited error correction.
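If you do add a checksum, something as small as Fletcher-16 is cheap enough for an 8-bit micro. The sketch below is one possible choice, not something the original setup is known to use; the sender would append the two checksum bytes to each frame and the receiver would recompute and compare them.

```c
#include <stddef.h>
#include <stdint.h>

/* Fletcher-16 checksum over `len` bytes (both sums taken modulo 255). */
static uint16_t fletcher16(const uint8_t *data, size_t len)
{
    uint16_t sum1 = 0, sum2 = 0;
    for (size_t i = 0; i < len; i++) {
        sum1 = (uint16_t)((sum1 + data[i]) % 255u);
        sum2 = (uint16_t)((sum2 + sum1) % 255u);
    }
    return (uint16_t)((sum2 << 8) | sum1);
}
```

A checksum only detects corruption; on a mismatch the receiver would have to drop the frame or ask for a resend.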

I'm not familiar with the Arduino, but I've got an 8-bit FreeScale HCS08 that I drive at 1.25 Mbps, although the bus is actually running RS-485, not RS-232 (485 uses differential signaling for better noise performance), and I don't have any problems with noise errors. You might even consider a USB RS-485 adapter, if you can wire that to your Arduino (you'd need conversion hardware to change the 485 signals to the Arduino's levels).

EDIT 2: You might also consider this USB-SPI/I2C adapter, if you have an available I2C or SPI interface and you can handle the wiring. It says it can go to 400 kHz I2C or 200 kHz SPI, which is still not quite enough by itself, but you could split the data between the SPI/I2C and the serial link you already have.

Answer 7

LZ77/78 are relatively easy to write: http://en.wikipedia.org/wiki/LZ77_and_LZ78

However, given the small amount of data you're transferring, it's probably not worth compressing it at all.

7 answers

Answer 1: 6, accepted, 2010-11-11 21:47:09
Answer 2: 2, 2010-11-11 21:40:37
Answer 3: 2, 2010-11-11 21:43:13
Answer 4: 2, 2010-11-11 21:47:47
Answer 5: 1, 2010-11-11 21:51:07
Answer 6: 1, 2010-11-12 05:58:03
Answer 7: 0, 2010-11-11 21:44:12
