
C++ optimize array of ints

I have a 2D lookup table of int16_t.

int16_t my_array[37][73] = {{**DATA HERE**}};

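My values range from just above the int8_t maximum to just below the int8_t minimum, and some of them repeat. I am trying to reduce the size of this lookup table.

What I have done so far is split each int16_t value into two int8_t values to visualize the wasted bytes.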

int8_t part_1 = original_value >> 8;     // upper byte
int8_t part_2 = original_value & 0x00FF; // lower byte

// If the upper 8 bits of the original_value were empty
if(part_1 == 0) wasted_bytes_count++;

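I can easily remove the zero-valued int8_t's that waste a byte of space, and I can also remove the duplicate values, but my question is: how do I remove those values while retaining the ability to look up based on the two indices?

I contemplated translating this into a 1D array and adding a number after each duplicated value that would represent the number of duplicates that were removed, but I am struggling with how I would then tell a lookup value from a duplicate count. It is further complicated by stripping out the zero int8_t values that were wasted bytes.

EDIT: This array is already stored in ROM. RAM is even more limited than ROM, which is why it is stored there.

EDIT: I am going to post a bounty for this question as soon as I can. I need a complete answer on how to store the information AND retrieve it. It does not need to be a 2D array as long as I can get the same values.

EDIT: Adding the actual array below: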

{150,145,140,135,130,125,120,115,110,105,100,95,90,85,80,75,70,65,60,55,50,45,40,35,30,25,20,15,10,5,0,-4,-9,-14,-19,-24,-29,-34,-39,-44,-49,-54,-59,-64,-69,-74,-79,-84,-89,-94,-99,104,109,114,119,124,129,134,139,144,149,154,159,164,169,174,179,175,170,165,160,155,150}, \
{143,137,131,126,120,115,110,105,100,95,90,85,80,75,71,66,62,57,53,48,44,39,35,31,27,22,18,14,9,5,1,-3,-7,-11,-16,-20,-25,-29,-34,-38,-43,-47,-52,-57,-61,-66,-71,-76,-81,-86,-91,-96,101,107,112,117,123,128,134,140,146,151,157,163,169,175,178,172,166,160,154,148,143}, \
{130,124,118,112,107,101,96,92,87,82,78,74,70,65,61,57,54,50,46,42,38,34,31,27,23,19,16,12,8,4,1,-2,-6,-10,-14,-18,-22,-26,-30,-34,-38,-43,-47,-51,-56,-61,-65,-70,-75,-79,-84,-89,-94,100,105,111,116,122,128,135,141,148,155,162,170,177,174,166,159,151,144,137,130}, \
{111,104,99,94,89,85,81,77,73,70,66,63,60,56,53,50,46,43,40,36,33,30,26,23,20,16,13,10,6,3,0,-3,-6,-9,-13,-16,-20,-24,-28,-32,-36,-40,-44,-48,-52,-57,-61,-65,-70,-74,-79,-84,-88,-93,-98,103,109,115,121,128,135,143,152,162,172,176,165,154,144,134,125,118,111}, \
{85,81,77,74,71,68,65,63,60,58,56,53,51,49,46,43,41,38,35,32,29,26,23,19,16,13,10,7,4,1,-1,-3,-6,-9,-13,-16,-19,-23,-26,-30,-34,-38,-42,-46,-50,-54,-58,-62,-66,-70,-74,-78,-83,-87,-91,-95,100,105,110,117,124,133,144,159,178,160,141,125,112,103,96,90,85}, \
{62,60,58,57,55,54,52,51,50,48,47,46,44,42,41,39,36,34,31,28,25,22,19,16,13,10,7,4,2,0,-3,-5,-8,-10,-13,-16,-19,-22,-26,-29,-33,-37,-41,-45,-49,-53,-56,-60,-64,-67,-70,-74,-77,-80,-83,-86,-89,-91,-94,-97,101,105,111,130,109,84,77,74,71,68,66,64,62}, \
{46,46,45,44,44,43,42,42,41,41,40,39,38,37,36,35,33,31,28,26,23,20,16,13,10,7,4,1,-1,-3,-5,-7,-9,-12,-14,-16,-19,-22,-26,-29,-33,-36,-40,-44,-48,-51,-55,-58,-61,-64,-66,-68,-71,-72,-74,-74,-75,-74,-72,-68,-61,-48,-25,2,22,33,40,43,45,46,47,46,46}, \
{36,36,36,36,36,35,35,35,35,34,34,34,34,33,32,31,30,28,26,23,20,17,14,10,6,3,0,-2,-4,-7,-9,-10,-12,-14,-15,-17,-20,-23,-26,-29,-32,-36,-40,-43,-47,-50,-53,-56,-58,-60,-62,-63,-64,-64,-63,-62,-59,-55,-49,-41,-30,-17,-4,6,15,22,27,31,33,34,35,36,36}, \
{30,30,30,30,30,30,30,29,29,29,29,29,29,29,29,28,27,26,24,21,18,15,11,7,3,0,-3,-6,-9,-11,-12,-14,-15,-16,-17,-19,-21,-23,-26,-29,-32,-35,-39,-42,-45,-48,-51,-53,-55,-56,-57,-57,-56,-55,-53,-49,-44,-38,-31,-23,-14,-6,0,7,13,17,21,24,26,27,29,29,30}, \
{25,25,26,26,26,25,25,25,25,25,25,25,25,26,25,25,24,23,21,19,16,12,8,4,0,-3,-7,-10,-13,-15,-16,-17,-18,-19,-20,-21,-22,-23,-25,-28,-31,-34,-37,-40,-43,-46,-48,-49,-50,-51,-51,-50,-48,-45,-42,-37,-32,-26,-19,-13,-7,-1,3,7,11,14,17,19,21,23,24,25,25}, \
{21,22,22,22,22,22,22,22,22,22,22,22,22,22,22,22,21,20,18,16,13,9,5,1,-3,-7,-11,-14,-17,-18,-20,-21,-21,-22,-22,-22,-23,-23,-25,-27,-29,-32,-35,-37,-40,-42,-44,-45,-45,-45,-44,-42,-40,-36,-32,-27,-22,-17,-12,-7,-3,0,3,7,9,12,14,16,18,19,20,21,21}, \
{18,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,18,17,16,14,10,7,2,-1,-6,-10,-14,-17,-19,-21,-22,-23,-24,-24,-24,-24,-23,-23,-23,-24,-26,-28,-30,-33,-35,-37,-38,-39,-39,-38,-36,-34,-31,-28,-24,-19,-15,-10,-6,-3,0,1,4,6,8,10,12,14,15,16,17,18,18}, \
{16,16,17,17,17,17,17,17,17,17,17,16,16,16,16,16,16,15,13,11,8,4,0,-4,-9,-13,-16,-19,-21,-23,-24,-25,-25,-25,-25,-24,-23,-21,-20,-20,-21,-22,-24,-26,-28,-30,-31,-32,-31,-30,-29,-27,-24,-21,-17,-13,-9,-6,-3,-1,0,2,4,5,7,9,10,12,13,14,15,16,16}, \
{14,14,14,15,15,15,15,15,15,15,14,14,14,14,14,14,13,12,11,9,5,2,-2,-6,-11,-15,-18,-21,-23,-24,-25,-25,-25,-25,-24,-22,-21,-18,-16,-15,-15,-15,-17,-19,-21,-22,-24,-24,-24,-23,-22,-20,-18,-15,-12,-9,-5,-3,-1,0,1,2,4,5,6,8,9,10,11,12,13,14,14}, \
{12,13,13,13,13,13,13,13,13,13,13,13,12,12,12,12,11,10,9,6,3,0,-4,-8,-12,-16,-19,-21,-23,-24,-24,-24,-24,-23,-22,-20,-17,-15,-12,-10,-9,-9,-10,-12,-13,-15,-17,-17,-18,-17,-16,-15,-13,-11,-8,-5,-3,-1,0,1,1,2,3,4,6,7,8,9,10,11,12,12,12}, \
{11,11,11,11,11,12,12,12,12,12,11,11,11,11,11,10,10,9,7,5,2,-1,-5,-9,-13,-17,-20,-22,-23,-23,-23,-23,-22,-20,-18,-16,-14,-11,-9,-6,-5,-4,-5,-6,-8,-9,-11,-12,-12,-12,-12,-11,-9,-8,-6,-3,-1,0,0,1,1,2,3,4,5,6,7,8,9,10,11,11,11}, \
{10,10,10,10,10,10,10,10,10,10,10,10,10,10,9,9,9,7,6,3,0,-3,-6,-10,-14,-17,-20,-21,-22,-22,-22,-21,-19,-17,-15,-13,-10,-8,-6,-4,-2,-2,-2,-2,-4,-5,-7,-8,-8,-9,-8,-8,-7,-5,-4,-2,0,0,1,1,1,2,2,3,4,5,6,7,8,9,10,10,10}, \
{9,9,9,9,9,9,9,10,10,9,9,9,9,9,9,8,8,6,5,2,0,-4,-7,-11,-15,-17,-19,-21,-21,-21,-20,-18,-16,-14,-12,-10,-8,-6,-4,-2,-1,0,0,0,-1,-2,-4,-5,-5,-6,-6,-5,-5,-4,-3,-1,0,0,1,1,1,1,2,3,3,5,6,7,8,8,9,9,9}, \
{9,9,9,9,9,9,9,9,9,9,9,9,8,8,8,8,7,5,4,1,-1,-5,-8,-12,-15,-17,-19,-20,-20,-19,-18,-16,-14,-11,-9,-7,-5,-4,-2,-1,0,0,1,1,0,0,-2,-3,-3,-4,-4,-4,-3,-3,-2,-1,0,0,0,0,0,1,1,2,3,4,5,6,7,8,8,9,9}, \
{9,9,9,8,8,8,9,9,9,9,9,8,8,8,8,7,6,5,3,0,-2,-5,-9,-12,-15,-17,-18,-19,-19,-18,-16,-14,-12,-9,-7,-5,-4,-2,-1,0,0,1,1,1,1,0,0,-1,-2,-2,-3,-3,-2,-2,-1,-1,0,0,0,0,0,0,0,1,2,3,4,5,6,7,8,8,9}, \
{8,8,8,8,8,8,9,9,9,9,9,9,8,8,8,7,6,4,2,0,-3,-6,-9,-12,-15,-17,-18,-18,-17,-16,-14,-12,-10,-8,-6,-4,-2,-1,0,0,1,2,2,2,2,1,0,0,-1,-1,-1,-2,-2,-1,-1,0,0,0,0,0,0,0,0,0,1,2,3,4,5,6,7,8,8}, \
{8,8,8,8,9,9,9,9,9,9,9,9,9,8,8,7,5,3,1,-1,-4,-7,-10,-13,-15,-16,-17,-17,-16,-15,-13,-11,-9,-6,-5,-3,-2,0,0,0,1,2,2,2,2,1,1,0,0,0,-1,-1,-1,-1,-1,0,0,0,0,-1,-1,-1,-1,-1,0,0,1,3,4,5,7,7,8}, \
{8,8,9,9,9,9,10,10,10,10,10,10,10,9,8,7,5,3,0,-2,-5,-8,-11,-13,-15,-16,-16,-16,-15,-13,-12,-10,-8,-6,-4,-2,-1,0,0,1,2,2,3,3,2,2,1,0,0,0,0,0,0,0,0,0,0,-1,-1,-2,-2,-2,-2,-2,-1,0,0,1,3,4,6,7,8}, \
{7,8,9,9,9,10,10,11,11,11,11,11,10,10,9,7,5,3,0,-2,-6,-9,-11,-13,-15,-16,-16,-15,-14,-13,-11,-9,-7,-5,-3,-2,0,0,1,1,2,3,3,3,3,2,2,1,1,0,0,0,0,0,0,0,-1,-1,-2,-3,-3,-4,-4,-4,-3,-2,-1,0,1,3,5,6,7}, \
{6,8,9,9,10,11,11,12,12,12,12,12,11,11,9,7,5,2,0,-3,-7,-10,-12,-14,-15,-16,-15,-15,-13,-12,-10,-8,-7,-5,-3,-1,0,0,1,2,2,3,3,4,3,3,3,2,2,1,1,1,0,0,0,0,-1,-2,-3,-4,-4,-5,-5,-5,-5,-4,-2,-1,0,2,3,5,6}, \
{6,7,8,10,11,12,12,13,13,14,14,13,13,11,10,8,5,2,0,-4,-8,-11,-13,-15,-16,-16,-16,-15,-13,-12,-10,-8,-6,-5,-3,-1,0,0,1,2,3,3,4,4,4,4,4,3,3,3,2,2,1,1,0,0,-1,-2,-3,-5,-6,-7,-7,-7,-6,-5,-4,-3,-1,0,2,4,6}, \
{5,7,8,10,11,12,13,14,15,15,15,14,14,12,11,8,5,2,-1,-5,-9,-12,-14,-16,-17,-17,-16,-15,-14,-12,-11,-9,-7,-5,-3,-1,0,0,1,2,3,4,4,5,5,5,5,5,5,4,4,3,3,2,1,0,-1,-2,-4,-6,-7,-8,-8,-8,-8,-7,-6,-4,-2,0,1,3,5}, \
{4,6,8,10,12,13,14,15,16,16,16,16,15,13,11,9,5,2,-2,-6,-10,-13,-16,-17,-18,-18,-17,-16,-15,-13,-11,-9,-7,-5,-4,-2,0,0,1,3,3,4,5,6,6,7,7,7,7,7,6,5,4,3,2,0,-1,-3,-5,-7,-8,-9,-10,-10,-10,-9,-7,-5,-4,-1,0,2,4}, \
{4,6,8,10,12,14,15,16,17,18,18,17,16,15,12,9,5,1,-3,-8,-12,-15,-18,-19,-20,-20,-19,-18,-16,-15,-13,-11,-8,-6,-4,-2,-1,0,1,3,4,5,6,7,8,9,9,9,9,9,9,8,7,5,3,1,-1,-3,-6,-8,-10,-11,-12,-12,-11,-10,-9,-7,-5,-2,0,1,4}, \
{4,6,8,11,13,15,16,18,19,19,19,19,18,16,13,10,5,0,-5,-10,-15,-18,-21,-22,-23,-22,-22,-20,-18,-17,-14,-12,-10,-8,-5,-3,-1,0,1,3,5,6,8,9,10,11,12,12,13,12,12,11,9,7,5,2,0,-3,-6,-9,-11,-12,-13,-13,-12,-11,-10,-8,-6,-3,-1,1,4}, \
{3,6,9,11,14,16,17,19,20,21,21,21,19,17,14,10,4,-1,-8,-14,-19,-22,-25,-26,-26,-26,-25,-23,-21,-19,-17,-14,-12,-9,-7,-4,-2,0,1,3,5,7,9,11,13,14,15,16,16,16,16,15,13,10,7,4,0,-3,-7,-10,-12,-14,-15,-14,-14,-12,-11,-9,-6,-4,-1,1,3}, \
{4,6,9,12,14,17,19,21,22,23,23,23,21,19,15,9,2,-5,-13,-20,-25,-28,-30,-31,-31,-30,-29,-27,-25,-22,-20,-17,-14,-11,-9,-6,-3,0,1,4,6,9,11,13,15,17,19,20,21,21,21,20,18,15,11,6,2,-2,-7,-11,-13,-15,-16,-16,-15,-13,-11,-9,-7,-4,-1,1,4}, \
{4,7,10,13,15,18,20,22,24,25,25,25,23,20,15,7,-2,-12,-22,-29,-34,-37,-38,-38,-37,-36,-34,-31,-29,-26,-23,-20,-17,-13,-10,-7,-4,-1,2,5,8,11,13,16,18,21,23,24,26,26,26,26,24,21,17,12,5,0,-6,-10,-14,-16,-16,-16,-15,-14,-12,-10,-7,-4,-1,1,4}, \
{4,7,10,13,16,19,22,24,26,27,27,26,24,19,11,-1,-15,-28,-37,-43,-46,-47,-47,-45,-44,-41,-39,-36,-32,-29,-26,-22,-19,-15,-11,-8,-4,-1,2,5,9,12,15,19,22,24,27,29,31,33,33,33,32,30,26,21,14,6,0,-6,-11,-14,-15,-16,-15,-14,-12,-9,-7,-4,-1,1,4}, \
{6,9,12,15,18,21,23,25,27,28,27,24,17,4,-14,-34,-49,-56,-60,-60,-60,-58,-56,-53,-50,-47,-43,-40,-36,-32,-28,-25,-21,-17,-13,-9,-5,-1,2,6,10,14,17,21,24,28,31,34,37,39,41,42,43,43,41,38,33,25,17,8,0,-4,-8,-10,-10,-10,-8,-7,-4,-2,0,3,6}, \
{22,24,26,28,30,32,33,31,23,-18,-81,-96,-99,-98,-95,-93,-89,-86,-82,-78,-74,-70,-66,-62,-57,-53,-49,-44,-40,-36,-32,-27,-23,-19,-14,-10,-6,-1,2,6,10,15,19,23,27,31,35,38,42,45,49,52,55,57,60,61,63,63,62,61,57,53,47,40,33,28,23,21,19,19,19,20,22}, \
{168,173,178,176,171,166,161,156,151,146,141,136,131,126,121,116,111,106,101,-96,-91,-86,-81,-76,-71,-66,-61,-56,-51,-46,-41,-36,-31,-26,-21,-16,-11,-6,-1,3,8,13,18,23,28,33,38,43,48,53,58,63,68,73,78,83,88,93,98,103,108,113,118,123,128,133,138,143,148,153,158,163,168}, \

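Thank you for your time.

I see several options for compacting the array.

1. Separate 8-bit and 1-bit arrays

You can split your array into two parts: the first stores the 8 low-order bits of each original value, the second stores '1' if the value does not fit in 8 bits and '0' otherwise. This takes 9 bits per value (the same space as nightcracker's approach, but a little simpler). To read a value from these two arrays, do the following: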

int8_t array8[37*73] = {...};
uint16_t array1[(37*73+15)/16] = {...};
size_t offset = 37 * x + y; // x in 0..72, y in 0..36; use 73 * row + col for row-major order
int16_t item = static_cast<int16_t>(array8[offset]); // sign extend
uint16_t overflow = (array1[offset/16] >> (offset%16)) & 0x0001;
if (overflow) item ^= ~0xFF; // value did not fit in 8 bits: flip bit 8 and above
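
As a minimal sketch of how the two arrays could be produced and read back, assuming every value lies in the 9-bit range [-256, 255]: the names `Packed`, `pack` and `unpack` are hypothetical and not part of the answer. The decoder sign-extends the stored byte and, when the overflow bit is set, flips bit 8 and everything above it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for the two arrays described above.
struct Packed {
    std::vector<int8_t>   low;       // low 8 bits of each value
    std::vector<uint16_t> overflow;  // one bit per value, 16 per word
};

Packed pack(const std::vector<int16_t>& values) {
    Packed p;
    p.low.resize(values.size());
    p.overflow.assign((values.size() + 15) / 16, 0);
    for (std::size_t i = 0; i < values.size(); ++i) {
        assert(values[i] >= -256 && values[i] <= 255);  // must fit in 9 bits
        p.low[i] = static_cast<int8_t>(static_cast<uint8_t>(values[i] & 0xFF));
        if (values[i] < -128 || values[i] > 127)        // does not fit in int8_t
            p.overflow[i / 16] |= static_cast<uint16_t>(1u << (i % 16));
    }
    return p;
}

int16_t unpack(const Packed& p, std::size_t i) {
    int item = p.low[i];                        // sign-extend the stored byte
    if ((p.overflow[i / 16] >> (i % 16)) & 1u)  // value did not fit in 8 bits
        item ^= ~0xFF;                          // flip bit 8 and above
    return static_cast<int16_t>(item);
}
```

For a ROM table you would run `pack` once on a host machine, paste the two arrays into the source as constants, and keep only the `unpack` logic on the target.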

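2. Approximation

If you can approximate the regularities in your array with some efficiently computed function (such as a polynomial or an exponential), you can store in the array only the difference between each value and its approximation. This may require only 8 bits per value, or even fewer.

3. Delta encoding

If your data is smooth enough, then besides applying either of the previous methods you can store a shorter table containing only some of the data values, plus a second table containing only the differences between the remaining values (those absent from the first table) and the values in the first table. This requires fewer bits per value.

For example, you can store every fifth value and the differences for the other four values: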

  Original array: 0 0 1 1 2 2 2 2 2 3 3 3 4 4 5 5 5 5 5 6 6 6 6 6 6 6 6 7 7 7
     Short array: 0         2         3         5         6         6
Difference array:   0 1 1 2   0 0 0 1   0 1 1 2   0 0 0 1   0 0 0 0   0 1 1 1

Alternatively, you can use differences from the previous value, which requires even fewer bits per value:

  Original array: 0 0 1 1 2 2 2 2 2 3 3 3 4 4 5 5 5 5 5 6 6 6 6 6 6 6 6 7 7 7
     Short array: 0         2         3         5         6         6
     Delta array:   0 1 0 1   0 0 0 1   0 1 0 1   0 0 0 1   0 0 0 0   0 1 0 0

The delta-array approach can be implemented efficiently with bitwise operations if a group of delta values fits exactly in an int16_t.
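
A rough sketch of the second variant (differences from the previous value) follows. The layout is an assumption made for illustration: every fifth value is kept in full, and the four values after it are stored as signed 4-bit deltas packed into one 16-bit word. `DeltaTable`, `encodeDelta` and `decodeDelta` are hypothetical names. Values that jump by more than a nibble can hold would need separate handling.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout: anchors[g] is value 5*g; deltas[g] packs the four
// following differences as signed nibbles, lowest nibble first.
struct DeltaTable {
    std::vector<int16_t>  anchors;
    std::vector<uint16_t> deltas;
};

DeltaTable encodeDelta(const std::vector<int16_t>& values) {
    DeltaTable t;
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (i % 5 == 0) {
            t.anchors.push_back(values[i]);
            t.deltas.push_back(0);
        } else {
            int d = values[i] - values[i - 1];
            assert(d >= -8 && d <= 7);  // must fit in a signed nibble
            unsigned shift = static_cast<unsigned>(4 * (i % 5 - 1));
            t.deltas.back() |= static_cast<uint16_t>((d & 0xF) << shift);
        }
    }
    return t;
}

int16_t decodeDelta(const DeltaTable& t, std::size_t i) {
    int value = t.anchors[i / 5];
    uint16_t packed = t.deltas[i / 5];
    for (std::size_t k = 0; k < i % 5; ++k) {
        int nibble = (packed >> (4 * k)) & 0xF;
        value += (nibble ^ 8) - 8;  // sign-extend 4 bits
    }
    return static_cast<int16_t>(value);
}
```

A lookup costs at most four additions here; a longer group saves more space but makes each lookup walk further.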


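Initialization

For option #2, the preprocessor may be used. For the other options the preprocessor is possible but may not be very convenient (it is not good at processing long lists of values). Some combination of the preprocessor and variadic templates may work better. Or it may be easier to use a text-processing script.


Update

After looking at the actual data, I can give some more details. Option #2 (approximation) is not very convenient for your data. Option #1 seems better. Or you can use Mark Ransom's or nightcracker's approach. It does not matter which one: in all cases you save 7 bits out of 16.

Option #3 (delta encoding) can save much more space. It cannot be used directly, because in some cells of the array the data changes abruptly. But as far as I can tell, these large changes happen at most once per row. This can be handled with one additional column holding a full data value and one special value in the delta array.

I noticed that (ignoring these abrupt changes) the difference between neighboring values is never more than +/-32. This requires 6 bits to encode each delta value, which means 6.6 bits per value: 58% compression, about 2400 bytes. (Not much, but a little better than the 2464K in your comments.)

The middle part of the array is much smoother. It needs only 5 bits per value if encoded separately. This may save another 300..400 bytes. It is probably a good idea to split this array into several parts and encode each part differently.

As nightcracker has noted, your values will fit into 9 bits. There is an easier way to store those values, though. Put the absolute values into a byte array and put the sign bits into a separate packed bit array.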

uint8_t my_array[37][73] = {{**DATA ABSOLUTE VALUES HERE**}}; // unsigned: magnitudes go up to 179
uint8_t my_signs[37][10] = {{**SIGN BITS HERE**}};
int16_t my_value = my_array[i][j];
if (my_signs[i][j/8] & (1 << j%8))
    my_value = -my_value;

This reduces the original table size by about 43% (2,701 + 370 = 3,071 bytes instead of 5,402) without too much effort.
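
The two tables have to be generated somehow. Here is a small sketch of a generator for one row, with a matching reader. `SignSplit`, `splitSigns` and `joinSigns` are hypothetical names, and the bit order (bit j%8 of byte j/8) is assumed to match the lookup snippet above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-row result: magnitudes plus packed sign bits.
struct SignSplit {
    std::vector<uint8_t> magnitude;  // |value|, one byte each
    std::vector<uint8_t> signs;      // bit j%8 of byte j/8 set when value j < 0
};

SignSplit splitSigns(const std::vector<int16_t>& row) {
    SignSplit s;
    s.magnitude.resize(row.size());
    s.signs.assign((row.size() + 7) / 8, 0);
    for (std::size_t j = 0; j < row.size(); ++j) {
        int v = row[j];
        assert(v >= -255 && v <= 255);  // magnitude must fit in one byte
        s.magnitude[j] = static_cast<uint8_t>(v < 0 ? -v : v);
        if (v < 0)
            s.signs[j / 8] |= static_cast<uint8_t>(1u << (j % 8));
    }
    return s;
}

int16_t joinSigns(const SignSplit& s, std::size_t j) {
    int16_t value = s.magnitude[j];
    if (s.signs[j / 8] & (1u << (j % 8)))
        value = -value;
    return value;
}
```

A 73-entry row needs (73 + 7) / 8 = 10 sign bytes, which is where the second dimension of `my_signs` comes from.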

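I know from experience that visualizing things can help in finding a good solution to a problem. Since it is not very clear what your data actually represents (so we know nothing about the problem domain), we might not come up with "the best" solution, if one exists at all. So I took the liberty of visualizing the data; as the saying goes, a picture is worth a thousand words :-)

I'm sorry that I do not (yet) have a solution better than those already posted, but I thought the plot might help someone (or myself) come up with a better one.

[Image: plot of the table data]

You want the range +-179, which means 360 values will cover it. 360 unique values can be expressed in 9 bits. This is an example of a 9-bit integer lookup table: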

// size is ceil(37 * 73 * 9 / 16)
uint16_t my_array[1520];

int16_t get_lookup_item(int x, int y) {
    // calculate bitoffset
    size_t bitoffset = (37 * x + y) * 9;

    // calculate difference with 16 bit array offset
    size_t diff = bitoffset % 16;

    uint16_t item;

    // our item doesn't overlap a 16 bit boundary
    if (diff < (16 - 9)) {
        item = my_array[bitoffset / 16]; // get item
        item >>= diff;
        item &= (1 << 9) - 1;

    // our item does overlap a 16 bit boundary
    } else {
        item = my_array[bitoffset / 16];
        item >>= diff;
        item &= (1 << (16 - diff)) - 1;
        item |= (my_array[bitoffset / 16 + 1] & ((1 << (9 - 16 + diff)) - 1)) << (16 - diff);
    }

    // we now have the unsigned item, subtract 179 to bring it into the correct range
    return item - 179;
}
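
To fill such a table you also need the packing side. The following is a sketch only: `pack9` and `get9` are hypothetical names, values are assumed to lie in [-179, 179], and each item is stored as value + 179 with its low bits in the current 16-bit word and any spill-over in the low bits of the next word.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

const int BITS = 9;    // bits per stored item
const int BIAS = 179;  // stored item = value + BIAS, so -179..179 maps to 0..358

std::vector<uint16_t> pack9(const std::vector<int16_t>& values) {
    std::vector<uint16_t> words((values.size() * BITS + 15) / 16, 0);
    for (std::size_t i = 0; i < values.size(); ++i) {
        assert(values[i] >= -BIAS && values[i] <= BIAS);
        unsigned item = static_cast<unsigned>(values[i] + BIAS);
        std::size_t bit = i * BITS;
        unsigned shift = static_cast<unsigned>(bit % 16);
        words[bit / 16] |= static_cast<uint16_t>(item << shift);
        if (shift + BITS > 16)  // spill the high bits into the next word
            words[bit / 16 + 1] |= static_cast<uint16_t>(item >> (16 - shift));
    }
    return words;
}

int16_t get9(const std::vector<uint16_t>& words, std::size_t i) {
    std::size_t bit = i * BITS;
    unsigned shift = static_cast<unsigned>(bit % 16);
    unsigned item = words[bit / 16] >> shift;
    if (shift + BITS > 16)      // pick up the bits that spilled over
        item |= static_cast<unsigned>(words[bit / 16 + 1]) << (16 - shift);
    item &= (1u << BITS) - 1;
    return static_cast<int16_t>(static_cast<int>(item) - BIAS);
}
```

For the full 37 x 73 table, `pack9` yields (37 * 73 * 9 + 15) / 16 = 1520 words, matching the array size used above.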

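Here's another approach, completely different from my first, which is why it is a separate answer.

If the number of values that cannot fit into 8 bits is less than 1/8 of the total, you can devote an entire extra byte to each of them and the result will still be smaller than keeping another 1-bit array.

In the interest of simplicity and speed I wanted to stick with full byte values rather than bit packing. You never said whether there are any speed constraints on this problem, but decoding an entire file just to look up one value seems wasteful. If that really is not a problem for you, your best results will probably come from implementing the decode portion of some readily available open-source compression utility.

For this implementation I stayed with a very simple encoding. First I did a delta as suggested by Evgeny Kluev, starting fresh with each row. Your data turned out to be very well suited to this approach. Then each byte is encoded by the following rules:

  • Absolute values >= 97 get a lead byte of 97. This threshold was determined by trying different values and choosing the one that produced the smallest result. The lead byte is followed by the value minus 97.
  • Values between -96 and 96 are checked for run lengths. Run lengths between 3 and 32 are encoded as 98 to 127, and run lengths between 33 and 64 are encoded as -97 to -128.
  • Finally, values between -96 and 96 are output as-is.

This results in an encoded array of 2014 bytes, plus another 36 bytes to index the start of each row, for a total of 2050 bytes.

The full implementation can be found at http://ideone.com/SNdRI. The output is identical to the table posted in the question.

As others have suggested, you can save a significant amount of space by storing the absolute value of each entry in an array of 8-bit integers and the sign bit in a separate packed bit array. Mark Ransom's solution is simple and performs well, reducing the size from 5,402 bytes to 3,071 bytes, a saving of 43.1%.

If you really want to squeeze out every last bit of space, you can do better by taking advantage of the characteristics of this data set. In particular, note that the values are mostly positive and that there are several runs of values with the same sign. Instead of tracking the sign of every value in the "my_signs" array, you can track just the runs of negative values as a starting index (two bytes, range [0..2700]) and a run length (one byte, since the longest run in the table below is 0x25 = 37 entries). For this data set, that reduces the size of the signs table from 370 bytes to 168 bytes. Total storage is then 2,869 bytes, a 46.8% saving over the original (2,533 fewer bytes).

Here is the code that implements this strategy: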

uint8_t my_array[37][73] = {{ /* ABSOLUTE VALUES OF ORIGINAL ARRAY HERE */ }};

// Sign bits for the values in my_array.  The data is arranged in groups of
// three bytes.  The first two give the starting index of a run of negative
// values.  The third gives the length of the run.  To determine if a given
// value should be negated, compute its index as (row * 73) + col, then scan this
// table to see if that index appears in any of the runs.  If it does, the value
// should be negated.

uint8_t my_signs[168]    = {
    0x00, 0x1f, 0x14, 0x00, 0x68, 0x15, 0x00, 0xb1, 0x16, 0x00, 0xfa, 0x18, 
    0x01, 0x42, 0x1a, 0x01, 0x8b, 0x1e, 0x01, 0xd2, 0x23, 0x02, 0x1a, 0x24, 
    0x02, 0x62, 0x24, 0x02, 0xaa, 0x25, 0x02, 0xf2, 0x25, 0x03, 0x3a, 0x25, 
    0x03, 0x83, 0x25, 0x03, 0xcb, 0x25, 0x04, 0x14, 0x24, 0x04, 0x5c, 0x24, 
    0x04, 0xa5, 0x23, 0x04, 0xee, 0x14, 0x05, 0x05, 0x0c, 0x05, 0x36, 0x14, 
    0x05, 0x50, 0x0a, 0x05, 0x7f, 0x13, 0x05, 0x9a, 0x09, 0x05, 0xc8, 0x12, 
    0x05, 0xe4, 0x07, 0x06, 0x10, 0x12, 0x06, 0x2f, 0x05, 0x06, 0x38, 0x05, 
    0x06, 0x59, 0x12, 0x06, 0x7f, 0x08, 0x06, 0xa2, 0x11, 0x06, 0xc7, 0x0b, 
    0x06, 0xeb, 0x11, 0x07, 0x10, 0x0c, 0x07, 0x34, 0x11, 0x07, 0x59, 0x0d, 
    0x07, 0x7c, 0x12, 0x07, 0xa2, 0x0d, 0x07, 0xc5, 0x12, 0x07, 0xeb, 0x0e, 
    0x08, 0x0e, 0x13, 0x08, 0x34, 0x0e, 0x08, 0x57, 0x13, 0x08, 0x7e, 0x0e, 
    0x08, 0x9f, 0x14, 0x08, 0xc7, 0x0e, 0x08, 0xe8, 0x14, 0x09, 0x10, 0x0e, 
    0x09, 0x30, 0x16, 0x09, 0x5a, 0x0d, 0x09, 0x78, 0x17, 0x09, 0xa4, 0x0c, 
    0x09, 0xc0, 0x18, 0x09, 0xef, 0x09, 0x0a, 0x04, 0x1d, 0x0a, 0x57, 0x14
};

int getSign(int row, int col)
{
    int want = (row * 73) + col;
    for (int i = 0 ; i < 168 ; i += 3) {
        int16_t start = (my_signs[i] << 8) | my_signs[i + 1];
        if (start > want) {
            // Not going to find it, so may as well stop now.

            break;
        }

        int runlength = my_signs[i + 2];
        if (want < start + runlength) {
            // Found this index in the signs array, so this entry is negative.

            return -1;
        }
    }
    return 1;
}

int16_t getValue(int row, int col)
{
    return getSign(row, col) * my_array[row][col];
}
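
The run table itself can be generated from the flattened data. Below is a hedged sketch of such a generator: `buildSignRuns` is a hypothetical name, and it emits the same three-byte layout as `my_signs` above (big-endian start index, then run length), splitting any run longer than 255 entries.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Emits three bytes per run of negative values: the start index (two bytes,
// high byte first) followed by the run length (one byte).
std::vector<uint8_t> buildSignRuns(const std::vector<int16_t>& flat) {
    std::vector<uint8_t> runs;
    std::size_t i = 0;
    while (i < flat.size()) {
        if (flat[i] >= 0) { ++i; continue; }
        std::size_t start = i;
        // Extend the run while values stay negative and the length fits a byte.
        while (i < flat.size() && flat[i] < 0 && i - start < 255) ++i;
        assert(start <= 0xFFFF);  // start index must fit in two bytes
        runs.push_back(static_cast<uint8_t>(start >> 8));
        runs.push_back(static_cast<uint8_t>(start & 0xFF));
        runs.push_back(static_cast<uint8_t>(i - start));
    }
    return runs;
}
```

Feeding it the first row of the question's table should give a single run starting at index 31 (0x1f) with length 20 (0x14), which is the first triple in `my_signs`.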

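In fact, at the cost of more complex code, you can do even better by recognizing that the run-length encoded version of the signs table really only needs 12 bits for the starting index and 6 bits for the run length, for a total of 18 bits (versus the 24 used by the simple implementation above). That shaves another 42 bytes off the size, for a total of 2,827 bytes, a 47.6% saving over the original (2,575 fewer bytes).

Investigating the actual array shows that the data is very smooth and can be compressed significantly. Simple methods do not reduce the space much further once the 16-bit values have been encoded in 9 bits. This is because the data has different properties in different parts of the array. Splitting the array into several parts and encoding them differently may reduce the size further, but it is more complicated and increases the code size.

The approach described here encodes data blocks of variable length while still allowing relatively fast access to the original values (though slower than with the simple approaches). At the price of speed, the compression ratio is significantly increased.

The main idea is delta encoding. But compared to the simple algorithm in my previous post, variable block length and variable bit depth are possible. This allows, for example, a bit depth of zero for the deltas of repeating values, which means a fixed header only and no delta values at all (similar to run-length encoding).

There is also a base value for all deltas in a block. This makes it possible to encode linearly changing data (which is very frequent in the actual array) using only the base value, again spending zero space on delta values. It also slightly decreases the average bit depth in other cases.

The compressed data is stored in a bitstream array that is accessed through a bitstream reader. To allow fast access to the start of each bitstream, an index table is used (just an array of 37 16-bit indexes).

Each bitstream starts with the number of blocks in the stream (5 bits), followed by the block index and finally the data blocks. The block index provides a way to skip unneeded data blocks while searching. Each index entry contains: the number of elements in the block (4 bits, which allows encoding 9 to 24 delta values along with the starting value), the size of the base value for all deltas (1 bit, for sizes 4 or 6), and the size of the deltas (2 bits, for sizes 0..3 if the base size is 4, or 2..5 if the base size is 6). These particular bit depths are probably close to optimal, but they can be changed to trade some speed for space or to adapt the algorithm to a different data array.

A data block contains the starting value (9 bits), the base value for the deltas (4 or 6 bits), and the delta values (0..3 or 2..5 bits each).

Here is the function that extracts an original value from the compressed data: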

int get(size_t row, unsigned col)
{
  BitstreamReader bsr(indexTable[row]);
  unsigned blocks = bsr.getUI(5);

  unsigned block = 0;
  unsigned start = 0;
  unsigned nextStart = 0;
  unsigned offset = 0;
  unsigned nextOffset = 0;
  unsigned blockSize = 0;
  unsigned baseSize = 0;
  unsigned deltaSize = 0;
  while (col >= nextStart) // 3 iterations on average
  {
    start = nextStart;
    offset = nextOffset;
    ++block;
    blockSize = bsr.getUI(4) + 9;
    nextStart += blockSize;
    baseSize = bsr.getUI(1)*2 + 4;
    deltaSize = bsr.getUI(2) + baseSize - 4;
    nextOffset += deltaSize * blockSize + baseSize + 9;
  }
  --block;

  bsr.skip((blocks - block) * 7 + offset);
  int value = bsr.getI(9);
  int base = bsr.getI(baseSize);

  while(col-- > start) // 12 iterations on average
  {
    int delta = base + bsr.getUI(deltaSize);
    value += delta;
  }

  return value;
}

Here is the implementation of the bitstream reader:

  class BitstreamReader
  {
  public:
    BitstreamReader(size_t start): word_(start), bit_(0) {}

    void skip(unsigned offset)
    {
      word_ += (bit_ + offset) / 16; // whole words crossed, counted from the current bit position
      bit_ = (bit_ + offset) % 16;
    }

    unsigned getUI(unsigned size)
    {
      unsigned old = bit_;
      unsigned result = dataTable[word_] >> bit_;
      result &= ((1 << size) - 1);
      bit_ += size;

      if (bit_ >= 16)
      {
        ++word_;
        bit_ -= 16;

        if (bit_ > 0)
        {
          result += (dataTable[word_] & ((1 << bit_) - 1)) << (16 - old);
        }
      }

      return result;
    }

    int getI(unsigned size)
    {
      int result = static_cast<int>(getUI(size));
      return result | -(result & (1 << (size - 1)));
    }

  private:
    size_t word_;
    unsigned bit_;
  };
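
The sign extension in `getI` is compact enough to be worth checking in isolation. The sketch below lifts the same expression into a free function; `signExtend` is a hypothetical name, and it assumes 1 <= size < 32 with all bits above `size` already cleared.

```cpp
#include <cassert>

// Interprets the low `size` bits of `raw` as a two's-complement number.
// Assumes 1 <= size < 32 and that the bits of raw above `size` are zero.
int signExtend(unsigned raw, unsigned size) {
    int result = static_cast<int>(raw);
    // If the top bit of the field is set, OR in all the bits above it.
    return result | -(result & (1 << (size - 1)));
}
```

For a 9-bit field, the raw pattern 0x1FF decodes to -1 and 0x0B3 stays 179, which covers the value range of the table.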

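I made some estimates of the resulting data size. (I am not posting the code I used for this because its quality is very low.) The result is 1250 bytes. That is somewhat larger than what the best compression programs can achieve, but much lower than any of the simple approaches.


Update

1250 bytes is not the limit. This algorithm can be improved to compress the data harder and to work faster.

I noticed that the number of blocks (5 bits) can be moved from the bitstream into unused bits of the row index table. This saves about 30 bytes.

To save 20 more bytes, you can store the bitstreams in bytes instead of uint16, which saves space on padding bits.

So we have about 1200 bytes. This is not exact. The size may be somewhat underestimated, because I did not take into account that not every bit depth can be encoded in the block index. It may also be overestimated, because the only heuristic assumed for the encoder is to calculate the bit depth for the first 9 values and to limit the block size only if this bit depth would have to grow by more than 2 bits. Certainly the encoder can be smarter than this.

Decoding speed can also be improved. If we move the 9th bit of the starting value into the block index, each index entry is exactly 8 bits. This allows the bitstream to start with a group of bytes, each of which can be decoded with faster methods than the general-purpose bitstream accessors. The remaining 8 bits of the starting value can be moved right after the block index for the same purpose. Alternatively, they can be included in each index entry, so that the index consists of 16-bit values. After these modifications, the bitstream contains only variable-length data fields.

1049 bytes

I noticed that most of the runs are linear. This is why I decided to encode not the delta values but the delta of the delta. Think of it as the second derivative. This lets me store the values -1, 0 and 1 most of the time, with a few exceptions.

Secondly, I made the data one-dimensional. Transforming it back into two dimensions is easy, but having it in one dimension allows the compression to span multiple rows.

The compressed data is organized in blocks of different sizes. Each block starts with a header:

  • 9 bits - the absolute value, the value of input[x]
  • 7 bits - a difference, input[x+1]-input[x]
  • 7 bits - a difference, input[x+2]-input[x+1]
  • 9 bits - the length of the second-derivative data that follows
  • 2 bits each - the array of second derivatives

Although only the values -2, -1, 0 and 1 can be stored, the runs of second derivatives are very long in this example.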
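
To see why second differences work so well on this data, here is a small illustrative sketch. `secondDifferences` and `fitsTwoBits` are hypothetical helpers and are not part of the program that follows.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns d2[i] = (v[i+2] - v[i+1]) - (v[i+1] - v[i]) for each valid i.
std::vector<int> secondDifferences(const std::vector<int>& v) {
    std::vector<int> d2;
    for (std::size_t i = 0; i + 2 < v.size(); ++i)
        d2.push_back((v[i + 2] - v[i + 1]) - (v[i + 1] - v[i]));
    return d2;
}

// True when every second difference fits the 2-bit signed range [-2, 1].
bool fitsTwoBits(const std::vector<int>& d2) {
    for (int d : d2)
        if (d < -2 || d > 1) return false;
    return true;
}
```

The start of the first row (150, 145, 140, ...) is perfectly linear, so its second differences are all zero. The jump from -99 to 104 later in that row falls outside the 2-bit range, which is the kind of spot where a new block header is needed.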

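Below I provide complete, compilable code. It contains:

  • C (GCC) code. No C++ constructs.
  • The input array you provided
  • A visualization function that prints the contents of an array
  • The compression function (in case your input changes)
  • The getter function, which gets an element from the array
  • In the main function: I compress, decompress and perform a check

Have fun!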

#include <stdlib.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef int16_t Arr[37][73];
typedef int16_t ArrFlat[37*73];
typedef int16_t* ArrPtr;

Arr input = { {150,145,140,135,130,125,120,115,110,105,100,95,90,85,80,75,70,65,60,55,50,45,40,35,30,25,20,15,10,5,0,-4,-9,-14,-19,-24,-29,-34,-39,-44,-49,-54,-59,-64,-69,-74,-79,-84,-89,-94,-99,104,109,114,119,124,129,134,139,144,149,154,159,164,169,174,179,175,170,165,160,155,150}, \
{143,137,131,126,120,115,110,105,100,95,90,85,80,75,71,66,62,57,53,48,44,39,35,31,27,22,18,14,9,5,1,-3,-7,-11,-16,-20,-25,-29,-34,-38,-43,-47,-52,-57,-61,-66,-71,-76,-81,-86,-91,-96,101,107,112,117,123,128,134,140,146,151,157,163,169,175,178,172,166,160,154,148,143}, \
{130,124,118,112,107,101,96,92,87,82,78,74,70,65,61,57,54,50,46,42,38,34,31,27,23,19,16,12,8,4,1,-2,-6,-10,-14,-18,-22,-26,-30,-34,-38,-43,-47,-51,-56,-61,-65,-70,-75,-79,-84,-89,-94,100,105,111,116,122,128,135,141,148,155,162,170,177,174,166,159,151,144,137,130}, \
{111,104,99,94,89,85,81,77,73,70,66,63,60,56,53,50,46,43,40,36,33,30,26,23,20,16,13,10,6,3,0,-3,-6,-9,-13,-16,-20,-24,-28,-32,-36,-40,-44,-48,-52,-57,-61,-65,-70,-74,-79,-84,-88,-93,-98,103,109,115,121,128,135,143,152,162,172,176,165,154,144,134,125,118,111}, \
{85,81,77,74,71,68,65,63,60,58,56,53,51,49,46,43,41,38,35,32,29,26,23,19,16,13,10,7,4,1,-1,-3,-6,-9,-13,-16,-19,-23,-26,-30,-34,-38,-42,-46,-50,-54,-58,-62,-66,-70,-74,-78,-83,-87,-91,-95,100,105,110,117,124,133,144,159,178,160,141,125,112,103,96,90,85}, \
{62,60,58,57,55,54,52,51,50,48,47,46,44,42,41,39,36,34,31,28,25,22,19,16,13,10,7,4,2,0,-3,-5,-8,-10,-13,-16,-19,-22,-26,-29,-33,-37,-41,-45,-49,-53,-56,-60,-64,-67,-70,-74,-77,-80,-83,-86,-89,-91,-94,-97,101,105,111,130,109,84,77,74,71,68,66,64,62}, \
{46,46,45,44,44,43,42,42,41,41,40,39,38,37,36,35,33,31,28,26,23,20,16,13,10,7,4,1,-1,-3,-5,-7,-9,-12,-14,-16,-19,-22,-26,-29,-33,-36,-40,-44,-48,-51,-55,-58,-61,-64,-66,-68,-71,-72,-74,-74,-75,-74,-72,-68,-61,-48,-25,2,22,33,40,43,45,46,47,46,46}, \
{36,36,36,36,36,35,35,35,35,34,34,34,34,33,32,31,30,28,26,23,20,17,14,10,6,3,0,-2,-4,-7,-9,-10,-12,-14,-15,-17,-20,-23,-26,-29,-32,-36,-40,-43,-47,-50,-53,-56,-58,-60,-62,-63,-64,-64,-63,-62,-59,-55,-49,-41,-30,-17,-4,6,15,22,27,31,33,34,35,36,36}, \
{30,30,30,30,30,30,30,29,29,29,29,29,29,29,29,28,27,26,24,21,18,15,11,7,3,0,-3,-6,-9,-11,-12,-14,-15,-16,-17,-19,-21,-23,-26,-29,-32,-35,-39,-42,-45,-48,-51,-53,-55,-56,-57,-57,-56,-55,-53,-49,-44,-38,-31,-23,-14,-6,0,7,13,17,21,24,26,27,29,29,30}, \
{25,25,26,26,26,25,25,25,25,25,25,25,25,26,25,25,24,23,21,19,16,12,8,4,0,-3,-7,-10,-13,-15,-16,-17,-18,-19,-20,-21,-22,-23,-25,-28,-31,-34,-37,-40,-43,-46,-48,-49,-50,-51,-51,-50,-48,-45,-42,-37,-32,-26,-19,-13,-7,-1,3,7,11,14,17,19,21,23,24,25,25}, \
{21,22,22,22,22,22,22,22,22,22,22,22,22,22,22,22,21,20,18,16,13,9,5,1,-3,-7,-11,-14,-17,-18,-20,-21,-21,-22,-22,-22,-23,-23,-25,-27,-29,-32,-35,-37,-40,-42,-44,-45,-45,-45,-44,-42,-40,-36,-32,-27,-22,-17,-12,-7,-3,0,3,7,9,12,14,16,18,19,20,21,21}, \
{18,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,18,17,16,14,10,7,2,-1,-6,-10,-14,-17,-19,-21,-22,-23,-24,-24,-24,-24,-23,-23,-23,-24,-26,-28,-30,-33,-35,-37,-38,-39,-39,-38,-36,-34,-31,-28,-24,-19,-15,-10,-6,-3,0,1,4,6,8,10,12,14,15,16,17,18,18}, \
{16,16,17,17,17,17,17,17,17,17,17,16,16,16,16,16,16,15,13,11,8,4,0,-4,-9,-13,-16,-19,-21,-23,-24,-25,-25,-25,-25,-24,-23,-21,-20,-20,-21,-22,-24,-26,-28,-30,-31,-32,-31,-30,-29,-27,-24,-21,-17,-13,-9,-6,-3,-1,0,2,4,5,7,9,10,12,13,14,15,16,16}, \
{14,14,14,15,15,15,15,15,15,15,14,14,14,14,14,14,13,12,11,9,5,2,-2,-6,-11,-15,-18,-21,-23,-24,-25,-25,-25,-25,-24,-22,-21,-18,-16,-15,-15,-15,-17,-19,-21,-22,-24,-24,-24,-23,-22,-20,-18,-15,-12,-9,-5,-3,-1,0,1,2,4,5,6,8,9,10,11,12,13,14,14}, \
{12,13,13,13,13,13,13,13,13,13,13,13,12,12,12,12,11,10,9,6,3,0,-4,-8,-12,-16,-19,-21,-23,-24,-24,-24,-24,-23,-22,-20,-17,-15,-12,-10,-9,-9,-10,-12,-13,-15,-17,-17,-18,-17,-16,-15,-13,-11,-8,-5,-3,-1,0,1,1,2,3,4,6,7,8,9,10,11,12,12,12}, \
{11,11,11,11,11,12,12,12,12,12,11,11,11,11,11,10,10,9,7,5,2,-1,-5,-9,-13,-17,-20,-22,-23,-23,-23,-23,-22,-20,-18,-16,-14,-11,-9,-6,-5,-4,-5,-6,-8,-9,-11,-12,-12,-12,-12,-11,-9,-8,-6,-3,-1,0,0,1,1,2,3,4,5,6,7,8,9,10,11,11,11}, \
{10,10,10,10,10,10,10,10,10,10,10,10,10,10,9,9,9,7,6,3,0,-3,-6,-10,-14,-17,-20,-21,-22,-22,-22,-21,-19,-17,-15,-13,-10,-8,-6,-4,-2,-2,-2,-2,-4,-5,-7,-8,-8,-9,-8,-8,-7,-5,-4,-2,0,0,1,1,1,2,2,3,4,5,6,7,8,9,10,10,10}, \
{9,9,9,9,9,9,9,10,10,9,9,9,9,9,9,8,8,6,5,2,0,-4,-7,-11,-15,-17,-19,-21,-21,-21,-20,-18,-16,-14,-12,-10,-8,-6,-4,-2,-1,0,0,0,-1,-2,-4,-5,-5,-6,-6,-5,-5,-4,-3,-1,0,0,1,1,1,1,2,3,3,5,6,7,8,8,9,9,9}, \
{9,9,9,9,9,9,9,9,9,9,9,9,8,8,8,8,7,5,4,1,-1,-5,-8,-12,-15,-17,-19,-20,-20,-19,-18,-16,-14,-11,-9,-7,-5,-4,-2,-1,0,0,1,1,0,0,-2,-3,-3,-4,-4,-4,-3,-3,-2,-1,0,0,0,0,0,1,1,2,3,4,5,6,7,8,8,9,9}, \
{9,9,9,8,8,8,9,9,9,9,9,8,8,8,8,7,6,5,3,0,-2,-5,-9,-12,-15,-17,-18,-19,-19,-18,-16,-14,-12,-9,-7,-5,-4,-2,-1,0,0,1,1,1,1,0,0,-1,-2,-2,-3,-3,-2,-2,-1,-1,0,0,0,0,0,0,0,1,2,3,4,5,6,7,8,8,9}, \
{8,8,8,8,8,8,9,9,9,9,9,9,8,8,8,7,6,4,2,0,-3,-6,-9,-12,-15,-17,-18,-18,-17,-16,-14,-12,-10,-8,-6,-4,-2,-1,0,0,1,2,2,2,2,1,0,0,-1,-1,-1,-2,-2,-1,-1,0,0,0,0,0,0,0,0,0,1,2,3,4,5,6,7,8,8}, \
{8,8,8,8,9,9,9,9,9,9,9,9,9,8,8,7,5,3,1,-1,-4,-7,-10,-13,-15,-16,-17,-17,-16,-15,-13,-11,-9,-6,-5,-3,-2,0,0,0,1,2,2,2,2,1,1,0,0,0,-1,-1,-1,-1,-1,0,0,0,0,-1,-1,-1,-1,-1,0,0,1,3,4,5,7,7,8}, \
{8,8,9,9,9,9,10,10,10,10,10,10,10,9,8,7,5,3,0,-2,-5,-8,-11,-13,-15,-16,-16,-16,-15,-13,-12,-10,-8,-6,-4,-2,-1,0,0,1,2,2,3,3,2,2,1,0,0,0,0,0,0,0,0,0,0,-1,-1,-2,-2,-2,-2,-2,-1,0,0,1,3,4,6,7,8}, \
{7,8,9,9,9,10,10,11,11,11,11,11,10,10,9,7,5,3,0,-2,-6,-9,-11,-13,-15,-16,-16,-15,-14,-13,-11,-9,-7,-5,-3,-2,0,0,1,1,2,3,3,3,3,2,2,1,1,0,0,0,0,0,0,0,-1,-1,-2,-3,-3,-4,-4,-4,-3,-2,-1,0,1,3,5,6,7}, \
{6,8,9,9,10,11,11,12,12,12,12,12,11,11,9,7,5,2,0,-3,-7,-10,-12,-14,-15,-16,-15,-15,-13,-12,-10,-8,-7,-5,-3,-1,0,0,1,2,2,3,3,4,3,3,3,2,2,1,1,1,0,0,0,0,-1,-2,-3,-4,-4,-5,-5,-5,-5,-4,-2,-1,0,2,3,5,6}, \
{6,7,8,10,11,12,12,13,13,14,14,13,13,11,10,8,5,2,0,-4,-8,-11,-13,-15,-16,-16,-16,-15,-13,-12,-10,-8,-6,-5,-3,-1,0,0,1,2,3,3,4,4,4,4,4,3,3,3,2,2,1,1,0,0,-1,-2,-3,-5,-6,-7,-7,-7,-6,-5,-4,-3,-1,0,2,4,6}, \
{5,7,8,10,11,12,13,14,15,15,15,14,14,12,11,8,5,2,-1,-5,-9,-12,-14,-16,-17,-17,-16,-15,-14,-12,-11,-9,-7,-5,-3,-1,0,0,1,2,3,4,4,5,5,5,5,5,5,4,4,3,3,2,1,0,-1,-2,-4,-6,-7,-8,-8,-8,-8,-7,-6,-4,-2,0,1,3,5}, \
{4,6,8,10,12,13,14,15,16,16,16,16,15,13,11,9,5,2,-2,-6,-10,-13,-16,-17,-18,-18,-17,-16,-15,-13,-11,-9,-7,-5,-4,-2,0,0,1,3,3,4,5,6,6,7,7,7,7,7,6,5,4,3,2,0,-1,-3,-5,-7,-8,-9,-10,-10,-10,-9,-7,-5,-4,-1,0,2,4}, \
{4,6,8,10,12,14,15,16,17,18,18,17,16,15,12,9,5,1,-3,-8,-12,-15,-18,-19,-20,-20,-19,-18,-16,-15,-13,-11,-8,-6,-4,-2,-1,0,1,3,4,5,6,7,8,9,9,9,9,9,9,8,7,5,3,1,-1,-3,-6,-8,-10,-11,-12,-12,-11,-10,-9,-7,-5,-2,0,1,4}, \
{4,6,8,11,13,15,16,18,19,19,19,19,18,16,13,10,5,0,-5,-10,-15,-18,-21,-22,-23,-22,-22,-20,-18,-17,-14,-12,-10,-8,-5,-3,-1,0,1,3,5,6,8,9,10,11,12,12,13,12,12,11,9,7,5,2,0,-3,-6,-9,-11,-12,-13,-13,-12,-11,-10,-8,-6,-3,-1,1,4}, \
{3,6,9,11,14,16,17,19,20,21,21,21,19,17,14,10,4,-1,-8,-14,-19,-22,-25,-26,-26,-26,-25,-23,-21,-19,-17,-14,-12,-9,-7,-4,-2,0,1,3,5,7,9,11,13,14,15,16,16,16,16,15,13,10,7,4,0,-3,-7,-10,-12,-14,-15,-14,-14,-12,-11,-9,-6,-4,-1,1,3}, \
{4,6,9,12,14,17,19,21,22,23,23,23,21,19,15,9,2,-5,-13,-20,-25,-28,-30,-31,-31,-30,-29,-27,-25,-22,-20,-17,-14,-11,-9,-6,-3,0,1,4,6,9,11,13,15,17,19,20,21,21,21,20,18,15,11,6,2,-2,-7,-11,-13,-15,-16,-16,-15,-13,-11,-9,-7,-4,-1,1,4}, \
{4,7,10,13,15,18,20,22,24,25,25,25,23,20,15,7,-2,-12,-22,-29,-34,-37,-38,-38,-37,-36,-34,-31,-29,-26,-23,-20,-17,-13,-10,-7,-4,-1,2,5,8,11,13,16,18,21,23,24,26,26,26,26,24,21,17,12,5,0,-6,-10,-14,-16,-16,-16,-15,-14,-12,-10,-7,-4,-1,1,4}, \
{4,7,10,13,16,19,22,24,26,27,27,26,24,19,11,-1,-15,-28,-37,-43,-46,-47,-47,-45,-44,-41,-39,-36,-32,-29,-26,-22,-19,-15,-11,-8,-4,-1,2,5,9,12,15,19,22,24,27,29,31,33,33,33,32,30,26,21,14,6,0,-6,-11,-14,-15,-16,-15,-14,-12,-9,-7,-4,-1,1,4}, \
{6,9,12,15,18,21,23,25,27,28,27,24,17,4,-14,-34,-49,-56,-60,-60,-60,-58,-56,-53,-50,-47,-43,-40,-36,-32,-28,-25,-21,-17,-13,-9,-5,-1,2,6,10,14,17,21,24,28,31,34,37,39,41,42,43,43,41,38,33,25,17,8,0,-4,-8,-10,-10,-10,-8,-7,-4,-2,0,3,6}, \
{22,24,26,28,30,32,33,31,23,-18,-81,-96,-99,-98,-95,-93,-89,-86,-82,-78,-74,-70,-66,-62,-57,-53,-49,-44,-40,-36,-32,-27,-23,-19,-14,-10,-6,-1,2,6,10,15,19,23,27,31,35,38,42,45,49,52,55,57,60,61,63,63,62,61,57,53,47,40,33,28,23,21,19,19,19,20,22}, \
{168,173,178,176,171,166,161,156,151,146,141,136,131,126,121,116,111,106,101,-96,-91,-86,-81,-76,-71,-66,-61,-56,-51,-46,-41,-36,-31,-26,-21,-16,-11,-6,-1,3,8,13,18,23,28,33,38,43,48,53,58,63,68,73,78,83,88,93,98,103,108,113,118,123,128,133,138,143,148,153,158,163,168} };

void visual(Arr arr) {
  int row;
  int col;
  for (row=0; row<37; ++row) {
    for (col=0; col<73; ++col)
      printf("%3d",arr[row][col]);
    printf("\n");
  }
}

void visualFlat(ArrFlat arr) {
  int cell;
  for (cell=0; cell<37*73; ++cell) {
    printf("%3d",arr[cell]);
  }
  printf("\n");
}

typedef struct {
  int16_t absolute:9;
  int16_t adiff:7;
  int16_t diff:7;
  unsigned short diff2_length:9;
} __attribute__((packed)) Header;

typedef union {
  struct {
  int16_t diff2_a:2;
  int16_t diff2_b:2;
  int16_t diff2_c:2;
  int16_t diff2_d:2;
  } __attribute__((packed));
  unsigned char all;
} Chunk;

int16_t chunkGet(Chunk k, int16_t offset) {
  switch (offset) {
    case 0 : return k.diff2_a;
    case 1 : return k.diff2_b;
    case 2 : return k.diff2_c;
    case 3 : return k.diff2_d;
  }
  return 0; /* not reached for offsets 0..3; avoids falling off the end */
}

void chunkSet(Chunk *k, int16_t offset, int16_t value) {
  switch (offset) {
    case 0 : k->diff2_a=value; break;
    case 1 : k->diff2_b=value; break;
    case 2 : k->diff2_c=value; break;
    case 3 : k->diff2_d=value; break;
    default: printf("Invalid offset %hd\n", offset);
  }
}

unsigned char data[1049];

void compress (ArrFlat src) {
  Chunk diffData;
  int16_t headerIdx=0;
  int16_t diffIdx;
  int16_t currentDiffValue;
  int16_t length=-3;
  int16_t shift=0;
  Header h;
  int16_t position=0;
  while (position<37*73) {
    if (length==-3) { //encode the absolute value
      h.absolute=currentDiffValue=src[position];
      ++position;
      ++length;
      continue;
    }
    if (length==-2) { //encode the first diff value
      h.adiff=currentDiffValue=src[position]-src[position-1];
      if (currentDiffValue<-64 || currentDiffValue>+63)
        printf("\nDIFF TOO BIG\n");
      ++position;
      ++length;
      continue;
    }
    if (length==-1) { //encode the second diff value
      h.diff=currentDiffValue=src[position]-src[position-1];
      if (currentDiffValue<-64 || currentDiffValue>+63)
        printf("\nDIFF TOO BIG\n");
      ++position;
      ++length;
      diffData.all=0;
      diffIdx=headerIdx+sizeof(Header);
      shift=0;
      continue;
    }
    //compute the diff2
    int16_t diff=src[position]-src[position-1];
    int16_t diff2=diff-currentDiffValue;
    if (diff2>1 || diff2<-2) { //big change - restart with header
      if (length>511)
        printf("\nLENGTH TOO LONG\n");
      if (shift!=0) { //store partial byte
        data[diffIdx]=diffData.all;
        diffData.all=0;
        ++diffIdx;
      }
      h.diff2_length=length;
      memcpy(data+headerIdx,&h,sizeof(Header));
      headerIdx=diffIdx;
      length=-3;
      continue;
    }
    chunkSet(&diffData,shift,diff2);
    shift+=1;
    currentDiffValue=diff;
    ++position;
    ++length;
    if (shift==4) {
      data[diffIdx]=diffData.all;
      diffData.all=0;
      ++diffIdx;
      shift=0;
    }
  }
  if (shift!=0) { //finalize
    data[diffIdx]=diffData.all;
    ++diffIdx;
  }
  h.diff2_length=length;
  memcpy(data+headerIdx,&h,sizeof(Header));
  headerIdx=diffIdx;
  printf("Ending byte=%hd\n",headerIdx);
}

int16_t get(int row, int col) {
  int idx=row*73+col;
  int dataIdx=0;
  int pos=0;
  int16_t absolute;
  int16_t diff;
  Header h;
  while (1) {
    memcpy(&h, data+dataIdx, sizeof(Header));
    if (idx==pos) return h.absolute;
    absolute=h.absolute+h.adiff;
    if (idx==pos+1) return absolute;
    diff=h.diff;
    absolute+=diff;
    if (idx==pos+2) return absolute;
    dataIdx+=sizeof(Header);
    pos+=3;
    if (pos+h.diff2_length <= idx) {
      pos+=h.diff2_length;
      dataIdx+=(h.diff2_length+3)/4;
    } else break;
  }
  int shift=4;
  Chunk diffData;
  while (pos<=idx) {
    if (shift==4) {
      diffData.all=data[dataIdx];
      ++dataIdx;
      shift=0;
    }
    diff+=chunkGet(diffData,shift);
    absolute+=diff;
    ++shift;
    ++pos;
  }
  return absolute;
}

int main() {
  printf("Input:\n");
  visual(input);
  int row;
  int col;
  ArrPtr flatInput=(ArrPtr)input;
  printf("sizeof(Header)=%zu\n",sizeof(Header));
  printf("sizeof(Chunk)=%zu\n",sizeof(Chunk));
  compress(flatInput);
  ArrFlat re;
  for (row=0; row<37; ++row)
    for (col=0; col<73; ++col) {
      int cell=row*73+col;
      re[cell]=get(row,col);
      if (re[cell]!=flatInput[cell])
        printf("ERROR DETECTED IN CELL %d\n",cell);
    }
  visual((int16_t (*)[73])re); //view the flat buffer as 37 rows of 73
  return 0;
}

A Visual Studio version (compiled with VS 2010):

#include <stdlib.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef int16_t Arr[37][73];
typedef int16_t ArrFlat[37*73];
typedef int16_t* ArrPtr;

Arr input = { [... your array as above ...] };

void visual(Arr arr) {
    int row;
    int col;
    for (row=0; row<37; ++row) {
        for (col=0; col<73; ++col)
            printf("%3d",arr[row][col]);
        printf("\n");
    }
}

void visualFlat(ArrFlat arr) {
    int cell;
    for (cell=0; cell<37*73; ++cell) {
        printf("%3d",arr[cell]);
    }
    printf("\n");
}

#pragma pack(1)
typedef struct {
    int16_t absolute:9;
    int16_t adiff:7;
    int16_t diff:7;
    unsigned short diff2_length:9;
} Header;

#pragma pack(1)
typedef union {
    struct {
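        //signed char: the signedness of plain char is implementation-defined,
        //and each field must hold a diff2 in -2..1
        signed char diff2_a:2;
        signed char diff2_b:2;
        signed char diff2_c:2;
        signed char diff2_d:2;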
    };
    unsigned char all;
} Chunk;

int16_t chunkGet(Chunk k, int16_t offset) {
    switch (offset) {
    case 0 : return k.diff2_a;
    case 1 : return k.diff2_b;
    case 2 : return k.diff2_c;
    case 3 : return k.diff2_d;
    default: return 0; //invalid offset; callers only pass 0..3
    }
}

void chunkSet(Chunk *k, int16_t offset, int16_t value) {
    switch (offset) {
    case 0 : k->diff2_a=value; break;
    case 1 : k->diff2_b=value; break;
    case 2 : k->diff2_c=value; break;
    case 3 : k->diff2_d=value; break;
    default: printf("Invalid offset %hd\n", offset);
    }
}

unsigned char data[1049];

void compress (ArrFlat src) {
    Chunk diffData;
    int16_t headerIdx=0;
    int16_t diffIdx;
    int16_t currentDiffValue;
    int16_t length=-3;
    int16_t shift=0;
    int16_t diff;
    int16_t diff2;
    Header h;
    int16_t position=0;
    while (position<37*73) {
        if (length==-3) { //encode the absolute value
            h.absolute=currentDiffValue=src[position];
            ++position;
            ++length;
            continue;
        }
        if (length==-2) { //encode the first diff value
            h.adiff=currentDiffValue=src[position]-src[position-1];
            if (currentDiffValue<-64 || currentDiffValue>+63)
                printf("\nDIFF TOO BIG\n");
            ++position;
            ++length;
            continue;
        }
        if (length==-1) { //encode the second diff value
            h.diff=currentDiffValue=src[position]-src[position-1];
            if (currentDiffValue<-64 || currentDiffValue>+63)
                printf("\nDIFF TOO BIG\n");
            ++position;
            ++length;
            diffData.all=0;
            diffIdx=headerIdx+sizeof(Header);
            shift=0;
            continue;
        }
        //compute the diff2
        diff=src[position]-src[position-1];
        diff2=diff-currentDiffValue;
        if (diff2>1 || diff2<-2) { //big change - restart with header
            if (length>511)
                printf("\nLENGTH TOO LONG\n");
            if (shift!=0) { //store partial byte
                data[diffIdx]=diffData.all;
                diffData.all=0;
                ++diffIdx;
            }
            h.diff2_length=length;
            memcpy(data+headerIdx,&h,sizeof(Header));
            headerIdx=diffIdx;
            length=-3;
            continue;
        }
        chunkSet(&diffData,shift,diff2);
        shift+=1;
        currentDiffValue=diff;
        ++position;
        ++length;
        if (shift==4) {
            data[diffIdx]=diffData.all;
            diffData.all=0;
            ++diffIdx;
            shift=0;
        }
    }
    if (shift!=0) { //finalize
        data[diffIdx]=diffData.all;
        ++diffIdx;
    }
    h.diff2_length=length;
    memcpy(data+headerIdx,&h,sizeof(Header));
    headerIdx=diffIdx;
    printf("Ending byte=%hd\n",headerIdx);
}

int16_t get(int row, int col) {
    int idx=row*73+col;
    int dataIdx=0;
    int pos=0;
    int16_t absolute;
    int16_t diff;
    int shift;
    Header h;
    Chunk diffData;
    while (1) {
        memcpy(&h, data+dataIdx, sizeof(Header));
        if (idx==pos) return h.absolute;
        absolute=h.absolute+h.adiff;
        if (idx==pos+1) return absolute;
        diff=h.diff;
        absolute+=diff;
        if (idx==pos+2) return absolute;
        dataIdx+=sizeof(Header);
        pos+=3;
        if (pos+h.diff2_length <= idx) {
            pos+=h.diff2_length;
            dataIdx+=(h.diff2_length+3)/4;
        } else break;
    }
    shift=4;

    while (pos<=idx) {
        if (shift==4) {
            diffData.all=data[dataIdx];
            ++dataIdx;
            shift=0;
        }
        diff+=chunkGet(diffData,shift);
        absolute+=diff;
        ++shift;
        ++pos;
    }
    return absolute;
}

int main() {
    int row;
    int col;
    ArrPtr flatInput=(ArrPtr)input;
    ArrFlat re;

    printf("Input:\n");
    visual(input);
    printf("sizeof(Header)=%lu\n",(unsigned long)sizeof(Header));
    printf("sizeof(Chunk)=%lu\n",(unsigned long)sizeof(Chunk));
    compress(flatInput);

    for (row=0; row<37; ++row)
        for (col=0; col<73; ++col) {
            int cell=row*73+col;
            re[cell]=get(row,col);
            if (re[cell]!=flatInput[cell])
                printf("ERROR DETECTED IN CELL %d\n",cell);
        }
    visual((int16_t (*)[73])re); //view the flat buffer as 37 rows of 73
    return 0;
}

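726 bytes

The algorithm encodes the difference between the actual value and the value produced by linear extrapolation from the previous values. In other words, it uses a first-order Taylor series, or, as CygnusX1 calls it, delta-of-delta.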
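As a rough illustration of the extrapolation step only (not of the arithmetic coder), the residual of a value is its difference from the straight-line prediction through the two previous values. The names `predict_linear`, `residuals` and `reconstruct` are made up for this sketch and are not taken from the linked implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Straight-line prediction from the two previous samples. */
static int16_t predict_linear(int16_t prev1, int16_t prev2) {
    return (int16_t)(2 * prev1 - prev2);
}

/* out[i] = src[i] - prediction; the first two samples are stored as-is. */
void residuals(const int16_t *src, int16_t *out, size_t n) {
    assert(n >= 2);
    out[0] = src[0];
    out[1] = src[1];
    for (size_t i = 2; i < n; ++i)
        out[i] = (int16_t)(src[i] - predict_linear(src[i - 1], src[i - 2]));
}

/* Inverse transform: rebuild the samples from the residuals. */
void reconstruct(const int16_t *res, int16_t *dst, size_t n) {
    assert(n >= 2);
    dst[0] = res[0];
    dst[1] = res[1];
    for (size_t i = 2; i < n; ++i)
        dst[i] = (int16_t)(res[i] + predict_linear(dst[i - 1], dst[i - 2]));
}
```

For the start of the second row in the question (143, 137, 131, 126, 120), the residuals after the two stored samples are 0, 1 and -1, which is the kind of small range an entropy coder handles well.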

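After this extrapolation encoding, most values fall in the range [-1 .. 1]. That is a good reason to use arithmetic coding or range coding. I implemented Arturo San Emeterio Campos's arithmetic coder. The same author's range coder algorithm could be used as well.

Values in the small range [-2 .. 2] are compressed by the arithmetic coder, while larger values are packed into 4-bit nibbles.

A few more optimizations make it even more compact:

  • All values are compressed as a single contiguous stream
  • The last column is not encoded at all, because it is equal to the first column
  • While the first column is being encoded, the history is only partially updated, which improves the results for the second column
  • The few cases where the value jumps from -100 to 100 are handled differently

The algorithm is slow: it uses up to 8000 32-bit integer divisions and a lot of bit manipulation to extract a single value. But it packs the data into a 726-byte array, and the code size is not very large.

Speed could be improved (to about 2800 32-bit integer divisions) if the frequency table were properly tuned. Using range coding instead of arithmetic coding would also improve speed. Space could be optimized if both the arithmetic coder data and the nibbles were packed into byte arrays instead of uint16 arrays (2 bytes), and if up to two leading zero bytes were aliased with the end of some other data structure (1..2 bytes). Using a second-order Taylor series did not gain any space, but other extrapolation methods might give some improvement.

The complete implementation can be found here: encoder, decoder and test. Tested with GCC.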

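There is another possibility:

  • Have two arrays: a main array and an overflow array.
  • Each element of the main array contains 7 bits of actual data + 1 "status" bit.
  • If the status bit is reset, the value fits into the remaining 7 bits.
  • If the status bit is set, part of the value is still in those 7 bits, but the remaining bits are held in the overflow array.
  • The index into the overflow array is found by counting all preceding elements of the main array that have the status bit set.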
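The two-array scheme can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the answerer's code: the names `split_encode` and `split_get` are hypothetical, values are assumed to fit in 9 signed bits (-256..255), and each overflow element takes a whole byte for simplicity even though only its low 2 bits are used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAIN_STATUS 0x80u  /* top bit of a main element: "continues in overflow" */

/* Fills main_arr (n bytes) and overflow (up to n bytes); returns the number
   of overflow elements actually used. */
size_t split_encode(const int16_t *src, size_t n,
                    uint8_t *main_arr, uint8_t *overflow) {
    size_t n_over = 0;
    for (size_t i = 0; i < n; ++i) {
        int16_t v = src[i];
        assert(v >= -256 && v <= 255);           /* assumption: 9-bit range */
        uint16_t u = (uint16_t)v & 0x1FFu;       /* 9-bit two's complement */
        if (v >= -64 && v <= 63) {
            main_arr[i] = (uint8_t)(u & 0x7Fu);  /* fits in 7 signed bits */
        } else {
            main_arr[i] = (uint8_t)(MAIN_STATUS | (u & 0x7Fu));
            overflow[n_over++] = (uint8_t)(u >> 7);  /* upper 2 bits */
        }
    }
    return n_over;
}

int16_t split_get(const uint8_t *main_arr, const uint8_t *overflow, size_t idx) {
    uint8_t m = main_arr[idx];
    if (!(m & MAIN_STATUS)) {
        int v = m & 0x7F;                        /* sign-extend from 7 bits */
        return (int16_t)(v >= 64 ? v - 128 : v);
    }
    /* Overflow index = number of earlier elements with the status bit set. */
    size_t k = 0;
    for (size_t i = 0; i < idx; ++i)
        if (main_arr[i] & MAIN_STATUS) ++k;
    int u = (m & 0x7F) | (overflow[k] << 7);     /* reassemble 9 bits */
    return (int16_t)(u >= 256 ? u - 512 : u);    /* sign-extend from 9 bits */
}
```

The linear scan in `split_get` is the slow path: it runs only for values whose status bit is set.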


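This has the following advantages:

  • Fast lookup of values that fit into 7 bits.
  • Can handle an unlimited range of values (by using suitably large elements in the overflow array, or by repeating the algorithm and stacking another overflow array on top, and so on).
  • On the other hand, if you know the values will always fit into 9 bits, use 2-bit elements in the overflow array to save extra space (this requires some bit twiddling, but is doable).
  • For some data distributions, when most values fit into 7 bits, it may use less space than simply using 9-bit elements (either in a single array or in an 8-bit array + 1-bit array).
  • Fairly simple to implement, so the code size will not eat up the savings on the data.

Disadvantages:

  • Slow lookup of values that do not fit into 7 bits. To access such a value, all preceding elements of the main array must be traversed linearly (checking their status bits) to determine the index into the overflow array.
  • For some other data distributions, it may use more space than the 9-bit approach: this happens when many values do not fit into 7 bits.
  • It is not as simple as the 8-bit array + 1-bit array approach, so the code will be somewhat larger than that, though still not large.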

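If the combined size of code + data matters, don't forget to check the size of the compiled code. Here is an example that uses a regular 8-bit encoding for the data (a 50% gain) and is optimized for code size.

We will store 8-bit values for each row: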

    unsigned char *row_data = &compressed_data[row*73];
    int value = row_data[column];

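For the first rows, split each row into two parts. The first value is encoded directly. The first part that follows is stored as a negative delta from the first value. The second part is encoded as a positive delta from 100.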

    if (row <= 4) {
        char brk = break_point[row];  /* "break" is a reserved word in C */
        if (column >= brk) return 100 + value;
        if (column == 0) return value;
        return row_data[0] - value;
    }

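break_point holds the positions of 104, 101, 100, 103, 110 in the first five rows. I did not check whether it can be computed instead of stored. Is it 51+row?

After row 5, the values get smoother and we can store them in 8-bit two's complement. The exception is the last row.

    if (row != 36) return (signed char) value;

The last row can be encoded like this, without storing any data (saving 73 bytes):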

    value = 168+5*column;
    if (value <= 178) return value;
    value = 359 - x; /* 359 = 176 + 183 */
    if (value >= 101) return value;
    value = -x;
    if (value > 0) x--;
    return value;

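This takes about 2640 bytes, but access is very fast and the code is compact.

The first row could be encoded similarly to the last row (an increment of -5, a sign change at -104, and a 359-x "flip" at 184), saving 70 bytes of data at the cost of some code size.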

If the duplicates are consecutive and you have CPU cycles to spare, you can use run-length encoding.
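A minimal run-length sketch, assuming runs are capped at 255 so the count fits in one byte. The `Run` struct and the `rle_encode` / `rle_get` names are hypothetical, and whether this saves anything depends entirely on how long the runs in the data actually are.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t count;   /* run length, 1..255 */
    int16_t value;   /* the repeated value */
} Run;

/* Writes at most n runs into runs[]; returns the number of runs produced. */
size_t rle_encode(const int16_t *src, size_t n, Run *runs) {
    size_t n_runs = 0;
    for (size_t i = 0; i < n; ) {
        size_t len = 1;
        while (i + len < n && src[i + len] == src[i] && len < 255)
            ++len;
        runs[n_runs].count = (uint8_t)len;
        runs[n_runs].value = src[i];
        ++n_runs;
        i += len;
    }
    return n_runs;
}

/* Random access: walk the runs until the one containing idx is found. */
int16_t rle_get(const Run *runs, size_t n_runs, size_t idx) {
    for (size_t r = 0; r < n_runs; ++r) {
        if (idx < runs[r].count)
            return runs[r].value;
        idx -= runs[r].count;
    }
    assert(0 && "index past the end of the encoded data");
    return 0;
}
```

Most compilers will likely pad `Run` to 4 bytes, so a table stored in ROM would want the count and value written into a plain byte stream instead. A 2D lookup is then `rle_get(runs, n_runs, row*73 + col)`.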

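Unfortunately, this data set looks too dense for a DFA... but you could definitely use one. It would require preprocessing and would be very fast. The compiled code would probably exceed the 4K data set, though, so it may not be an option.

Assuming your 16-bit values are rare, a hash might work for the oversized entries (see: google sparsehash)... with an overhead of a little over 1 bit per entry.

You could also use 9-bit values and manage the byte boundaries by hand, which has the same overhead as a separate bit array... perhaps more.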
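Managing the byte boundaries by hand for 9-bit values could look roughly like this. It is a sketch under the assumption that every value fits in 9 signed bits (-256..255); `pack9_set` and `pack9_get` are made-up names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stores value in slot idx. buf must be zero-initialised and hold at least
   (9*n + 7)/8 bytes for n slots; each slot is written once. */
void pack9_set(uint8_t *buf, size_t idx, int16_t value) {
    assert(value >= -256 && value <= 255);       /* assumption: 9-bit range */
    uint16_t u = (uint16_t)value & 0x1FFu;       /* 9-bit two's complement */
    size_t bit = idx * 9;
    size_t byte = bit / 8;
    unsigned shift = (unsigned)(bit % 8);
    /* 9 bits shifted by 0..7 always straddle exactly two bytes. */
    uint16_t window = (uint16_t)(u << shift);
    buf[byte]     |= (uint8_t)(window & 0xFFu);
    buf[byte + 1] |= (uint8_t)(window >> 8);
}

int16_t pack9_get(const uint8_t *buf, size_t idx) {
    size_t bit = idx * 9;
    size_t byte = bit / 8;
    unsigned shift = (unsigned)(bit % 8);
    unsigned window = buf[byte] | ((unsigned)buf[byte + 1] << 8);
    int u = (int)((window >> shift) & 0x1FFu);
    return (int16_t)(u >= 256 ? u - 512 : u);    /* sign-extend from 9 bits */
}
```

For the 37 x 73 table this comes to 2701 * 9 = 24309 bits, or 3039 bytes, compared with 5402 bytes for the int16_t array, and every lookup costs the same few shifts and masks.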

