
C hack for storing a bit that takes 1 bit space?

I have a long list of numbers between 0 and 67600. I want to store them using an array that is 67600 elements long: an element is set to 1 if the number is in the set and to 0 if it is not. That is, I need only 1 bit of information to record the presence of each number. Is there any hack in C/C++ that helps me achieve this?

In C++ you can use std::vector<bool> if the size is dynamic (it is a special case of std::vector); otherwise there is std::bitset (prefer std::bitset if possible). There is also boost::dynamic_bitset, which is pretty cool, if you need to set or change the size at runtime.
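As a rough sketch of the dynamic-size case, the helper below builds a std::vector<bool> whose length is known only at runtime. The name mark_present and its parameters are made up for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns one flag per value in [0, max_value],
// true where that value occurs in `numbers`.
std::vector<bool> mark_present(const std::vector<std::size_t>& numbers,
                               std::size_t max_value) {
    std::vector<bool> present(max_value + 1, false);  // size chosen at runtime
    for (std::size_t n : numbers) {
        if (n <= max_value) {   // silently skip out-of-range input
            present[n] = true;
        }
    }
    return present;
}
```

Calling mark_present(numbers, 67600) would give 67601 flags, one for each value from 0 to 67600 inclusive.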

In C (and C++) you can implement this manually with bitwise operators. One thing I want to mention: it's a good idea to use unsigned integers when you are doing bit operations. Shifting negative integers is not portable: in C, << on a negative value is undefined behavior and >> on one is implementation-defined. You will need to allocate an array of some integral type such as uint32_t. If you want to store N bits, it will take (N + 31) / 32 of these uint32_ts, that is, N / 32 rounded up. Bit i is stored in bit (i % 32) of element (i / 32). You may want to use a differently sized integral type depending on your architecture and other constraints.

Note: prefer using an existing implementation (e.g. as described in the first paragraph for C++; search Google for C solutions) over rolling your own, unless you specifically want to, in which case I suggest learning more about binary/bit manipulation elsewhere before tackling this. This kind of thing has been done to death and there are "good" solutions.
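The indexing scheme described above can be sketched as follows. BitArray and its members are illustrative names, not part of any library, and the sketch does no bounds checking.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimal sketch of a bit set backed by 32-bit unsigned words.
struct BitArray {
    std::vector<std::uint32_t> words;

    // (nbits + 31) / 32 is nbits / 32 rounded up; every bit starts cleared.
    explicit BitArray(std::size_t nbits) : words((nbits + 31) / 32, 0u) {}

    // Bit i lives in bit (i % 32) of word (i / 32).
    void set(std::size_t i)   { words[i / 32] |=  (std::uint32_t{1} << (i % 32)); }
    void clear(std::size_t i) { words[i / 32] &= ~(std::uint32_t{1} << (i % 32)); }
    bool test(std::size_t i) const {
        return ((words[i / 32] >> (i % 32)) & 1u) != 0;
    }
};
```

For the question's range, BitArray bits(67601) would allocate 2113 words, about 8.3 KB, instead of 67601 separate array elements.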

There are a number of tricks that may consume only one bit per value, e.g. bit-fields (available in C as well), but whether less space actually gets used is up to the compiler. Note that neither C nor C++ allows an array of bit-fields directly; the bit-fields have to be members of a struct, and you can then make an array of that struct.
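A small sketch of the bit-field approach, assuming eight one-bit members per struct. Flags8, FlagBlock and bytes_per_flag are hypothetical names, and the figure the last function reports depends on the compiler.

```cpp
#include <cstddef>

// Eight one-bit members. Each member can hold 0 or 1; how the members
// are packed into storage is left to the implementation.
struct Flags8 {
    unsigned char a : 1, b : 1, c : 1, d : 1,
                  e : 1, f : 1, g : 1, h : 1;
};

// An "array of bit-fields" has to be spelled as an array of such structs.
struct FlagBlock {
    Flags8 groups[16];   // room for 128 flags
};

// Reports how many bytes this compiler spends per stored flag.
inline double bytes_per_flag() {
    return static_cast<double>(sizeof(FlagBlock)) / 128.0;
}
```

Individual flags are then addressed by member name, e.g. block.groups[3].c = 1, which is less convenient than plain indexing.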

Please note that whatever you do, you will almost surely never be able to use exactly N bits to store N bits of information. Your computer very likely can't allocate less than 8 bits: if you want 7 bits you'll have to waste 1 bit, and if you want 9 you will have to take 16 bits and waste 7 of them. Even if your computer (CPU + RAM etc.) could "operate" on single bits, if you're running on an OS with malloc/new it would not be sane for your allocator to track data at such fine granularity because of the overhead. That last qualification was pretty silly, though: I imagine you won't find an architecture in use that lets you operate on fewer than 8 bits at a time :)
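The rounding in that paragraph can be written out as two small functions. This is a sketch that assumes 8-bit bytes; the function names are made up.

```cpp
#include <cstddef>

// Smallest number of whole 8-bit bytes that can hold `nbits` bits.
constexpr std::size_t bytes_for_bits(std::size_t nbits) {
    return (nbits + 7) / 8;
}

// Bits that are allocated but unused after rounding up to whole bytes.
constexpr std::size_t wasted_bits(std::size_t nbits) {
    return bytes_for_bits(nbits) * 8 - nbits;
}
```

With these, 7 bits need 1 byte (1 bit wasted), 9 bits need 2 bytes (7 wasted), and the 67601 flags from the question need 8451 bytes (7 wasted).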

You should use std::bitset.

std::bitset functions like an array of bool (actually like std::array, since it copies by value), but uses only 1 bit of storage for each element.
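Applied to the question, a std::bitset version might look like the sketch below. kMaxValue, PresenceSet and make_presence_set are illustrative names chosen here.

```cpp
#include <bitset>
#include <cstddef>
#include <vector>

constexpr std::size_t kMaxValue = 67600;          // largest value in the question
using PresenceSet = std::bitset<kMaxValue + 1>;   // one bit per value 0..67600

// Hypothetical helper: sets the bit for every in-range number.
PresenceSet make_presence_set(const std::vector<std::size_t>& numbers) {
    PresenceSet seen;                 // all bits start at 0
    for (std::size_t n : numbers) {
        if (n <= kMaxValue) {
            seen.set(n);
        }
    }
    return seen;                      // copied or moved by value, like std::array
}
```

Membership is then seen.test(n), and seen.count() gives the number of distinct values present.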

Another option is vector<bool>, which I don't recommend because:

  • It uses slower pointer indirection and heap memory to enable resizing, which you don't need.
  • That type is often maligned by standards purists because it claims to be a standard container but fails to adhere to the definition of a standard container*.

*For example, a standard-conforming function could expect &container.front() to produce a pointer to the first element of any container type, which fails with std::vector<bool>. Perhaps a nitpick for your use case, but still worth knowing about.
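The footnote's point can be checked at compile time. In this sketch, has_true_references is a made-up helper that asks whether a container's reference type is a genuine value_type&, which is what code relying on &container.front() needs.

```cpp
#include <type_traits>
#include <vector>

// True when Container::reference is a real value_type&, so that taking the
// address of front() yields a value_type*.
template <class Container>
constexpr bool has_true_references() {
    return std::is_same<typename Container::reference,
                        typename Container::value_type&>::value;
}

static_assert(has_true_references<std::vector<int>>(),
              "vector<int> hands out int&");
static_assert(!has_true_references<std::vector<bool>>(),
              "vector<bool> hands out a proxy object instead of bool&");
```

Because the packed bits have no address of their own, std::vector<bool> returns a proxy object from front() and operator[], and generic code that expects a bool* will not compile against it.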

There is, in fact! std::vector<bool> has a specialization for this: http://en.cppreference.com/w/cpp/container/vector_bool

See the documentation: the specialization is allowed to pack its elements, and typical implementations store one bit per element.

Edit: as somebody else said, std::bitset is also available: http://en.cppreference.com/w/cpp/utility/bitset

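If you want to write this in C, use a char array that is 67601 bits long (67601 / 8, rounded up, is 8451 bytes) and then turn the corresponding bit on or off for each value.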

Others have given the right idea. Here's my own implementation of a bitsarr, or 'array' of bits. An unsigned char is one byte, so it's essentially an array of unsigned chars that stores information in individual bits. I added the option of storing TWO- or FOUR-bit values in addition to ONE-bit values, because those widths both divide 8 (the number of bits in a byte), which is useful if you want to store a huge number of integers that range from 0-3 or 0-15.

When setting and getting, the math is done in the functions, so you can just give it an index as if it were a normal array; it knows where to look.

Also, it's the user's responsibility not to pass bitsarr_set a value that's too large for the field, or it will corrupt neighboring values. It could be modified so that overflow wraps back around to 0, but that would just make it more convoluted, so I decided to trust myself.

#include <stdio.h>
#include <stdlib.h>
#define BYTE 8

typedef enum {ONE=1, TWO=2, FOUR=4} numbits;

typedef struct bitsarr{
    unsigned char* buckets;
    numbits n;
} bitsarr;


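/* Allocate a zero-initialized array of `size` elements, each n bits wide. */
bitsarr new_bitsarr(int size, numbits n)
{
    int b = sizeof(unsigned char)*BYTE;      /* bits per bucket */
    int numbuckets = (size*n + b - 1)/b;     /* round up to whole buckets */
    bitsarr ret;
    /* one unsigned char per bucket, zeroed so unset entries read as 0 */
    ret.buckets = calloc(numbuckets, sizeof *ret.buckets);
    ret.n = n;
    return ret;
}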
void bitsarr_delete(bitsarr xp)
{
    free(xp.buckets);
}

void bitsarr_set(bitsarr *xp, int index, int value)
{
    int buckdex, innerdex;
    buckdex = index/(BYTE/xp->n);
    innerdex = index%(BYTE/xp->n);
    xp->buckets[buckdex] = (value << innerdex*xp->n) | ((~(((1 << xp->n) - 1) << innerdex*xp->n)) & xp->buckets[buckdex]);

    //longer version

    /*unsigned int width, width_in_place, zeros, old, newbits, new;
    width = (1 << xp->n) - 1; 
    width_in_place = width << innerdex*xp->n;
    zeros = ~width_in_place;
    old = xp->buckets[buckdex];
    old = old & zeros;
    newbits = value << innerdex*xp->n;
    new = newbits | old;
    xp->buckets[buckdex] = new; */

}

int bitsarr_get(bitsarr *xp, int index)
{
    int buckdex, innerdex;
    buckdex = index/(BYTE/xp->n);
    innerdex = index%(BYTE/xp->n);
    return ((((1 << xp->n) - 1) << innerdex*xp->n) & (xp->buckets[buckdex])) >> innerdex*xp->n;

    //longer version

    /*unsigned int width = (1 << xp->n) - 1; 
    unsigned int width_in_place = width << innerdex*xp->n;
    unsigned int val = xp->buckets[buckdex];
    unsigned int retshifted = width_in_place & val;
    unsigned int ret = retshifted >> innerdex*xp->n;
    return ret; */
}

int main()
{
    bitsarr x = new_bitsarr(100, FOUR);
    for(int i = 0; i<16; i++)
        bitsarr_set(&x, i, i);
    for(int i = 0; i<16; i++)
        printf("%d\n", bitsarr_get(&x, i));
    for(int i = 0; i<16; i++)
        bitsarr_set(&x, i, 15-i);
    for(int i = 0; i<16; i++)
        printf("%d\n", bitsarr_get(&x, i));
    bitsarr_delete(x);
}
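The overflow caveat mentioned before the code could be handled by masking the value before merging it into the byte, so that an oversized value wraps around instead of spilling into its neighbor. The following is a sketch of that variation in C++, not the author's code: PackedFields and its members are made-up names, and it assumes a field width of 1, 2 or 4 bits so that no field straddles a byte.

```cpp
#include <cstddef>
#include <vector>

// Packed array of `width`-bit fields (width must be 1, 2 or 4).
struct PackedFields {
    std::vector<unsigned char> buckets;
    unsigned width;

    PackedFields(std::size_t count, unsigned w)
        : buckets((count * w + 7) / 8, 0), width(w) {}

    void set(std::size_t index, unsigned value) {
        const unsigned per_byte = 8 / width;
        const unsigned shift = static_cast<unsigned>(index % per_byte) * width;
        const unsigned mask = ((1u << width) - 1u) << shift;
        unsigned char& b = buckets[index / per_byte];
        // Masking the shifted value drops any bits that do not fit the field.
        b = static_cast<unsigned char>((b & ~mask) | ((value << shift) & mask));
    }

    unsigned get(std::size_t index) const {
        const unsigned per_byte = 8 / width;
        const unsigned shift = static_cast<unsigned>(index % per_byte) * width;
        return (buckets[index / per_byte] >> shift) & ((1u << width) - 1u);
    }
};
```

With 4-bit fields, storing 0x1F at index 0 keeps only the low four bits (15) and leaves the value at index 1 untouched.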
