Simplest way to read binary data from a std::vector<unsigned char>?

I have a lump of binary data in the form of a const std::vector<unsigned char>, and want to be able to extract individual fields from it, such as 4 bytes for an integer, 1 for a boolean, etc. This needs to be, as far as possible, both efficient and simple. For example, it should be able to read the data in place without needing to copy it (e.g. into a string or array). And it should be able to read one field at a time, like a parser, since the lump of data does not have a fixed format. I already know how to determine what type of field to read in each case; the problem is getting a usable interface on top of a std::vector for doing this.

However, I can't find a simple way to get this data into an easily usable form that gives me useful read functionality. For example, std::basic_istringstream<unsigned char> gives me a reading interface, but it seems like I need to copy the data into a temporary std::basic_string<unsigned char> first, which is not ideal for bigger blocks of data.

Maybe there is some way I can use a streambuf in this situation to read the data in place, but it would appear that I'd need to derive my own streambuf class to do that.

It occurs to me that I can probably just use sscanf on the vector's data(), and that would seem to be both more succinct and more efficient than the C++ standard library alternatives. EDIT: Having been reminded that sscanf doesn't do what I wrongly thought it did, I actually don't know a clean way to do this in C or C++. But am I missing something, and if so, what?

You have access to the data in a vector through its operator[]. A vector's data is guaranteed to be stored in a single contiguous array, and [] returns a reference to a member of that array. You may use that reference directly, or through a memcpy.

std::vector<unsigned char> v;
...
byteField = v[12];
memcpy(&intField, &v[13], sizeof intField);
memcpy(charArray, &v[20], lengthOfCharArray); 

EDIT 1: If you want something "more convenient" than that, you could try:

template <class T>
void ReadFromVector(T& t, std::size_t offset, 
  const std::vector<unsigned char>& v) {
  memcpy(&t, &v[offset], sizeof(T));
}

Usage would be:

std::vector<unsigned char> v;
...
char c;
int i;
uint64_t ull;
ReadFromVector(c, 17, v);
ReadFromVector(i, 99, v);
ReadFromVector(ull, 43, v);
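
Note that ReadFromVector does no bounds checking, so an offset near the end of the vector reads past the buffer. A minimal sketch of a checked variant follows; the name ReadChecked and the choice to throw std::out_of_range are assumptions for illustration, not part of the answer above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical helper: copies sizeof(T) bytes starting at `offset` into a
// value of type T, throwing if the read would run past the end of the vector.
template <class T>
T ReadChecked(const std::vector<unsigned char>& v, std::size_t offset) {
  static_assert(std::is_trivially_copyable<T>::value,
                "memcpy is only valid for trivially copyable types");
  if (offset > v.size() || sizeof(T) > v.size() - offset) {
    throw std::out_of_range("ReadChecked: read past end of buffer");
  }
  T t;
  std::memcpy(&t, v.data() + offset, sizeof(T));
  return t;
}

// Usage sketch with made-up bytes: a one-byte field followed by a 32-bit
// field stored in the machine's native byte order.
inline void ReadCheckedExample() {
  const std::uint32_t original = 0xDEADBEEF;
  std::vector<unsigned char> v(1 + sizeof original);
  v[0] = 0x2A;
  std::memcpy(&v[1], &original, sizeof original);
  assert(ReadChecked<unsigned char>(v, 0) == 0x2A);
  assert(ReadChecked<std::uint32_t>(v, 1) == original);
}
```

Like the original, this copies the bytes in native byte order, so it only round-trips data written on a machine with the same endianness and type sizes.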

EDIT 2:

struct Reader {
  const std::vector<unsigned char>& v;
  std::size_t offset;
  Reader(const std::vector<unsigned char>& v) : v(v), offset() {}
  template <class T>
  Reader& operator>>(T&t) {
    memcpy(&t, &v[offset], sizeof t);
    offset += sizeof t;
    return *this;
  }
  void operator+=(std::size_t i) { offset += i; }
  // The vector is const, so hand out a pointer to const char.
  const char *getStringPointer() const {
    return reinterpret_cast<const char *>(&v[offset]);
  }
};

Usage:

std::vector<unsigned char> v;
...
Reader r(v);
int i; uint64_t ull;
r >> i >> ull;
const char *companyName = r.getStringPointer();
r += strlen(companyName) + 1;  // + 1 skips the terminating NUL
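
The Reader above trusts the data completely: a short buffer or a missing NUL terminator reads out of bounds. A sketch of a bounds-checked variant follows. The class name CheckedReader, its members, and the exception choice are hypothetical, and readCString copies the text into a std::string for convenience, which trades away the zero-copy property for that one field.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical bounds-checked variant of the Reader idea.
class CheckedReader {
 public:
  explicit CheckedReader(const std::vector<unsigned char>& v)
      : v_(v), offset_(0) {}

  // Copies the next sizeof(T) bytes into t and advances the cursor.
  template <class T>
  CheckedReader& operator>>(T& t) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "T must be trivially copyable");
    if (sizeof(T) > remaining()) throw std::out_of_range("read past end");
    std::memcpy(&t, v_.data() + offset_, sizeof(T));
    offset_ += sizeof(T);
    return *this;
  }

  // Reads a NUL-terminated string and skips the terminator as well.
  std::string readCString() {
    if (remaining() == 0) throw std::out_of_range("read past end");
    const unsigned char* start = v_.data() + offset_;
    const void* end = std::memchr(start, 0, remaining());
    if (end == nullptr) throw std::out_of_range("unterminated string");
    const std::size_t len = static_cast<std::size_t>(
        static_cast<const unsigned char*>(end) - start);
    offset_ += len + 1;
    return std::string(reinterpret_cast<const char*>(start), len);
  }

  // Number of bytes not yet consumed.
  std::size_t remaining() const { return v_.size() - offset_; }

 private:
  const std::vector<unsigned char>& v_;
  std::size_t offset_;
};

// Usage sketch with made-up bytes: one byte, the string "AB", one byte.
inline void CheckedReaderExample() {
  const std::vector<unsigned char> v = {7, 0x41, 0x42, 0, 9};
  CheckedReader r(v);
  std::uint8_t first = 0, last = 0;
  r >> first;
  const std::string name = r.readCString();
  r >> last;
  assert(first == 7 && name == "AB" && last == 9 && r.remaining() == 0);
}
```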

You can use a struct that describes the data you are trying to extract. You can copy data from your vector into the struct like this:

struct MyData {
    int intVal;
    bool boolVal;
    char stringVal[15];
} __attribute__((__packed__));

// assuming all extracted types are prefixed with a one byte indicator.
// Also assumes "vec" is your populated vector
std::size_t pos = 0;
while (pos + 1 < vec.size()) {
    switch(vec[pos++]) {
        case 0: { // handle int
            int intValue; 
            memcpy(&intValue, &vec[pos], sizeof(int));
            pos += sizeof(int); 
            // do something with handled value
            break;
        }
        case 1: { // handle double
            double doubleValue; 
            memcpy(&doubleValue, &vec[pos], sizeof(double));
            pos += sizeof(double); 
            // do something with handled value
            break;
        }
        case 2: { // handle MyData
            struct MyData data; 
            memcpy(&data, &vec[pos], sizeof(struct MyData));
            pos += sizeof(struct MyData); 
            // do something with handled value
            break;
        }
        default: {
            // ERROR: unknown type indicator
            break;
        }
    }
}

If your vector stores binary data, you can't use sscanf or similar; they work on text. Converting a byte to a bool is simple enough:

bool b = my_vec[10];

To extract an unsigned int that's stored in big endian order (assuming your ints are 32 bits):

unsigned int i = static_cast<unsigned int>(my_vec[10]) << 24 | my_vec[11] << 16 | my_vec[12] << 8 | my_vec[13];

A 16 bit unsigned short would be similar:

unsigned short s = my_vec[10] << 8 | my_vec[11];
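
The shift-and-or expressions above can be wrapped in small helpers that work regardless of the host's byte order. This is a sketch under assumptions: the function names are hypothetical, and the caller is expected to guarantee that pos plus the field width does not exceed v.size().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helpers; no bounds checking is done here.

// Reads a 16-bit unsigned value stored most significant byte first.
inline std::uint16_t ReadBE16(const std::vector<unsigned char>& v,
                              std::size_t pos) {
  return static_cast<std::uint16_t>((v[pos] << 8) | v[pos + 1]);
}

// Reads a 32-bit unsigned value stored most significant byte first.
// Each byte is widened to an unsigned 32-bit type before shifting, so the
// top byte never shifts into the sign bit of a promoted int.
inline std::uint32_t ReadBE32(const std::vector<unsigned char>& v,
                              std::size_t pos) {
  return (static_cast<std::uint32_t>(v[pos]) << 24) |
         (static_cast<std::uint32_t>(v[pos + 1]) << 16) |
         (static_cast<std::uint32_t>(v[pos + 2]) << 8) |
         static_cast<std::uint32_t>(v[pos + 3]);
}

// Same as ReadBE32, but for data stored least significant byte first.
inline std::uint32_t ReadLE32(const std::vector<unsigned char>& v,
                              std::size_t pos) {
  return static_cast<std::uint32_t>(v[pos]) |
         (static_cast<std::uint32_t>(v[pos + 1]) << 8) |
         (static_cast<std::uint32_t>(v[pos + 2]) << 16) |
         (static_cast<std::uint32_t>(v[pos + 3]) << 24);
}

// Usage sketch with made-up bytes.
inline void EndianExample() {
  const std::vector<unsigned char> v = {0x12, 0x34, 0x56, 0x78};
  assert(ReadBE16(v, 0) == 0x1234);
  assert(ReadBE32(v, 0) == 0x12345678u);
  assert(ReadLE32(v, 0) == 0x78563412u);
}
```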

If you can afford the Qt dependency, QByteArray has the fromRawData() named constructor, which wraps existing data buffers in a QByteArray without copying the data. With that byte array, you can then feed a QTextStream.

I'm not aware of any such function in the standard streams library (short of implementing your own streambuf, of course), but I'd love to be proved wrong :)
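
For reference, implementing your own streambuf for this case can be quite short. The sketch below is one possible approach, not something from the answers here: the class name VectorBuf is hypothetical, and it relies on a const_cast because setg only accepts non-const pointers. Nothing in it writes through those pointers, since no put area is set and the default underflow() simply reports end of file.

```cpp
#include <cassert>
#include <istream>
#include <streambuf>
#include <vector>

// Hypothetical class: a read-only streambuf whose get area is the vector's
// own storage, so the vector's contents are not copied into the stream.
class VectorBuf : public std::streambuf {
 public:
  explicit VectorBuf(const std::vector<unsigned char>& v) {
    // setg takes char*, so constness is cast away; the buffer is only read.
    char* begin = const_cast<char*>(reinterpret_cast<const char*>(v.data()));
    setg(begin, begin, begin + v.size());
  }
};

// Usage sketch with made-up bytes. The vector must outlive the stream.
inline void VectorBufExample() {
  const std::vector<unsigned char> v = {0x2A, 0x01, 0x02, 0x03, 0x04};
  VectorBuf buf(v);
  std::istream in(&buf);

  char flag = 0;
  in.read(&flag, 1);
  unsigned char rest[4] = {};
  in.read(reinterpret_cast<char*>(rest), sizeof rest);
  assert(in.good() && flag == 0x2A && rest[0] == 0x01 && rest[3] == 0x04);

  in.read(&flag, 1);  // nothing left, so the stream reports failure
  assert(in.fail());
}
```

Unformatted reads such as read() and get() are the useful part of the interface for binary data; the formatted operator>> overloads parse text and are not what you want here.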

Use a for loop to iterate over the vector and use bitwise operators to access each bit group. For example, to access the upper four bits of the first unsigned char in your vector:

int myInt = vec[0] & 0xF0;

To read the fifth bit from the left (bit 3), right after the chunk we just read:

bool myBool = vec[0] & 0x08;

The three least significant (lowest) bits can be accessed like so:

int myInt2 = vec[0] & 0x07;

You can then repeat this process (using a for loop) for every element in your vector.
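
Put together, the loop might look like the sketch below. The Fields struct and the UnpackAll function are hypothetical names, and the layout (four bits, one flag bit, three bits per byte) is just the one implied by the masks above. Unlike the bare mask `vec[0] & 0xF0`, this version also shifts the upper four bits down so the result lies in the range 0 to 15.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout: bits 7-4 are a small integer, bit 3 is a flag,
// bits 2-0 are another small integer.
struct Fields {
  int high;   // upper four bits, shifted down to the range 0..15
  bool flag;  // bit 3
  int low;    // lower three bits, range 0..7
};

// Unpacks every byte of the vector into its three bit fields.
inline std::vector<Fields> UnpackAll(const std::vector<unsigned char>& vec) {
  std::vector<Fields> out;
  out.reserve(vec.size());
  for (std::size_t i = 0; i < vec.size(); ++i) {
    Fields f;
    f.high = (vec[i] & 0xF0) >> 4;
    f.flag = (vec[i] & 0x08) != 0;
    f.low = vec[i] & 0x07;
    out.push_back(f);
  }
  return out;
}

// Usage sketch with made-up bytes: 0xAD is 1010 1101, 0x17 is 0001 0111.
inline void UnpackExample() {
  const std::vector<unsigned char> vec = {0xAD, 0x17};
  const std::vector<Fields> fields = UnpackAll(vec);
  assert(fields.size() == 2);
  assert(fields[0].high == 10 && fields[0].flag && fields[0].low == 5);
  assert(fields[1].high == 1 && !fields[1].flag && fields[1].low == 7);
}
```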
