
How to create a C struct with specific size to send over socket to DalmatinerDB?

I'm trying to write a C client for DalmatinerDB, but I'm having trouble understanding how to combine the variables, write them to a buffer, and send that buffer to the database. The fact that DalmatinerDB is written in Erlang makes it more difficult. However, by looking at a Python client for DalmatinerDB, I have (probably) found the necessary variable sizes and their order.

The Erlang client has a function called "encode"; the relevant clause is shown below:

encode({stream, Bucket, Delay}) when
      is_binary(Bucket), byte_size(Bucket) > 0,
      is_integer(Delay), Delay > 0, Delay < 256->
    <<?STREAM,
  Delay:?DELAY_SIZE/?SIZE_TYPE,
  (byte_size(Bucket)):?BUCKET_SS/?SIZE_TYPE, Bucket/binary>>;

The official DalmatinerDB protocol definitions include the following:

-define(STREAM, 4).
-define(DELAY_SIZE, 8). % bits
-define(BUCKET_SS, 8). % bits

Let's say I would like to create this kind of structure in C. Would it look something like the following?

struct package {
    unsigned char mode[1]; // = "4"
    unsigned char delay[1]; // = for example "5"
    unsigned char bucketNameSize[1]; // = "5"
    unsigned char bucketName[1]; // for example "Test1"
};

Update:

I realized that the DalmatinerDB frontend (web interface) only reacts and updates when values have been sent to the bucket. In other words, just sending the first struct won't tell me whether it is right or wrong. Therefore I will try to create a second struct that carries the actual values.

The Erlang code snippet that encodes values looks like this:

encode({stream, Metric, Time, Points}) when
      is_binary(Metric), byte_size(Metric) > 0,
      is_binary(Points), byte_size(Points) rem ?DATA_SIZE == 0,
      is_integer(Time), Time >= 0->
    <<?SENTRY,
      Time:?TIME_SIZE/?SIZE_TYPE,
      (byte_size(Metric)):?METRIC_SS/?SIZE_TYPE, Metric/binary,
      (byte_size(Points)):?DATA_SS/?SIZE_TYPE, Points/binary>>;

The different sizes:

-define(SENTRY, 5).
-define(TIME_SIZE, 64).
-define(METRIC_SS, 16).
-define(DATA_SS, 32).

Substituting these values gives me:

<<5,
      Time:64/?SIZE_TYPE,
      (byte_size(Metric)):16/?SIZE_TYPE, Metric/binary,
      (byte_size(Points)):32/?SIZE_TYPE, Points/binary>>;

My guess is that my struct containing a value should look like this:

struct Package {
    unsigned char sentry;
    uint64_t time;
    unsigned char metricSize;
    uint16_t metric;
    unsigned char pointSize;
    uint32_t point;
};

Any comments on this structure?

The binary created by the encode function has this form:

<<?STREAM, Delay:?DELAY_SIZE/?SIZE_TYPE,
  (byte_size(Bucket)):?BUCKET_SS/?SIZE_TYPE, Bucket/binary>>

First let's replace all the preprocessor macros with their actual values:

<<4, Delay:8/unsigned-integer,
  (byte_size(Bucket)):8/unsigned-integer, Bucket/binary>>

Now we can more easily see that this binary contains:

  • a byte of value 4
  • the value of Delay as a byte
  • the size of the Bucket binary as a byte
  • the value of the Bucket binary

Because of the Bucket binary at the end, the overall binary is variable-sized.

A C99 struct that resembles this value can be defined as follows:

struct EncodedStream {
    unsigned char mode;
    unsigned char delay;
    unsigned char bucket_size;
    unsigned char bucket[];
};

This approach uses a C99 flexible array member for the bucket field, since its actual size depends on the value stored in the bucket_size field. You would presumably use this structure by allocating a block of memory large enough to hold the three fixed-size fields plus bucket_size bytes for the bucket field. You could also replace all uses of unsigned char with uint8_t if you #include <stdint.h>. In traditional C, bucket would be defined as a 0- or 1-sized array.
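
As a rough sketch of that allocation, filling in the struct for a given bucket name and delay might look like the code below. The helper name encode_stream and its signature are my own invention, not part of any DalmatinerDB client, and the 255-byte limit on the bucket name is inferred from the 8-bit size field.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Mirrors the Erlang binary <<4, Delay:8, Size:8, Bucket/binary>>. */
struct EncodedStream {
    unsigned char mode;
    unsigned char delay;
    unsigned char bucket_size;
    unsigned char bucket[];          /* C99 flexible array member */
};

/* Hypothetical helper: builds the message on the heap.
 * Returns NULL on invalid input or allocation failure; the caller frees
 * the result. *out_len receives the number of bytes to send. */
static struct EncodedStream *encode_stream(const char *bucket, int delay,
                                           size_t *out_len)
{
    size_t n = strlen(bucket);

    /* The non-empty and delay checks follow the Erlang guards; the upper
     * bound on n is an assumption based on the 8-bit size field. */
    if (n == 0 || n > 255 || delay <= 0 || delay >= 256)
        return NULL;

    /* Fixed-size fields plus n bytes for the flexible array member. */
    struct EncodedStream *msg = malloc(sizeof *msg + n);
    if (msg == NULL)
        return NULL;

    msg->mode = 4;                   /* ?STREAM */
    msg->delay = (unsigned char)delay;
    msg->bucket_size = (unsigned char)n;
    memcpy(msg->bucket, bucket, n);  /* no terminating NUL on the wire */

    *out_len = sizeof *msg + n;
    return msg;
}
```

For the bucket "Test1" and a delay of 5, this yields the 8 bytes 4, 5, 5, 'T', 'e', 's', 't', '1', which you could then hand to send() or write() on the connected socket.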

Update: the OP extended the question with another struct, so I've extended my answer below to cover it too.

The obvious-but-wrong way to write a struct corresponding to the metric/time/points binary is:

struct Wrong {
    unsigned char sentry;
    uint64_t time;
    uint16_t metric_size;
    unsigned char metric[];
    uint32_t points_size;
    unsigned char points[];
};

There are two problems with the Wrong struct:

  • Padding and alignment: Normally, fields are aligned on natural boundaries corresponding to their sizes. Here, on a typical 64-bit platform, the C compiler will align the time field on an 8-byte boundary, which means there will be 7 bytes of padding following the sentry field. But the Erlang binary contains no such padding.

  • Illegal flexible array field in the middle: The metric field size can vary, but we can't use the flexible array approach for it as we did in the earlier example because such arrays can only be used for the final field of a struct. The fact that the size of metric can vary means that it's impossible to write a single C struct that matches the Erlang binary.

Solving the padding and alignment issue requires a packed struct, which you can get through compiler support such as the gcc and clang __packed__ attribute (other compilers might have other ways of achieving this). The variable-sized metric field in the middle of the struct can be handled by using two structs instead:

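#include <stdint.h>

/* Fixed-size header followed by the variable-length metric name. */
typedef struct __attribute__((__packed__)) {
    unsigned char sentry;
    uint64_t time;
    uint16_t size;
    unsigned char metric[];
} Metric;

/* Size prefix followed by the variable-length points data. */
typedef struct __attribute__((__packed__)) {
    uint32_t size;
    unsigned char points[];
} Points;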

Packing both structs means their layouts will match the layouts of the corresponding data in the Erlang binary.
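
If you want the compiler to confirm the layout, something along these lines should work with gcc or clang in C11 mode. It is only a sketch: the UnpackedMetric struct and the unpacked_padding_before_time function are hypothetical names added for comparison, and the expected offsets are simply the field widths from the Erlang binary added up (1 + 8 + 2 bytes).

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Unpacked variant, for comparison only: the compiler is free to pad. */
struct UnpackedMetric {
    unsigned char sentry;
    uint64_t time;
    uint16_t size;
    unsigned char metric[];
};

typedef struct __attribute__((__packed__)) {
    unsigned char sentry;
    uint64_t time;
    uint16_t size;
    unsigned char metric[];
} Metric;

typedef struct __attribute__((__packed__)) {
    uint32_t size;
    unsigned char points[];
} Points;

/* The packed offsets must equal the offsets in the Erlang binary. */
static_assert(offsetof(Metric, time) == 1, "time follows the 1-byte sentry");
static_assert(offsetof(Metric, size) == 9, "size follows the 8-byte time");
static_assert(offsetof(Metric, metric) == 11, "metric follows the 2-byte size");
static_assert(sizeof(Metric) == 11, "no trailing padding in Metric");
static_assert(offsetof(Points, points) == 4, "points follows the 4-byte size");
static_assert(sizeof(Points) == 4, "no trailing padding in Points");

/* Bytes of padding inserted before time when the struct is NOT packed.
 * The exact value depends on the platform ABI. */
static size_t unpacked_padding_before_time(void)
{
    return offsetof(struct UnpackedMetric, time) - 1;
}
```

If any of the static_assert lines fails to compile, the packing attribute is not doing what the wire format needs on that compiler.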

There's still a remaining problem, though: endianness. By default, fields in an Erlang binary are big-endian. If you happen to be running your C code on a big-endian machine, then things will just work. Most likely you are not (x86 and most ARM configurations are little-endian), in which case the multi-byte values your C code reads and writes won't match Erlang's.

Fortunately, endianness is easily handled: you can use byte swapping to write C code that can portably read and write big-endian data regardless of the endianness of the host.
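
One portable way to do this is to build the big-endian byte sequence with shifts and copy it back into an integer of the same width. The helper names to_be16, to_be32 and to_be64 below are made up for this sketch. POSIX htons and htonl cover the 16- and 32-bit cases, but there is no equally portable 64-bit counterpart, so the sketch avoids relying on them.

```c
#include <stdint.h>
#include <string.h>

/* Each helper returns v rearranged so that its in-memory bytes are in
 * big-endian order, whatever the byte order of the host. */

static uint16_t to_be16(uint16_t v)
{
    unsigned char b[2] = { (unsigned char)(v >> 8), (unsigned char)v };
    memcpy(&v, b, sizeof v);
    return v;
}

static uint32_t to_be32(uint32_t v)
{
    unsigned char b[4] = {
        (unsigned char)(v >> 24), (unsigned char)(v >> 16),
        (unsigned char)(v >> 8),  (unsigned char)v
    };
    memcpy(&v, b, sizeof v);
    return v;
}

static uint64_t to_be64(uint64_t v)
{
    unsigned char b[8];
    for (int i = 0; i < 8; i++)
        b[i] = (unsigned char)(v >> (56 - 8 * i));  /* most significant first */
    memcpy(&v, b, sizeof v);
    return v;
}
```

On a big-endian host these functions return their argument unchanged, and on a little-endian host they reverse the bytes. Either way, applying one twice gives back the original value, so the same helpers can be used to convert values read from the socket back to host order.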

To use the two structs together, you'd first have to allocate enough memory to hold both structs and both the metric and the points variable-length fields. Cast the pointer to the allocated memory (let's call it p) to a Metric*, then use the Metric pointer to store appropriate values in the struct fields. Just make sure you convert the time and size values to big-endian as you store them. You can then calculate a pointer to where the Points struct is in the allocated memory as shown below, assuming p is a pointer to char or unsigned char:

/* metric_len: length of Metric.metric in bytes, in host byte order */
Points* points = (Points*)(p + sizeof(Metric) + metric_len);

Note that you can't just use the size field of your Metric instance for the final addend here since you stored its value as big-endian. Then, once you fill in the fields of the Points struct, again being sure to store the size value as big-endian, you can send p over to Erlang, where it should match what the Erlang system expects.
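
Putting those steps together, a minimal sketch could look like this. The function encode_entry and the to_be helpers are hypothetical names of my own, the packing attribute is gcc/clang syntax, and the points data is assumed to be already encoded in whatever format DalmatinerDB expects, with a length that is a multiple of ?DATA_SIZE (the sketch does not check that).

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct __attribute__((__packed__)) {
    unsigned char sentry;
    uint64_t time;
    uint16_t size;
    unsigned char metric[];
} Metric;

typedef struct __attribute__((__packed__)) {
    uint32_t size;
    unsigned char points[];
} Points;

/* Host-to-big-endian helpers (hypothetical names). */
static uint16_t to_be16(uint16_t v)
{
    unsigned char b[2] = { (unsigned char)(v >> 8), (unsigned char)v };
    memcpy(&v, b, sizeof v);
    return v;
}

static uint32_t to_be32(uint32_t v)
{
    unsigned char b[4];
    for (int i = 0; i < 4; i++)
        b[i] = (unsigned char)(v >> (24 - 8 * i));
    memcpy(&v, b, sizeof v);
    return v;
}

static uint64_t to_be64(uint64_t v)
{
    unsigned char b[8];
    for (int i = 0; i < 8; i++)
        b[i] = (unsigned char)(v >> (56 - 8 * i));
    memcpy(&v, b, sizeof v);
    return v;
}

/* Hypothetical helper: returns a malloc'd buffer holding the whole
 * message, or NULL on error. *out_len receives the byte count to send. */
static unsigned char *encode_entry(uint64_t time,
                                   const void *metric, uint16_t metric_len,
                                   const void *points, uint32_t points_len,
                                   size_t *out_len)
{
    if (metric_len == 0)
        return NULL;                 /* Erlang guard: byte_size(Metric) > 0 */

    size_t total = sizeof(Metric) + metric_len + sizeof(Points) + points_len;
    unsigned char *p = malloc(total);
    if (p == NULL)
        return NULL;

    Metric *m = (Metric *)p;
    m->sentry = 5;                   /* ?SENTRY */
    m->time = to_be64(time);
    m->size = to_be16(metric_len);
    memcpy(m->metric, metric, metric_len);

    /* Use the host-order metric_len here, not the big-endian m->size. */
    Points *pts = (Points *)(p + sizeof(Metric) + metric_len);
    pts->size = to_be32(points_len);
    memcpy(pts->points, points, points_len);

    *out_len = total;
    return p;
}
```

The caller would then send the out_len bytes starting at p over the socket and free the buffer afterwards.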
