(int, const char *) as a compound key with the uthash library

Question

I would like to use the uthash library for a hash table with a pair of int and const char * as a compound key:

typedef struct entry_s {
    // This field is needed by the uthash library
    UT_hash_handle hh;

    // Values
    /* ... */

    // Compound key
    int num;
    const char *str;
} entry;

Specifically, I want the string pointed to by the const char * to be part of the key. To clarify: different pointer values may refer to the same string (in the sense of strcmp()).

The user guide shows how to implement a key similar to what I want, with an int and a char[] as the compound key:

typedef struct another_entry_s {
    // This field is needed by the uthash library
    UT_hash_handle hh;

    // Values
    /* ... */

    int str_len;

    // Compound key
    int num;
    char str[];
} another_entry;

However, the second approach (i.e., (int, char[])) assumes the string is copied into the char[], and I would like to avoid that copy.

Also, I'm not looking to concatenate the int and the string pointed to by the const char * in order to use the HASH_ADD_KEYPTR() and HASH_FIND_STR() convenience macros.

I cannot figure out how to use HASH_ADD(), HASH_FIND(), and the other general macros with the first approach (i.e., (int, const char *)). It looks like copying cannot be avoided, by the design of the uthash library. Do I understand that correctly? Or is there a non-copying approach that I overlooked?

Answer 1

This is impossible with the design of this library (and it's impossible with any generic implementation without copying).

For any hashtable implementation, you need to apply some hashing function to some data. So you could of course write your own specific implementation where the hashing function uses the bytes of an integer field and the bytes of a string that some other field points to. But if your hashtable implementation is generic, your only option for a hashing function would be something similar to this:

unsigned int hash(void *data, size_t size);

The prototype doesn't have to look exactly like this, but in any case, the input is a pointer to some data (of any type) and the size of that data. So, obviously, you can't make such a function read from two different locations at once.
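To make that concrete, here is a minimal sketch of such a generic function. It uses FNV-1a purely as an illustration; it is not the function uthash actually uses, and the name generic_hash is made up for this example. Note that it can only walk one contiguous range of bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical generic hash (FNV-1a, 32 bit). It sees exactly one
 * contiguous byte range and knows nothing about the type behind it. */
static uint32_t generic_hash(const void *data, size_t size)
{
    assert(data != NULL || size == 0);

    const unsigned char *bytes = data;
    uint32_t h = 2166136261u;          /* FNV offset basis */

    for (size_t i = 0; i < size; i++) {
        h ^= bytes[i];                 /* mix in the next byte */
        h *= 16777619u;                /* FNV prime */
    }
    return h;
}
```

If you passed a struct containing an int and a const char * to a function like this, it would hash the bytes of the pointer itself, not the characters it points to.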

According to the uthash documentation, uthash solves the problem of compound keys by requiring them to consist of adjacent struct members. The data is then read starting at the first of these members, with a size that covers all the members and any padding between them. The documentation is aware of that problem and requires the struct to be initialized to all zero, e.g. with memset(), so that the padding bytes have defined values. If you want to use this, you must make your string a member of the structure (instead of a pointer to it).

While this probably works fine in most implementations, I personally wouldn't rely on that feature at all, because the C standard doesn't guarantee a defined value for the padding after some member has been set, see

C11 (draft N1570), §6.2.6.1 p6:

When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values. [...]
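To show where the padding enters the picture, here is a small sketch with a hypothetical two-member key (padded_key and its helpers are made up for this example, and UT_hash_handle is left out so it compiles on its own). The key span is computed as the offset of the last key member, plus its size, minus the offset of the first key member.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical compound key made of two adjacent members. */
typedef struct {
    char tag;   /* first key member */
    int  num;   /* last key member */
} padded_key;

/* Number of bytes a hash over the adjacent members would read. */
static size_t padded_key_span(void)
{
    return offsetof(padded_key, num) + sizeof(int)
         - offsetof(padded_key, tag);
}

/* Number of bytes that actually carry key data. */
static size_t padded_key_payload(void)
{
    return sizeof(char) + sizeof(int);
}

/* Zero the whole struct first, then set the members. */
static void padded_key_init(padded_key *k, char tag, int num)
{
    assert(k != NULL);
    memset(k, 0, sizeof *k);
    k->tag = tag;
    k->num = num;
}
```

On common ABIs the span is larger than the payload, and the difference is padding that gets hashed too. The memset() gives it a defined value at first, but the paragraph quoted above is the reason I wouldn't count on it staying that way after the member stores.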

The really safe and portable way to use your compound key with this library is therefore: take a concatenated copy of the data. You could do something like this, given your struct above with one added field char *hashKey:

#define ENTRY_KEYLEN(str) (sizeof(int) + strlen(str))
#define ENTRY_GETKEY(key, e) (getEntryKey((key), (e)->num, (e)->str))

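// Writes num followed by the characters of str (without the terminating
// '\0') into key, which must have room for ENTRY_KEYLEN(str) bytes.
static void getEntryKey(char *key, int num, const char *str)
{
    memcpy(key, &num, sizeof num);
    memcpy(key + sizeof num, str, strlen(str));
}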

Then you could use the uthash macros like this:

entry *entries = 0;

entry *myent;
// allocate space, fill data in myent

// store in hashtable:
char *key = malloc(ENTRY_KEYLEN(myent->str));
// check key for NULL
ENTRY_GETKEY(key, myent);
myent->hashKey = key;
HASH_ADD_KEYPTR(hh, entries, key, ENTRY_KEYLEN(myent->str), myent);

// [...]

// find in hashtable
const char *str = "foo";
int id = 42;
key = malloc(ENTRY_KEYLEN(str));
// check key for NULL
getEntryKey(key, id, str);
entry *found;
HASH_FIND(hh, entries, key, ENTRY_KEYLEN(str), found);
free(key);
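As a sanity check of the idea, independent of uthash, the following sketch builds such a concatenated key for two entries and compares the resulting bytes. The helper names make_entry_key and entry_keys_equal are made up for this example. It shows that the key depends on the string contents and not on the pointer value, which is what you asked for.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'd key consisting of the bytes of
 * num followed by the characters of str (no terminating '\0').
 * The key length is stored in *len. Returns NULL if malloc fails. */
static char *make_entry_key(int num, const char *str, size_t *len)
{
    assert(str != NULL && len != NULL);

    size_t slen = strlen(str);
    char *key = malloc(sizeof num + slen);
    if (key == NULL)
        return NULL;

    memcpy(key, &num, sizeof num);
    memcpy(key + sizeof num, str, slen);
    *len = sizeof num + slen;
    return key;
}

/* Returns 1 if both (num, str) pairs produce identical key bytes,
 * 0 if they differ, and -1 if an allocation failed. */
static int entry_keys_equal(int num_a, const char *str_a,
                            int num_b, const char *str_b)
{
    size_t len_a = 0, len_b = 0;
    char *key_a = make_entry_key(num_a, str_a, &len_a);
    char *key_b = make_entry_key(num_b, str_b, &len_b);
    int result = -1;

    if (key_a != NULL && key_b != NULL)
        result = (len_a == len_b && memcmp(key_a, key_b, len_a) == 0);

    free(key_a);   /* free(NULL) is a no-op */
    free(key_b);
    return result;
}
```

Two different buffers holding "foo" with the same number give the same key, while a different number or a different string gives a different one.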

You might be better off using a different generic hashtable implementation that makes your use case a bit easier, e.g. one that uses a callback function to retrieve the hash key data.
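For illustration only, here is a rough sketch of what such a callback-based design could look like. This is not uthash and not any particular existing library; every name in it (cb_table, cb_node, cb_add, cb_find, entry_hash, entry_equal) is made up, and it is a fixed-size chained table without resizing or removal. The point is that the table never looks at key bytes itself, so the string can stay behind the pointer without being copied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CB_BUCKETS 64u

/* Embed this as the FIRST member of any struct stored in the table. */
typedef struct cb_node { struct cb_node *next; } cb_node;

typedef uint32_t (*cb_hash_fn)(const cb_node *);
typedef int (*cb_equal_fn)(const cb_node *, const cb_node *);

typedef struct {
    cb_node *buckets[CB_BUCKETS];
    cb_hash_fn hash;     /* user-supplied: how to hash an item */
    cb_equal_fn equal;   /* user-supplied: when two items have equal keys */
} cb_table;

static void cb_add(cb_table *t, cb_node *n)
{
    assert(t != NULL && n != NULL && t->hash != NULL);
    cb_node **slot = &t->buckets[t->hash(n) % CB_BUCKETS];
    n->next = *slot;     /* push onto the front of the bucket chain */
    *slot = n;
}

static cb_node *cb_find(const cb_table *t, const cb_node *probe)
{
    assert(t != NULL && probe != NULL && t->equal != NULL);
    for (cb_node *n = t->buckets[t->hash(probe) % CB_BUCKETS]; n; n = n->next)
        if (t->equal(n, probe))
            return n;
    return NULL;
}

/* The entry from the question, with the string kept behind a pointer. */
typedef struct {
    cb_node node;        /* must be first so the casts below are valid */
    int num;
    const char *str;
    int value;
} cb_entry;

/* FNV-1a over the bytes of num, then over the characters of str. */
static uint32_t entry_hash(const cb_node *n)
{
    const cb_entry *e = (const cb_entry *)n;
    const unsigned char *p = (const unsigned char *)&e->num;
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < sizeof e->num; i++) { h ^= p[i]; h *= 16777619u; }
    for (p = (const unsigned char *)e->str; *p != '\0'; p++) { h ^= *p; h *= 16777619u; }
    return h;
}

static int entry_equal(const cb_node *a, const cb_node *b)
{
    const cb_entry *x = (const cb_entry *)a, *y = (const cb_entry *)b;
    return x->num == y->num && strcmp(x->str, y->str) == 0;
}
```

A lookup then fills a temporary cb_entry with the number and string pointer you are searching for and passes its node to cb_find(). No key buffer is allocated, and nothing in the key depends on padding.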
