简体   繁体   中英

how to implement the string key in B+ Tree?

许多b +树示例都是使用整数键实现的,但是我也看到了一些其他一些同时使用整数键和字符串键的示例,我了解了b +树的基础,但我不了解字符串键的工作原理?

I also use a multi level B-Tree. Having a string lets say test can be seen as an array of [t,e,s,t]. Now think about a tree of trees. Each node can only hold one character for a certain position. You also need to think about a certain key /value array implementation like a growing linked list of arrays, trees or whatever. It also can make the node size dynamic (limited amount of letters).

If all keys fit the leaf, you store it in the leaf. If the leaf gets to big, you can add new nodes.

And now since each node knows its letter and position, you can strip those characters from the keys in the leaf and reconstruct them as you search or if you know the leaf + the position in the leaf.

If you now, after you have created the tree, write the tree in a certain format, you end up having string compression where you store each letter combination (prefix) only once even if it is shared by 1000ends of strings.

Simple compression often results in a 1:10 compression for normal text (in any language!) and in memory in 1:4. And also you can search for any given word (which are the strings in your dictionary you used the B+Tree for.

This is one extrem where you can use multilevel.

Databases usually use a certain prefix tree (the first x characters and store the rest in the leafs and use binary search within the leaf). Also there are implementations that use variable prefix lengths based on the actual density. So in the end it is very implementation specific and a lot of options exist.

If the tree should aid in finding the exact string. Often adding the length and using hash of lower bits of each characters do the trick. For example you could generate a hash out of length(8bit) + 4bit * 6 characters = 32Bit -> its your hash code. Or you can use the first, last and middle characters along with it. Since the length is one of the most selective you wont find many collisions while search your string.

This solution is very good for finding a particular string but destroyes the natural order of the strings so giving you no chance of answering range queries and alike. But for times where you search for a particular username / email or address those tree would be supperior (but question is why not use a hashmap).

The string key can be a pointer to a string (very likely).

Or the key could be sized to fit most strings. 64 bits holds 8 byte strings and even 16 byte keys aren't too ridiculous.

Choosing a key really depends on how you plan to use it.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM