
Why is binary search over an array slightly faster than over a binary search tree?

I used both functions to search for queries in a very large data set. Their speeds are about the same at first, but as the size grows very large, the binary search over the array becomes slightly faster. Is that because of caching effects? The array is laid out sequentially in memory. Is the tree as well?

// Search a sorted array; returns 1 if query is present, 0 otherwise.
int binary_array_search(int array[], int length, int query){
  int left = 0, right = length - 1;
  int mid;
  while(left <= right){
    mid = left + (right - left) / 2;  // avoids the overflow risk of (left+right)/2
    if(query == array[mid]){
      return 1;
    }
    else if(query < array[mid]){
      right = mid - 1;
    }
    else{
      left = mid + 1;
    }
  }
  return 0;
}

// Search a binary search tree
int binary_tree_search(bst_t *tree, int ignore, int query){
  node_t *node = tree->root;
  while(node != NULL){
    int data = node->data;
    if(query < data){
      node = node->left;
    }
    else if(query > data){
      node = node->right;
    }
    else{
      return 1;
    }
  }
  return 0;
}

Here are some results:

LENGTH    SEARCHES    binary search array    binary search tree

  1024       10240       7.336000e-03           8.230000e-03
  2048       20480       1.478000e-02           1.727900e-02
  4096       40960       3.001100e-02           3.596800e-02
  8192       81920       6.132700e-02           7.663800e-02
 16384      163840       1.251240e-01           1.637960e-01

There are several reasons why the array may be, and should be, faster:

A node in the tree is at least three times bigger than an item in the array, due to the left and right pointers.

For example, on a 32-bit system you'll have 12 bytes instead of 4, and chances are those 12 bytes are padded to, or aligned on, 16 bytes. On a 64-bit system we get 8 versus 24 to 32 bytes.

This means that with an array, three to four times more items can be loaded into the L1 cache.

Nodes in the tree are allocated on the heap and can end up anywhere in memory, depending on the order in which they were allocated (and the heap can get fragmented). Creating those nodes (with new or malloc) also takes more time than a possible one-time allocation for the array, but that is probably not part of the speed test here.

To access a single value in the array only one read has to be done; for the tree we need two: the left or right pointer and the value.

When the lower levels of the search are reached, the items to compare will be close together in the array (and possibly already in the L1 cache), while for the tree they are probably spread across memory.

Most of the time arrays will be faster due to locality of reference.

Is that because of caching effects?

Sure, that is the main reason. On modern CPUs, cache is transparently used to read/write data in memory.

Cache is much faster than the main memory (DRAM). To give you some perspective, accessing data in the Level 1 cache takes ~4 CPU cycles, while accessing DRAM on the same CPU takes ~200 cycles, i.e. the cache is about 50 times faster.

Caches operate on small blocks called cache lines, which are usually 64 bytes long.

More info: https://en.wikipedia.org/wiki/CPU_cache

The array is laid out sequentially in memory. Is the tree as well?

An array is a single block of data. Each element of an array is adjacent to its neighbors, i.e.:

+-------------------------------+
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
+-------------------------------+
  block of 32 bytes (8 times 4)

Each array access fetches a cache line, i.e. 64 bytes or 16 int values. So for the array there is a fairly high probability (especially toward the end of the binary search) that the next access will be within the same cache line, and no memory access will be needed.

On the other hand, tree nodes are allocated one by one:

                      +------------------------------------------------+
+------------------+  | +------------------+    +------------------+   |
| 0 | left | right | -+ | 2 | left | right | <- | 1 | left | right | <-+
+------------------+    +------------------+    +------------------+
 block 0 of 24 bytes     block 2 of 24 bytes     block 1 of 24 bytes

As we can see, to store just 3 values we used more than twice the memory needed to store 8 values in the array above. So the tree structure is sparser and, statistically, has less data in each 64-byte cache line.

Also each memory allocation returns a block in memory which might not be adjacent to the previously allocated tree nodes.

Also, the allocator aligns each memory block to at least 8 bytes (on 64-bit CPUs), so some bytes are wasted there. Not to mention that we need to store those left and right pointers in each node...

So each tree access, even at the very end of the search, will need to fetch a cache line, i.e. it is slower than the array access.

So why, then, is the array only a tad bit faster in the tests? It is due to the nature of binary search. At the very beginning of the search we access data quite randomly, and each access is quite far from the previous one. So the array layout gets its boost only at the end of the search.

Just for fun, try to compare linear search (ie basic search loop) in array vs binary search in tree. I bet you will be surprised with the results ;)
