A timer for seeing how long an algorithm is taking is saying my binary search takes longer than a linear search

Question

Here is the class on Gist: https://gist.github.com/2605302

I have tested it multiple times with different files, and even when fewer comparisons are done by the binary search, the time taken is ALWAYS longer. What's going wrong?

public static int linerSearch ( String array [], String word, long resultsArray [])
{
    int comparisons = 0;
    int pos = -1;
    //I have started the timer where the search actually starts
    long start = System.nanoTime ();
    for (int i = 0; i < array.length; i++){
        comparisons = comparisons + 1;
        if (array [i].equals (word)){
            pos = i;
            break;
        }
    }
    long stop = System.nanoTime ();
    long total = stop - start;
    resultsArray [0] = total;
    resultsArray [1] = (long) array.length;
    resultsArray [2] = (long) comparisons;
    return pos;
}

Here is the binarySearch method:

public static int binarySearch (String [] array, String word, long resultsArray []) {
    int start = 0;
    int end = array.length - 1;
    int midPt;
    int pos = -1;
    int comparisons2 = 0;
    long start2 = System.nanoTime ();
    Arrays.sort (array);
    while (start <= end) {
        midPt = (start + end) / 2;
        comparisons2 = comparisons2 + 1;
        if (array [midPt].equalsIgnoreCase (word)) {
            pos = midPt;
            break;
        }
        else if (array [midPt].compareToIgnoreCase (word) < 0) {
            start = midPt + 1;
            comparisons2 = comparisons2 + 1;
            //the extra comparisons2 increments in this else-if and the next one are a workaround so the else-if chain doesn't have to be broken up: reaching the last else-if means two comparisons have been made after the first one
        } else if (array [midPt].compareToIgnoreCase (word) > 0)  {
            comparisons2 = comparisons2 + 2;
            end = midPt - 1;
        }
    }
    long stop2 = System.nanoTime ();
    long total2 = stop2 - start2;
    resultsArray [0] = total2;
    resultsArray [1] = (long) array.length;
    resultsArray [2] = (long) comparisons2;
    return pos;
}

Edit: I should also add that I tried it on an array that was already sorted, without that Arrays.sort line, and it still took longer when it shouldn't have.

Answer 1

The problem with your benchmark is that Arrays.sort(array) takes most of the time, and you don't count its comparisons. A linear search needs at most N comparisons, while sorting the array needs more than N comparisons (on the order of N log N).

To see that binary search is faster, you should run a test like this:

1) Search for different elements 1000 times with linear search

2) Sort array once and search for different elements using binary search 1000 times
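A minimal sketch of that test in Java, assuming a synthetic word list (the hypothetical words "w0" to "w99999") instead of the question's file, and assuming the binary search uses compareTo so that it matches the natural order produced by Arrays.sort. It times each whole batch, so the one-off sort is amortized over 1000 lookups. There is no JIT warm-up, so treat the numbers as rough.

```java
import java.util.Arrays;
import java.util.Random;

public class RepeatedSearchBenchmark {

    // Linear scan with equals, as in the question.
    static int linearSearch(String[] array, String word) {
        for (int i = 0; i < array.length; i++) {
            if (array[i].equals(word)) {
                return i;
            }
        }
        return -1;
    }

    // Iterative binary search; the array must be sorted in natural
    // (case-sensitive) order, e.g. by Arrays.sort, to match compareTo.
    static int binarySearch(String[] array, String word) {
        int lo = 0;
        int hi = array.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1; // unsigned shift avoids int overflow
            int cmp = array[mid].compareTo(word);
            if (cmp == 0) {
                return mid;
            } else if (cmp < 0) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Hypothetical data: 100000 synthetic words "w0" .. "w99999".
        // This order is not lexicographic ("w10" < "w2"), so the sort does real work.
        int n = 100_000;
        String[] words = new String[n];
        for (int i = 0; i < n; i++) {
            words[i] = "w" + i;
        }

        // 1000 random targets that are all present in the array.
        Random rnd = new Random(42);
        String[] targets = new String[1000];
        for (int i = 0; i < targets.length; i++) {
            targets[i] = words[rnd.nextInt(n)];
        }

        long sink = 0; // consume results to make it harder for the JIT to discard the searches

        // 1) 1000 linear searches on the unsorted array.
        long t0 = System.nanoTime();
        for (String t : targets) {
            sink += linearSearch(words, t);
        }
        long linearNs = System.nanoTime() - t0;

        // 2) Sort a copy once, then 1000 binary searches; the sort is inside the timing.
        String[] sorted = words.clone();
        long t1 = System.nanoTime();
        Arrays.sort(sorted);
        for (String t : targets) {
            sink += binarySearch(sorted, t);
        }
        long binaryNs = System.nanoTime() - t1;

        System.out.println("1000 linear searches:        " + linearNs + " ns");
        System.out.println("sort + 1000 binary searches: " + binaryNs + " ns");
        System.out.println("(checksum " + sink + ")");
    }
}
```

For more trustworthy numbers, a dedicated harness such as JMH takes care of warm-up and dead-code elimination; this sketch only guards against the latter loosely.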

Answer 2

Your benchmark is flawed, for many reasons:

we don't know the contents of the file. If the searched word is at the beginning, the linear search will be faster than the binary search
the linear search compares with equals, whereas the binary search compares with equalsIgnoreCase
you don't execute the code a sufficient number of times to let the JIT compile the code

I haven't verified whether your binary search algorithm is correct, but why don't you use the one bundled with the JDK (in the java.util.Arrays class)?
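A sketch of using the JDK's version instead. The helper names sortIgnoreCase and findIgnoreCase are made up for illustration; Arrays.sort(T[], Comparator), Arrays.binarySearch(T[], T, Comparator) and String.CASE_INSENSITIVE_ORDER are standard APIs. The important part is that the array is sorted and searched with the same comparator.

```java
import java.util.Arrays;

public class JdkBinarySearchDemo {

    // Sort in place with the ordering that compareToIgnoreCase uses.
    static void sortIgnoreCase(String[] words) {
        Arrays.sort(words, String.CASE_INSENSITIVE_ORDER);
    }

    // Delegates to the JDK. Returns the index if found, otherwise
    // -(insertion point) - 1, which is always negative.
    static int findIgnoreCase(String[] sortedIgnoringCase, String word) {
        return Arrays.binarySearch(sortedIgnoringCase, word, String.CASE_INSENSITIVE_ORDER);
    }

    public static void main(String[] args) {
        String[] words = {"cherry", "Banana", "apple", "date"};
        sortIgnoreCase(words);
        System.out.println(Arrays.toString(words));          // [apple, Banana, cherry, date]
        System.out.println(findIgnoreCase(words, "banana")); // 1
        System.out.println(findIgnoreCase(words, "fig"));    // -5 (insertion point 4)
    }
}
```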

Anyway, you don't have to measure anything. On average, a binary search is faster than a linear search. No need to prove that again.

Answer 3

Okay, I've got this worked out for you once and for all. First, here's the binary search method as I used it:

public static int binarySearch(String[] array, String word, long resultsArray[]) {
    int start = 0;
    int end = array.length - 1;
    int midPt;
    int pos = -1;
    int comparisons2 = 0;

    //Arrays.sort(array);

    long start2 = System.nanoTime();
    while (start <= end) {
        midPt = (start + end) / 2;
        int comparisonResult = array[midPt].compareToIgnoreCase(word);
        comparisons2++;
        if (comparisonResult == 0) {
            pos = midPt;
            break;
        } else if (comparisonResult < 0) {
            start = midPt + 1;
        } else { // comparisonResult > 0
            end = midPt - 1;
        }
    }
    long stop2 = System.nanoTime();
    long total2 = stop2 - start2;

    resultsArray[0] = total2;
    resultsArray[1] = (long) array.length;
    resultsArray[2] = (long) comparisons2;
    return pos;
}

You'll notice that I reduced the number of comparisons by saving the comparison result and using that.

Next, I downloaded a list of 235882 words. It is already sorted ignoring case. Then I built a test method that loads the contents of that file into an array and uses both search methods to find every word in the list. It then averages the times and numbers of comparisons for each method separately.

I found out that you must be careful about which comparison methods you use: if you sort a list with Arrays.sort(...) and use compareToIgnoreCase in the binary search, it fails! By failing I mean that it cannot find a word in the list even though the word actually exists there. That is because Arrays.sort(...) sorts Strings case-sensitively. If you use it, you must use the compareTo(...) method in the search.
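A small sketch that reproduces the failure, using three sample words chosen only for illustration. searchIgnoreCase is a hypothetical helper with the same loop structure as the method above. Arrays.sort puts uppercase letters before lowercase ones, so the case-insensitive search takes a wrong turn. Sorting with String.CASE_INSENSITIVE_ORDER (a standard comparator that orders strings as compareToIgnoreCase does) is another way to make the sort and the search consistent.

```java
import java.util.Arrays;

public class SortOrderMismatchDemo {

    // One compareToIgnoreCase call per iteration, as in the method above.
    static int searchIgnoreCase(String[] array, String word) {
        int start = 0;
        int end = array.length - 1;
        while (start <= end) {
            int mid = (start + end) >>> 1;
            int cmp = array[mid].compareToIgnoreCase(word);
            if (cmp == 0) {
                return mid;
            } else if (cmp < 0) {
                start = mid + 1;
            } else {
                end = mid - 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] caseSensitive = {"apple", "Banana", "cherry"};
        Arrays.sort(caseSensitive); // [Banana, apple, cherry]: 'B' (66) sorts before 'a' (97)
        System.out.println(searchIgnoreCase(caseSensitive, "Banana")); // -1, not found

        String[] caseInsensitive = {"apple", "Banana", "cherry"};
        Arrays.sort(caseInsensitive, String.CASE_INSENSITIVE_ORDER); // [apple, Banana, cherry]
        System.out.println(searchIgnoreCase(caseInsensitive, "Banana")); // 1
    }
}
```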

So, we have two cases:

a case-insensitively sorted list and the use of compareToIgnoreCase
a case-sensitively sorted list and the use of compareTo

In addition to these options for the binary search, there are also options for the linear search: whether to use equals or equalsIgnoreCase. I ran my test for all of these cases and compared them. Average results:

Linear search with equals: time: 725536 ns; comparisons: 117941; time / comparison: 6.15 ns
Linear search with equalsIgnoreCase: time: 1064334 ns; comparisons: 117940; time / comparison: 9.02 ns
Binary search with compareToIgnoreCase: time: 1619 ns; comparisons: 16; time / comparison: 101.19 ns
Binary search with compareTo: time: 763 ns; comparisons: 16; time / comparison: 47.69 ns

So, now we can clearly see your problem: the compareToIgnoreCase method takes some 16 times as much time as the equals method! On average, the binary search needs 16 comparisons to find a given word, and in the time that takes (about 763 ns with compareTo, 1619 ns with compareToIgnoreCase) you can perform roughly 124 to 263 linear comparisons with equals. So if you test with word lists shorter than that, the linear search is, indeed, faster than the binary search because of the different comparison methods they use.

I also counted the number of words that the linear search was able to find faster than the binary search: 164 when using the compareTo method and 614 when using the compareToIgnoreCase method. Out of the list of 235882 words, that's about 0.3 percent. So all in all I think it's still safe to say that the binary search is faster than the linear search. :)

One last point before you ask: I removed the sorting code from the binarySearch method, because that's actually an entirely different thing. Since you are comparing two searching algorithms, it isn't fair to add the cost of a sorting algorithm to the figures of only one of them. I posted the following as a comment on another answer already, but I'll copy it here for completeness:

Binary search has the added overhead cost of sorting. So if you only need to find one element in an unsorted array, linear search is faster, because a comparison-based sort takes on the order of n log n time, and the O(log n) binary search that follows is dominated by it. A linear search runs in O(n) time, which is better than O(n log n). But once the array is sorted, O(log n) is far better than O(n).
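As a rough illustration of where sorting starts to pay off, here is a comparison-count model for k searches. It assumes every comparison costs the same (the measurements above show compareToIgnoreCase is much slower than equals, so this is only an order-of-magnitude sketch), that a successful linear search inspects about n/2 elements on average, and that sorting costs about n log2 n comparisons:

```latex
C_{\text{linear}}(k) \approx k \cdot \frac{n}{2}
\qquad
C_{\text{sort+binary}}(k) \approx n \log_2 n + k \log_2 n

k^{*} \cdot \frac{n}{2} = n \log_2 n + k^{*} \log_2 n
\quad\Longrightarrow\quad
k^{*} = \frac{n \log_2 n}{\frac{n}{2} - \log_2 n} \approx 2 \log_2 n \quad (n \gg \log_2 n)

n = 235882:\; \log_2 n \approx 17.85,\quad k^{*} \approx 36
```

Under these assumptions, for the 235882-word list, sorting first only pays off once you do more than a few dozen searches.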

If you insist on having the sorting command in the binarySearch method, you should be aware that with my setup, sorting that long list of words from an initially random order takes more than 140000000 ns, or 0.14 seconds, on average. In that time you could perform some 23000000 comparisons using the equals method, so you really should not use binary search if a) your array is in random order and b) you only ever need to find one or a few elements in it.

And one more thing. In this example, where you are searching for words in a String array, the cost of accessing an item in the array is negligible because the array is held in the computer's fast main memory. But if you had, say, a huge set of ordered files and you tried to find something in them, the cost of accessing a single file would make every other calculation negligible instead. So binary search, with its far fewer accesses, would totally rock in that scenario (too).

Answer 4

Your code doesn't measure only the binary search, but also the sorting of the array just before the search. Sorting alone will always take longer than a simple linear search.
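To see the two costs separately, a minimal sketch could time them in two windows. It assumes a small hypothetical word array and uses the JDK's Arrays.binarySearch rather than the question's own method; timeSortThenSearch is a made-up helper name.

```java
import java.util.Arrays;

public class SeparateTimingDemo {

    // Times the sort and the search in separate windows.
    // Sorts the given array in place; returns {sortNs, searchNs, position}.
    static long[] timeSortThenSearch(String[] words, String word) {
        long sortStart = System.nanoTime();
        Arrays.sort(words);                          // one-off preprocessing cost
        long sortNs = System.nanoTime() - sortStart;

        long searchStart = System.nanoTime();
        int pos = Arrays.binarySearch(words, word);  // the part comparable to a linear search
        long searchNs = System.nanoTime() - searchStart;

        return new long[] {sortNs, searchNs, pos};
    }

    public static void main(String[] args) {
        // Hypothetical input; in the question the words come from a file.
        String[] words = {"pear", "apple", "fig", "kiwi", "banana"};
        long[] r = timeSortThenSearch(words, "kiwi");
        System.out.println("sort:   " + r[0] + " ns");
        System.out.println("search: " + r[1] + " ns");
        System.out.println("found at index " + r[2]); // 3, in [apple, banana, fig, kiwi, pear]
    }
}
```

A single nanoTime reading around one tiny search is dominated by timer resolution and JIT state, so in practice you would repeat the search many times and average.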

Answer 5

} else if (array [midPt].compareToIgnoreCase (word) > 0)  {

You don't need this test at all. At this point in the code there is no other possibility. It isn't equal, and it isn't less than: you've already tested those; so it must be greater than.

So you can reduce your comparisons by 33%.

5 answers

solution1: score 2, 2012-05-05 20:36:44
solution2: score 1, 2012-05-05 20:35:39
solution3: score 1, accepted, 2012-05-06 11:31:24
solution4: score 0, 2012-05-05 20:35:57
solution5: score 0, 2012-05-06 00:30:37
