
A timer for seeing how long an algorithm is taking says my binary search takes longer than a linear search

Here is the class as a gist: https://gist.github.com/2605302

I have tested it multiple times with different files, and even when fewer comparisons are done for binary search, the time taken is ALWAYS longer. What's going wrong?

public static int linerSearch(String[] array, String word, long[] resultsArray) {
    int comparisons = 0;
    int pos = -1;
    // I have started the timer where the search actually starts
    long start = System.nanoTime();
    for (int i = 0; i < array.length; i++) {
        comparisons = comparisons + 1;
        if (array[i].equals(word)) {
            pos = i;
            break;
        }
    }
    long stop = System.nanoTime();
    long total = stop - start;
    resultsArray[0] = total;
    resultsArray[1] = (long) array.length;
    resultsArray[2] = (long) comparisons;
    return pos;
}

Next, here is the binarySearch method:

public static int binarySearch(String[] array, String word, long[] resultsArray) {
    int start = 0;
    int end = array.length - 1;
    int midPt;
    int pos = -1;
    int comparisons2 = 0;
    long start2 = System.nanoTime();
    Arrays.sort(array);
    while (start <= end) {
        midPt = (start + end) / 2;
        comparisons2 = comparisons2 + 1;
        if (array[midPt].equalsIgnoreCase(word)) {
            pos = midPt;
            break;
        }
        else if (array[midPt].compareToIgnoreCase(word) < 0) {
            start = midPt + 1;
            comparisons2 = comparisons2 + 1;
            // the comparisons2 increments in this else-if and the next one are a workaround to avoid
            // breaking the else-if chain: if execution reaches the last else-if, two comparisons
            // beyond the first one will have been done
        } else if (array[midPt].compareToIgnoreCase(word) > 0) {
            comparisons2 = comparisons2 + 2;
            end = midPt - 1;
        }
    }
    long stop2 = System.nanoTime();
    long total2 = stop2 - start2;
    resultsArray[0] = total2;
    resultsArray[1] = (long) array.length;
    resultsArray[2] = (long) comparisons2;
    return pos;
}

Edit: I should also add that I tried it on an array that was already sorted, without that line of code, and it still took longer when it shouldn't have.

The problem with your benchmark is that Arrays.sort(array) takes most of the time and you don't count its comparisons. Linear search requires at most N comparisons. When you sort an array, you spend more than N comparisons.
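A minimal sketch of that point, assuming a synthetic, shuffled list of 10,000 made-up words rather than the asker's file: it counts how often Arrays.sort calls a comparator (a counting wrapper around compareTo) and sets that against the at-most-N comparisons of a linear search. SortCostDemo and its helper methods are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

class SortCostDemo {
    // Counts how many times Arrays.sort calls the comparator while sorting a copy of words.
    static long countSortComparisons(String[] words) {
        long[] calls = {0};
        Comparator<String> counting = (a, b) -> {
            calls[0]++;
            return a.compareTo(b);
        };
        String[] copy = words.clone();
        Arrays.sort(copy, counting);
        return calls[0];
    }

    // Counts the comparisons a linear search makes looking for word (at most words.length).
    static long countLinearComparisons(String[] words, String word) {
        long comparisons = 0;
        for (String w : words) {
            comparisons++;
            if (w.equals(word)) break;
        }
        return comparisons;
    }

    // Builds n distinct words in shuffled order; the fixed seed keeps runs reproducible.
    static String[] shuffledWords(int n, long seed) {
        List<String> list = new ArrayList<>();
        for (int i = 0; i < n; i++) list.add("word" + i);
        Collections.shuffle(list, new Random(seed));
        return list.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] words = shuffledWords(10_000, 42L);
        System.out.println("sort comparisons: " + countSortComparisons(words));
        System.out.println("linear comparisons (worst case): " + countLinearComparisons(words, "missing"));
    }
}
```

On shuffled data the sort count should come out far above N, since a comparison sort needs on the order of N log N comparisons on average.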

To see that binary search is faster, you should run a test like this:

1) Search for 1000 different elements with linear search.

2) Sort the array once, then search for 1000 different elements with binary search.
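A minimal sketch of that test, under stated assumptions: the words are synthetic ("word0", "word1", ...), both searches use case-sensitive comparisons (equals and compareTo) so they do comparable work, the array is sorted once outside the timed region, and a few untimed warm-up rounds give the JIT a chance to compile both methods first. SearchBenchmark and its methods are hypothetical names; for numbers you want to rely on, a dedicated harness such as JMH is more trustworthy than hand-rolled System.nanoTime() timing.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.function.ToIntBiFunction;

class SearchBenchmark {
    // Linear search using case-sensitive equals.
    static int linearSearch(String[] array, String word) {
        for (int i = 0; i < array.length; i++) {
            if (array[i].equals(word)) return i;
        }
        return -1;
    }

    // Binary search using case-sensitive compareTo; array must be sorted with Arrays.sort.
    static int binarySearch(String[] array, String word) {
        int lo = 0, hi = array.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;            // unsigned shift avoids int overflow
            int cmp = array[mid].compareTo(word); // one comparison, result reused below
            if (cmp == 0) return mid;
            if (cmp < 0) lo = mid + 1; else hi = mid - 1;
        }
        return -1;
    }

    // Runs one search per target and returns the elapsed nanoseconds.
    // Using the results (the found count) helps keep the JIT from discarding the searches.
    static long timeSearches(ToIntBiFunction<String[], String> search, String[] array, String[] targets) {
        long start = System.nanoTime();
        int found = 0;
        for (String t : targets) {
            if (search.applyAsInt(array, t) >= 0) found++;
        }
        long elapsed = System.nanoTime() - start;
        if (found != targets.length) throw new IllegalStateException("a target was not found");
        return elapsed;
    }

    public static void main(String[] args) {
        int n = 20_000;
        String[] array = new String[n];
        for (int i = 0; i < n; i++) array[i] = "word" + i;
        Arrays.sort(array);                        // sorted once, outside every timed region

        Random rnd = new Random(1);                // fixed seed for repeatable targets
        String[] targets = new String[1000];
        for (int i = 0; i < targets.length; i++) targets[i] = array[rnd.nextInt(n)];

        for (int round = 0; round < 5; round++) {  // untimed warm-up rounds for the JIT
            timeSearches(SearchBenchmark::linearSearch, array, targets);
            timeSearches(SearchBenchmark::binarySearch, array, targets);
        }
        System.out.println("linear: " + timeSearches(SearchBenchmark::linearSearch, array, targets) + " ns");
        System.out.println("binary: " + timeSearches(SearchBenchmark::binarySearch, array, targets) + " ns");
    }
}
```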

Your benchmark is flawed, for many reasons:

  • we don't know the contents of the file. If the searched word is at the beginning, the linear search will be faster than the binary search
  • the linear search compares with equals, whereas the binary search compares with equalsIgnoreCase
  • you don't execute the code enough times to let the JIT compile it

I haven't verified whether your binary search algorithm is correct, but why don't you use the one bundled with the JDK (in the java.util.Arrays class)?
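For reference, a minimal sketch of the JDK route, assuming a case-insensitive search is what's wanted: java.util.Arrays provides sort(T[], Comparator) and binarySearch(T[], T, Comparator), and passing the same comparator (here String.CASE_INSENSITIVE_ORDER) to both keeps the sort order and the search order consistent. JdkSearchDemo and its helper names are hypothetical.

```java
import java.util.Arrays;

class JdkSearchDemo {
    // Sorts words in place, case-insensitively, so it can be searched with the same ordering.
    static void sortIgnoreCase(String[] words) {
        Arrays.sort(words, String.CASE_INSENSITIVE_ORDER);
    }

    // Returns the index of word, or a negative value if it is absent.
    // Precondition: words was sorted with String.CASE_INSENSITIVE_ORDER.
    static int searchIgnoreCase(String[] words, String word) {
        return Arrays.binarySearch(words, word, String.CASE_INSENSITIVE_ORDER);
    }

    public static void main(String[] args) {
        String[] words = {"cherry", "Banana", "apple", "Date"};
        sortIgnoreCase(words);                          // [apple, Banana, cherry, Date]
        int pos = searchIgnoreCase(words, "banana");
        System.out.println(pos + " -> " + words[pos]);  // prints "1 -> Banana"
    }
}
```

Note that Arrays.binarySearch signals a missing key with -(insertion point) - 1 rather than a plain -1, so callers should test for a negative result.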

Anyway, you don't have to measure anything. On a sorted array, a binary search is, on average, faster than a linear search. No need to prove that again.

Okay, I've got this worked out for you once and for all. First, here's the binary search method as I used it:

public static int binarySearch(String[] array, String word, long resultsArray[]) {
    int start = 0;
    int end = array.length - 1;
    int midPt;
    int pos = -1;
    int comparisons2 = 0;

    //Arrays.sort(array);

    long start2 = System.nanoTime();
    while (start <= end) {
        midPt = (start + end) / 2;
        int comparisonResult = array[midPt].compareToIgnoreCase(word);
        comparisons2++;
        if (comparisonResult == 0) {
            pos = midPt;
            break;
        } else if (comparisonResult < 0) {
            start = midPt + 1;
        } else { // comparisonResult > 0
            end = midPt - 1;
        }
    }
    long stop2 = System.nanoTime();
    long total2 = stop2 - start2;

    resultsArray[0] = total2;
    resultsArray[1] = (long) array.length;
    resultsArray[2] = (long) comparisons2;
    return pos;
}

You'll notice that I reduced the number of comparisons by saving the comparison result and reusing it.

Next, I downloaded this list of 235882 words. It is already sorted ignoring case. Then I built a test method that loads the contents of that file into an array and uses both search methods to find every word in the list. It then averages the times and the numbers of comparisons for each method separately.
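The averaging step can be sketched as follows. This is not the author's test method: it assumes a synthetic, already-sorted list of equal-length words instead of the downloaded file, counts one comparison per loop iteration (like the binarySearch above, which reuses a single comparison result), and uses case-sensitive equals/compareTo. AverageComparisons and its methods are hypothetical names.

```java
import java.util.Locale;

class AverageComparisons {
    // Comparisons a linear search (equals) needs to find word.
    static long linearComparisons(String[] a, String word) {
        long c = 0;
        for (String s : a) {
            c++;
            if (s.equals(word)) break;
        }
        return c;
    }

    // Comparisons a binary search needs, counting one reused compareTo result per iteration.
    static long binaryComparisons(String[] a, String word) {
        int start = 0, end = a.length - 1;
        long c = 0;
        while (start <= end) {
            int mid = (start + end) >>> 1;
            int cmp = a[mid].compareTo(word);
            c++;
            if (cmp == 0) break;
            if (cmp < 0) start = mid + 1; else end = mid - 1;
        }
        return c;
    }

    // Hypothetical stand-in for the downloaded list: n sorted words "w0000000", "w0000001", ...
    static String[] sortedWords(int n) {
        String[] a = new String[n];
        for (int i = 0; i < n; i++) a[i] = String.format(Locale.ROOT, "w%07d", i);
        return a;
    }

    // Searches for every word in the list; returns {average linear, average binary} comparisons.
    static double[] averages(String[] a) {
        long linear = 0, binary = 0;
        for (String w : a) {
            linear += linearComparisons(a, w);
            binary += binaryComparisons(a, w);
        }
        return new double[] {(double) linear / a.length, (double) binary / a.length};
    }

    public static void main(String[] args) {
        double[] avg = averages(sortedWords(1023));
        System.out.printf(Locale.ROOT, "linear: %.2f, binary: %.2f%n", avg[0], avg[1]);
    }
}
```

For 1023 words this prints "linear: 512.00, binary: 9.01". A linear search for every word averages (n + 1) / 2 comparisons, which for 235882 words is about 117941, in line with the averages reported below.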

I found out that you must be careful in choosing which comparison method to use: if you Arrays.sort(...) a list and use compareToIgnoreCase in the binary search, it fails! By failing I mean that it cannot find a word in the given list even though the word actually exists there. That is because Arrays.sort(...) sorts Strings case-sensitively. If you use it, you must use the compareTo(...) method with it.

So we have two cases:

  1. a case-insensitively sorted list, searched with compareToIgnoreCase
  2. a case-sensitively sorted list, searched with compareTo
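A small sketch of the failure described above, with three made-up words: Arrays.sort puts uppercase letters before lowercase ones, so a binary search based on compareToIgnoreCase can miss a word that is present, while sorting with String.CASE_INSENSITIVE_ORDER gives the consistent first case. CaseMismatchDemo is an illustrative name, and its loop mirrors the binarySearch method shown earlier, minus the timing and counting.

```java
import java.util.Arrays;

class CaseMismatchDemo {
    // Binary search using compareToIgnoreCase, reusing one comparison result per step.
    static int searchIgnoreCase(String[] array, String word) {
        int start = 0, end = array.length - 1;
        while (start <= end) {
            int mid = (start + end) >>> 1;
            int cmp = array[mid].compareToIgnoreCase(word);
            if (cmp == 0) return mid;
            if (cmp < 0) start = mid + 1; else end = mid - 1;
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] caseSensitive = {"apple", "Banana", "cherry"};
        Arrays.sort(caseSensitive);                                       // [Banana, apple, cherry]
        System.out.println(searchIgnoreCase(caseSensitive, "Banana"));    // -1, although "Banana" is present

        String[] caseInsensitive = {"apple", "Banana", "cherry"};
        Arrays.sort(caseInsensitive, String.CASE_INSENSITIVE_ORDER);      // [apple, Banana, cherry]
        System.out.println(searchIgnoreCase(caseInsensitive, "Banana"));  // 1
    }
}
```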

In addition to these options in the binary search, you also have options in the linear search: whether to use equals or equalsIgnoreCase. I ran my test for all of these cases and compared them. Average results:

  • Linear search with equals: time: 725536 ns; comparisons: 117941; time/comparison: 6.15 ns
  • Linear search with equalsIgnoreCase: time: 1064334 ns; comparisons: 117940; time/comparison: 9.02 ns
  • Binary search with compareToIgnoreCase: time: 1619 ns; comparisons: 16; time/comparison: 101.19 ns
  • Binary search with compareTo: time: 763 ns; comparisons: 16; time/comparison: 47.69 ns

So now we can clearly see your problem: per comparison, the compareToIgnoreCase method takes some 16 times as long as the equals method! Because the binary search needs 16 comparisons on average to find the given word, you could perform roughly 124 equals comparisons in the time of one binary search with compareTo, and about 263 in the time of one with compareToIgnoreCase. So if you test with word lists shorter than that, the linear search is indeed always faster than the binary search, because of the different comparison methods they use.
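The ratios behind these figures can be checked directly from the per-comparison averages in the list above (they are averages from this one setup, not general constants):

```latex
\[
\frac{101.19\ \text{ns}}{6.15\ \text{ns}} \approx 16.5
\]
\[
\frac{16 \times 47.69\ \text{ns}}{6.15\ \text{ns}} \approx \frac{763\ \text{ns}}{6.15\ \text{ns}} \approx 124,
\qquad
\frac{16 \times 101.19\ \text{ns}}{6.15\ \text{ns}} \approx \frac{1619\ \text{ns}}{6.15\ \text{ns}} \approx 263
\]
```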

I also counted the number of words that the linear search was able to find faster than the binary search: 164 when using the compareTo method and 614 when using the compareToIgnoreCase method. Out of the list of 235882 words, that's at most about 0.3 percent. So all in all, I think it's still safe to say that the binary search is faster than the linear search. :)

One last point before you ask: I removed the sorting code from the binarySearch method, because that's actually an entirely different thing. Since you are comparing two search algorithms, it isn't fair to the other one if you add the cost of a sorting algorithm to one algorithm's figures. I already posted the following as a comment on another answer, but I'll copy it here for completeness:

Binary search has the added overhead cost of sorting. So if you only need to find one element in an array, linear search is always faster: sorting takes at least O(n log n) time and the binary search then takes O(log n) time, so the total is dominated by the O(n log n) sort. A linear search runs in O(n) time, which is better than O(n log n). But once the array is sorted, O(log n) is far better than O(n).

If you insist on keeping the sort inside the binarySearch method, be aware that with my setup, sorting that long list of words from an initially random order takes more than 140000000 ns (0.14 seconds) on average. In that time you could perform some 23000000 comparisons using the equals method, so you really should not use binary search if a) your array is in random order and b) you only ever need to find one or a couple of elements in it.
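A rough break-even estimate, using only the averages reported above (a 140000000 ns sort, 725536 ns per linear search with equals, 763 ns per binary search with compareTo) and assuming those averages hold for every search, where k is the number of searches on the same array:

```latex
\[
\frac{1.4\times10^{8}\ \text{ns}}{6.15\ \text{ns}} \approx 2.3\times10^{7}\ \text{equals comparisons}
\]
\[
k \cdot 725536\ \text{ns} > 1.4\times10^{8}\ \text{ns} + k \cdot 763\ \text{ns}
\;\Longleftrightarrow\;
k > \frac{1.4\times10^{8}}{724773} \approx 193
\]
```

Under these assumptions, sorting first only starts to pay off at roughly 190 or more searches on the same array.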

And one more thing. In this example, where you are searching for words in a String array, the cost of accessing an item in the array is negligible because the array is held in the computer's fast main memory. But if you had, say, a huge bunch of ordered files and you tried to find something in them, the cost of accessing a single file would instead make the cost of every other calculation negligible. So binary search would totally rock in that scenario (too), since it only has to open about log2 n of the files instead of up to n.

Your code doesn't measure only the binary search, but also the sorting of the array just before the search. That will always take longer than a simple linear search.

} else if (array[midPt].compareToIgnoreCase(word) > 0) {

You don't need this test at all. At this point in the code there is no other possibility: the value isn't equal and it isn't less than, since you've already tested both of those, so it must be greater.

So you can cut the string comparisons on that path from three to two, a reduction of 33%.

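Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.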

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM