More efficient alternative to these "for" loops?

I'm taking an introductory Java course, and one of my latest projects involves making sure an array doesn't contain any duplicate elements (i.e. that its elements are distinct). I used a for loop with an inner for loop, and it works, but I've heard that you should try to avoid using many iterations in a program (and other methods in my classes have a fair number of iterations as well). Is there a more efficient alternative to this code? I'm not asking for code of course, just "concepts." Would there potentially be a recursive way to do this? Thanks!

The array sizes are generally <= 10.

/** Iterates through a String array ARRAY to see if each element in ARRAY is
 *  distinct. Returns false if ARRAY contains duplicates. */
boolean distinctElements(String[] array) { //Efficient?
    for (int i = 0; i < array.length; i += 1) {
        for (int j = i + 1; j < array.length; j += 1) {
            if (array[i] == array[j]) {
                return false;
            }
        }
    }
    return true;
}

First, array[i] == array[j] tests reference equality. That's not how you test Strings for value equality (use equals for that). I would add each element to a Set. If any element isn't successfully added (because it's a duplicate), Set.add(E) returns false. Something like,

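import java.util.HashSet;
import java.util.Set;

static boolean distinctElements(String[] array) {
    Set<String> set = new HashSet<>();
    for (String str : array) {
        // add returns false when the element is already in the set
        if (!set.add(str)) {
            return false;
        }
    }
    return true;
}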

You could also write the Set-based version without the short-circuit, like

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

static boolean distinctElements(String[] array) {
    // Duplicates collapse in the set, so the sizes differ if any exist
    Set<String> set = new HashSet<>(Arrays.asList(array));
    return set.size() == array.length;
}

"Efficiency" is almost always a trade-off. Occasionally, there are algorithms that are simply better than others, but often they are only better in certain circumstances.

For example, take the nested loops in the question: they have time complexity O(n^2).

One improvement might be to sort the strings: you can then find duplicates by checking whether any element is equal to its neighbour. The time complexity is reduced to O(n log n), because the sorting dominates the linear pass over the elements.

However, what if you don't want to change the order of the array's elements, for instance because some other bit of your code relies on them being in their original order? Now you also have to copy the array, sort the copy, and then look for duplicates. This doesn't increase the overall time or storage complexity, but it does increase the overall time and storage, since more work is being done and more memory is required.
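A minimal sketch of the copy-then-sort idea, in case it helps. The class name SortedCopyCheck is made up for illustration, and the code assumes the array contains no null elements (Arrays.sort would throw on them).

```java
import java.util.Arrays;

// Hypothetical class name, used only for this illustration.
class SortedCopyCheck {
    /** Returns true if ARRAY has no duplicates; leaves ARRAY itself untouched. */
    static boolean distinctElements(String[] array) {
        // Copy first so the caller's ordering is preserved (extra O(n) memory).
        String[] copy = Arrays.copyOf(array, array.length);
        // Assumption: no null elements, since natural-order sorting would throw.
        Arrays.sort(copy);
        // After sorting, equal strings sit next to each other.
        for (int i = 1; i < copy.length; i += 1) {
            if (copy[i - 1].equals(copy[i])) {
                return false;
            }
        }
        return true;
    }
}
```

For arrays of around 10 elements, the copy and the sort may well cost more than the plain nested loops, which is the trade-off being described here.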

Big-O notation only gives you a bound on the time, ignoring multiplicative factors. Maybe you only have access to a really slow sorting algorithm: it may actually turn out to be faster just to use your O(n^2) loops, because then you don't have to invoke the very slow sort.

This could be the case when you have very small inputs. An oft-cited example of an algorithm that has poor time complexity but is actually useful in practice is Bubble Sort: it's O(n^2) in the worst case, but if you have a small and/or nearly-sorted array, it can actually be pretty darn fast, and pretty darn simple to implement. Never forget the inefficiency of having to write and debug the code, and of having to ask questions on SO when it doesn't work as you expect.
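To give a rough idea of how little code that is, here is a sketch of Bubble Sort for a String array. The early-exit flag is the common variant that makes nearly-sorted input cheap; the class name BubbleSortSketch and the returned pass count are my own additions for illustration, and null elements are assumed not to occur.

```java
// Hypothetical class name, used only for this illustration.
class BubbleSortSketch {
    /** Sorts A in place and returns how many passes were made over it. */
    static int bubbleSort(String[] a) {
        int passes = 0;
        boolean swapped = true;
        // Stop as soon as a full pass makes no swap: the array is then sorted.
        for (int end = a.length - 1; swapped && end > 0; end -= 1) {
            swapped = false;
            passes += 1;
            for (int i = 0; i < end; i += 1) {
                if (a[i].compareTo(a[i + 1]) > 0) {
                    String tmp = a[i];
                    a[i] = a[i + 1];
                    a[i + 1] = tmp;
                    swapped = true;
                }
            }
        }
        return passes;
    }
}
```

On an array that is already in order, this version makes a single pass and stops, which is where the "fast on nearly-sorted input" behaviour comes from.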

What if you know that the elements are already sorted, because you know something about their source? Now you can simply iterate through the array, comparing neighbours, and the time complexity is O(n). I can't remember where I read it, but I once saw a blog post saying (I paraphrase):

A given computer can never be made to go quicker; it can only ever do less work.

If you can exploit some property to do less work, that improves your efficiency.
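As a sketch of exploiting such a property: if the caller can guarantee that the array is already sorted, a single pass over neighbours is enough. The names PresortedCheck and distinctIfSorted are hypothetical, and the method does not verify the sortedness assumption, so it can give wrong answers on unsorted input.

```java
// Hypothetical class name, used only for this illustration.
class PresortedCheck {
    /**
     * Returns true if SORTED has no duplicates.
     * Assumption: SORTED is already in sorted order and has no null elements.
     */
    static boolean distinctIfSorted(String[] sorted) {
        // In a sorted array, any duplicates must be adjacent, so one pass is enough.
        for (int i = 1; i < sorted.length; i += 1) {
            if (sorted[i - 1].equals(sorted[i])) {
                return false;
            }
        }
        return true;
    }
}
```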

So, efficiency is a subjective criterion:

  • Whenever you ask "is this efficient?", you have to be able to answer the question: "efficient with respect to what?". It might be space; it might be time; it might be how long it takes you to write the code.
  • You have to know the constraints of the hardware that you're going to run it on: memory, disk, network requirements etc. may influence your choices.
  • You need to know the requirements of the user on whose behalf you are running it. One user might want the results as soon as possible; another user might want the results tomorrow. There is never a need to find a solution better than "good enough" (although that can be a moving goal once the user sees what is possible).
  • You also have to know what inputs you want it to be efficient for, and what properties of that input you can exploit to avoid unnecessary work.
