
More efficient alternative to these “for” loops?

I'm taking an introductory course in Java, and one of my latest projects involves making sure an array doesn't contain any duplicate elements (i.e., has distinct elements). I used a for loop with an inner for loop, and it works, but I've heard that you should try to avoid using many iterations in a program (and other methods in my classes have a fair number of iterations as well). Is there an efficient alternative to this code? I'm not asking for code, of course, just "concepts." Would there potentially be a recursive way to do this? Thanks!

The array sizes are generally <= 10.

/** Iterates through a String array ARRAY to see if each element in ARRAY is
 *  distinct. Returns false if ARRAY contains duplicates. */
boolean distinctElements(String[] array) { //Efficient?
    for (int i = 0; i < array.length; i += 1) {
        for (int j = i + 1; j < array.length; j += 1) {
            if (array[i] == array[j]) {
                return false;
            }
        }
    }
    return true;
}

First, array[i] == array[j] tests reference equality; that is not how you test Strings for value equality (use String.equals for that). Instead, I would add each element to a Set. If an element isn't successfully added (because it's a duplicate), Set.add(E) returns false. Something like:

static boolean distinctElements(String[] array) {
    Set<String> set = new HashSet<>();
    for (String str : array) {
        if (!set.add(str)) {
            return false;
        }
    }
    return true;
}

You could render the above without the short-circuit, like so:

static boolean distinctElements(String[] array) {
    Set<String> set = new HashSet<>(Arrays.asList(array));
    return set.size() == array.length;
}
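To see why the original == comparison is the wrong test, note that two distinct String objects can hold the same characters (a minimal demonstration; the class name is mine):

```java
class EqualityDemo {
    public static void main(String[] args) {
        String a = "abc";
        String b = new String("abc"); // same characters, different object

        System.out.println(a == b);      // reference comparison: false
        System.out.println(a.equals(b)); // value comparison: true
    }
}
```

This is also why the original nested-loop version can report "distinct" for an array that does contain duplicate values.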

"Efficiency" is almost always a trade-off. Occasionally, there are algorithms that are simply better than others, but often they are only better in certain circumstances.

For example, the nested loops in the question have time complexity O(n^2).

One improvement might be to sort the strings: you can then detect duplicates by checking whether any element is equal to its neighbour. The time complexity is reduced to O(n log n), because the sort dominates the linear scan over neighbouring elements.

However, what if you don't want to change the order of the array's elements, for instance because some other part of your code relies on them being in their original order? Now you also have to copy the array, sort the copy, and then look for duplicates in it. This doesn't change the asymptotic time or space complexity, but it does increase the actual time and storage used, since more work is being done and more memory is required.
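A sketch of that copy-and-sort idea (the class and method names are mine, not from the question):

```java
import java.util.Arrays;

class SortedCheck {
    /** Copies ARRAY, sorts the copy, then compares neighbouring elements.
     *  O(n log n) time, O(n) extra space; the original order is preserved. */
    static boolean distinctElements(String[] array) {
        String[] copy = array.clone();
        Arrays.sort(copy);
        for (int i = 1; i < copy.length; i += 1) {
            if (copy[i - 1].equals(copy[i])) {
                return false; // duplicates are adjacent after sorting
            }
        }
        return true;
    }
}
```

The clone is the price paid for leaving the caller's array untouched.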

Big-O notation only gives you an asymptotic bound on the time, ignoring multiplicative constants. Maybe you only have access to a really slow sorting algorithm: then it can actually turn out to be faster just to use your O(n^2) loops, because that avoids invoking the very slow sort.

This could be the case when you have very small inputs. An oft-cited example of an algorithm with poor time complexity that is nonetheless useful in practice is Bubble Sort: it's O(n^2) in the worst case, but on a small and/or nearly-sorted array it can be pretty darn fast, and it is pretty darn simple to implement. Never forget the inefficiency of having to write and debug the code yourself, and of having to ask questions on SO when it doesn't work as you expect.
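For reference, Bubble Sort is only a few lines (a textbook sketch; the early exit is what makes it fast on nearly-sorted input):

```java
class Bubble {
    /** Classic bubble sort: repeatedly swap adjacent out-of-order pairs.
     *  O(n^2) worst case, but close to O(n) on nearly-sorted input,
     *  because a pass with no swaps ends the loop early. */
    static void bubbleSort(int[] a) {
        boolean swapped = true;
        for (int pass = 0; swapped && pass < a.length - 1; pass += 1) {
            swapped = false;
            for (int i = 0; i < a.length - 1 - pass; i += 1) {
                if (a[i] > a[i + 1]) {
                    int tmp = a[i];
                    a[i] = a[i + 1];
                    a[i + 1] = tmp;
                    swapped = true;
                }
            }
        }
    }
}
```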

What if you know that the elements are already sorted, because you know something about their source? Now you can simply iterate through the array comparing neighbours, and the time complexity drops to O(n). I can't remember where I read it, but I once saw a blog post that said (I paraphrase):

A given computer can never be made to go quicker; it can only ever do less work.

If you can exploit some property to do less work, that improves your efficiency.
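For instance, the already-sorted case above reduces to a single pass (a sketch under that assumption; the names are mine):

```java
class SortedScan {
    /** Assumes ARRAY is already sorted. Duplicates must then be adjacent,
     *  so one O(n) pass over neighbouring pairs is enough. */
    static boolean distinctSorted(String[] array) {
        for (int i = 1; i < array.length; i += 1) {
            if (array[i - 1].equals(array[i])) {
                return false;
            }
        }
        return true;
    }
}
```

Note that this is only correct because of the sortedness property; on unsorted input it would miss non-adjacent duplicates.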

So, efficiency is a subjective criterion:

  • Whenever you ask "is this efficient", you have to be able to answer the question: "efficient with respect to what?". It might be space; it might be time; it might be how long it takes you to write the code.
  • You have to know the constraints of the hardware that you're going to run it on - memory, disk, network requirements etc may influence your choices.
  • You need to know the requirements of the user on whose behalf you are running it. One user might want the results as soon as possible; another user might want the results tomorrow. There is never a need to find a solution better than "good enough" (although that can be a moving goal once the user sees what is possible).
  • You also have to know what inputs you want it to be efficient for, and what properties of that input you can exploit to avoid unnecessary work.

