
How do I make only “n” comparisons finding min and max from text file?

So I have a file that has n number of integers in it. I need to find a way to make n comparisons when finding the min and max instead of 2n comparisons. My current code makes 2n comparisons...

min = max = infile.nextInt();
while (infile.hasNextInt())
    {
        int placeholder = infile.nextInt(); // holds the value just read
        if (placeholder < min)
        {
            min = placeholder;
        }
        if (placeholder > max)
        {
            max = placeholder;
        }
    }
NOTE: I can only change what is in the while loop. I just do not know how I would easily find the min and max using a basic for loop... Is there any simple solution to this? What am I missing?

I don't think you can do this in n comparisons. You can do it in 3n/2 - 2 comparisons as follows:

  1. Take the items in pairs and compare the items in each pair. Put the higher values from each comparison in one list, and the lower values in another. That takes n/2 comparisons.
  2. Find the maximum from the higher-values list: n/2-1 comparisons.
  3. Find the minimum from the lower-values list: n/2-1 comparisons.
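A minimal sketch of this pairing idea, assuming the integers come from a java.util.Scanner as in the question. The class name PairwiseMinMax and the comparisons counter are illustrative and not part of the original answer. Instead of building two lists, each pair is checked against the running min and max right away, and the first pair seeds both values with a single comparison. Note that it changes the code outside the while loop as well.

```java
import java.util.Scanner;

final class PairwiseMinMax {

    /** Returns {min, max, comparisons}; assumes the input holds at least one integer. */
    static int[] minMax(Scanner in) {
        int min = in.nextInt();
        int max = min;
        int comparisons = 0;

        // Seed min and max from the first pair with a single comparison.
        if (in.hasNextInt()) {
            int second = in.nextInt();
            comparisons++;
            if (second < min) {
                min = second;
            } else {
                max = second;
            }
        }

        while (in.hasNextInt()) {
            int low = in.nextInt();
            if (!in.hasNextInt()) {
                // Odd count: the leftover value costs at most two comparisons.
                comparisons++;
                if (low < min) {
                    min = low;
                } else {
                    comparisons++;
                    if (low > max) {
                        max = low;
                    }
                }
                break;
            }
            int high = in.nextInt();

            // One comparison orders the pair.
            comparisons++;
            if (low > high) {
                int tmp = low;
                low = high;
                high = tmp;
            }

            // Only the smaller value can lower min, only the larger can raise max.
            comparisons++;
            if (low < min) {
                min = low;
            }
            comparisons++;
            if (high > max) {
                max = high;
            }
        }
        return new int[] {min, max, comparisons};
    }

    public static void main(String[] args) {
        int[] r = minMax(new Scanner("7 2 9 4 1 8"));
        System.out.println("min=" + r[0] + " max=" + r[1] + " comparisons=" + r[2]);
    }
}
```

For the six values in main this uses 7 comparisons, which matches 3n/2 - 2 for n = 6.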

I think your approach is already optimal in the asymptotic sense, because it takes O(n) comparisons. Whether the count is n or 2n does not matter in Big O terms:

int min = Integer.MAX_VALUE;
int max = Integer.MIN_VALUE;

while (infile.hasNextInt()) {
    int val = infile.nextInt();

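    if (val < min)
        min = val;
    // Not "else if": with these sentinel initial values, a value that lowers min
    // may also be the first one to raise max (for example, the first value read).
    if (val > max)
        max = val;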
}

You can do the same using additional storage. Your own code then contains no explicit comparisons, but the TreeSet still performs O(log n) comparisons internally for each insertion and needs O(n) extra space:

TreeSet<Integer> unique = new TreeSet<>();

while(infile.hasNextInt())
    unique.add(infile.nextInt());

// first() and last() read the extremes without removing them,
// so a file with a single value works too.
int min = unique.first();
int max = unique.last();

As MadPhysicist stated, you could change the two ifs into an if/else if:

min = max = infile.nextInt();
while (infile.hasNextInt()) {

    int placeholder = infile.nextInt(); // holds the value just read
    if (placeholder <= min) {
        min = placeholder;
    } else if (placeholder > max) {
        max = placeholder;
    }
}

In the best case (a strictly decreasing sequence, where each value is smaller than the previous one) you only need n-1 comparisons.

In the worst case (a strictly increasing sequence, where each value is larger than the previous one) you will still need 2*(n-1) comparisons. You cannot completely eliminate this case: if a value is larger than the current minimum it could be a new maximum value.

In the typical case (a random sequence of values) you will need something between n-1 and 2*(n-1) comparisons.

Also note that I changed the comparison for the minimum value from < to <= : if a value is equal to the minimum value it cannot be a new maximum value at the same time.
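The best and worst cases above can be checked by counting comparisons. This is only a sketch: the class name ElseIfComparisonCount and the comparisons counter are made up for illustration, and the input is assumed to hold at least one integer.

```java
import java.util.Scanner;

final class ElseIfComparisonCount {

    /** Returns {min, max, comparisons} for the if / else-if loop shown above. */
    static int[] minMax(Scanner infile) {
        int min;
        int max;
        min = max = infile.nextInt();
        int comparisons = 0;

        while (infile.hasNextInt()) {
            int value = infile.nextInt();
            comparisons++;              // the "<= min" test always runs
            if (value <= min) {
                min = value;
            } else {
                comparisons++;          // the "> max" test runs only if the first one failed
                if (value > max) {
                    max = value;
                }
            }
        }
        return new int[] {min, max, comparisons};
    }

    public static void main(String[] args) {
        int[] best = minMax(new Scanner("5 4 3 2 1"));
        int[] worst = minMax(new Scanner("1 2 3 4 5"));
        System.out.println("decreasing: " + best[2] + " comparisons");
        System.out.println("increasing: " + worst[2] + " comparisons");
    }
}
```

With n = 5 the decreasing input needs 4 comparisons (n-1) and the increasing input needs 8 (2*(n-1)).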

Given that you initialize min and max to the first element, you already have 2(n - 1) comparisons. Furthermore, if you change the two ifs into an if-else if, you skip the second comparison whenever a new minimum is found. And since the first two elements can be ordered with a single comparison, you need at most 2n - 3.

A generalization of @Matt Timmermans' answer is therefore possible:

Split your input into groups of size k . Find the max and min of each group using 2k - 3 comparisons. This leaves you with n/k items to check for a minimum and n/k items to check for a maximum. You have two options:

  1. Just make the comparisons, for a total of (n/k) * (2k - 3) + 2 * (n/k - 1) . This shows that Matt's answer is optimal within this scheme: the expression simplifies to 2n - n/k - 2 , which is smallest for k = 2 , where it equals 3n/2 - 2 .
  2. Continue splitting into groups of size k (or some other size). Finding the maximum of k elements requires k-1 comparisons. So you can split your n/k minimum candidates into groups of k again to get n/k^2 candidates for an additional (n/k^2) * (k-1) comparisons. You can continue the process to get a total of (n/k) * (2k - 3) + 2 * (k - 1) * Σ n/k^i , with the sum taken over i >= 2. The sum evaluates to n / (k * (k-1)) , so the total is roughly 2n - n/k , which is no better than option 1, even allowing for the hand-waving over-approximation implicit in the sum.
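To see how the option 1 total behaves, the formula can simply be evaluated for a few group sizes. This is a small sketch with made-up names (GroupedComparisonCount, optionOne), and it assumes k >= 2 and that k divides n evenly.

```java
final class GroupedComparisonCount {

    /** Option 1 total: (n/k) * (2k - 3) + 2 * (n/k - 1). Assumes k >= 2 and n % k == 0. */
    static long optionOne(long n, long k) {
        long groups = n / k;
        return groups * (2 * k - 3) + 2 * (groups - 1);
    }

    public static void main(String[] args) {
        long n = 60; // divisible by every k tried below
        for (long k = 2; k <= 6; k++) {
            System.out.println("k=" + k + " -> " + optionOne(n, k) + " comparisons");
        }
    }
}
```

For n = 60 this gives 88 comparisons at k = 2 (that is, 3n/2 - 2) and the count grows with k, reaching 108 at k = 6.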

The reason that approach #2 does not reduce the number of comparisons is that the most gain is to be had from splitting the list into two sets of candidates for each criterion. The remainder of the calculation is best optimized through a single pass through each list.

The moral of the story is that while you can save a couple of comparisons here and there, you probably shouldn't. You have to consider the amount of overhead you incur with setting up additional lists (or even doing in-place swapping), as well as the reduced legibility of your code, among other factors.
