简体   繁体   中英

Efficient algorithm to find number of elements less than a query

I have two unsorted arrays a and b . For every element a[i] I need to find the number of elements b[j] such that b[j] < a[i] . In addition b may contain duplicates which should not be counted. Both arrays may be very large.

I tried (for a single query x )

public static void main(String arg[]) {
    int x = 5;
    int b[] = {2, 4, 3, 4, 11, 13, 17};
    Arrays.sort(b);
    int count = 0;
    for(int i = 0; i < b.length; ++i) {
        if(b[i] < x) {
            if(i == 0)
                ++count;
            else {
                // this check avoids counting duplicates
                if(b[i - 1] != b[i])
                    ++count;
            }
        } else {
            break;
        }
    }
    System.out.println(count);
}

My problem is that this doesn't perform well enough when querying all elements of a iteratively. What can I do to speed this up?

EDIT: given the later comments, some updates which I just put right in the beginning; leaving my first text at the bottom.

So, the core aspects here are:

  1. You came here with some problem X, but further asking told us that you actually had some problem Y to solve. That is something that you should try to avoid: when coming here (or when working on problems on your own!) ... then you should be able to clearly describe the problem you have or intend to solve. I am not fingerpointing here; just expressing that you should work hard to ensure that you understand what your real problem is.
  2. This is also visible from the fact that you are asking us what to do about duplicate numbers in your data. Err, sir: it is your problem. We do not know why you want to count those numbers; we do not know where your data is coming from; and how the final solution should deal with duplicate entries. In that sense, I am just rephrasing the first paragraph: you have to clarify your requirements. We can't help with that part at all . And you see: you only mentioned duplicates in the second array. What about those in the first one?!

OK, so about your problem. Thing is: actually, that is just "work". There is no magic there. As you have two very large arrays, working on unsorted data is an absolute no-go.

So, you start by sorting both arrays.

Then you iterate over the first array and while doing that, you also look into the second array:

int indexWithinB = 0;
int counterForCurrentA = 0; // and actually ALL values from a before
for (int i=0; i<a.length; i++) {
  int currentA = a[i];     
  while (b[indexWithinB] < currentA) {
    if (indexWithinB > 0) { // check required to avoid using 0-1
      if (b[indexWithinB-1] != b[indexWithinB] { // avoid counting duplicates!
        counterForCurrentA++;
      }
    }
    indexWithinB++;
  }
  // while loop ended, this means: b[indexWithinB] == or > currentA
  // this also means: counterForCurrentA ... should have the correct value
}

The above is obviously pseudo code. It is meant to keep you going; and it might very well be, that there are subtle errors in there. For example, as Andreas pointed out: the above needs to be enhanced to check for b.length, too. But that is left as exercise to the reader.

That is what I meant with "just work": you simply have to sit down, write testcases and refine my draft algorithm until it does the job for you. If you find it too hard to program this initially, then take a piece of paper, put down two arrays with numbers ... and do that counting manually.

Final hint: I suggest to write plenty of unit tests to test your algorithm (such stuff is perfect for unit tests); and make sure that you have all your corner cases in such tests. You want to be 100% sure that your algorithm is correct before going for your 10^5 element arrays!

And here, as promised, the original answer:

Simply spoken: iterating and counting is the most efficient way to solve this problem. So in your above case, leaving out the sorting might lead to quicker overall execution time.

The logic there is really simple: in order to know the count of numbers smaller than x ... you have to look at all of them. Thus you have to iterate the complete array (when that array is not sorted).

Thus, given your initial statement, there is no other thing than: iterate and count.

Of course, if you have to this counting multiple times ... it might be worth sorting that data initially. Because then you can use binary search , and getting that count you are looking for works without iterating all data.

And: what makes you think that iterating an array with 10^5 elements is a problem? In other words: are you just worried about a potential performance problem, or do you have a real performance problem? You see, at some point you probably had to create and fill that array. That for sure took more time (and resources) than a simple for-loop to count entries. And honestly: unless we are talking some small embedded device ... 10^5 elements ... that is close to nothing , even when using slightly outdated hardware.

Finally: when you are worried about runtime , the simple answer is: slice your input data, and use 2,4, 8, ... threads to count each slice in parallel! But as said: before writing that code, I would do some profiling be sure that you really have to spent precious development time on this. Don't solve hypothetical performance problems; focus on those that really matter to you or your users!

Comapring every item in the array with x will take you O(n) time. Sorting the array will take O(n log n), and then you can use binary search, which is O(log n), and you get a total of O(n log n). So the most efficient way is also the trivial one - just loop thru the array and compare every item with x.

public static void main(String arg[] ){
    int b[]={2, 4, 3, 4, 11, 13, 17};
    int x=5;
     int count=0;
     for(int i=0;i<b.length;i++){
         if(b[i]<x){          
             count++;
         }
     }
     System.out.println(count);
}

I propose you to consider the following approach, but it works only if the b array has non-negative numbers. The algorithm works even if input arrays (both a and b ) are not sorted.

Pseudo-code

  1. Get the max element of array b .
  2. Create a new array c of size max + 1 and put 1 in the position c[b[i]] .
  3. Create a new array d of size max + 1 and populate it as follow:

    d[0]=0;
    d[i]=d[i-1] + c[i];

  4. Create a new array e of size n and populate it as follow:

    if(a[i] > max) then e[i] = last(d)
    otherwise e[i]=d[a[i]-1];

e array represents the solution: it contains in i-th position the counter of numbers of the b array lower then the i-th element of array a . This example should be more explicative than the pseudo-code:

a = [5, 1, 4, 8, 17, 12, 22]
b = [2, 4, 3, 4, 11, 13, 17]
c = [0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1]
d = [0, 0, 1, 2, 3, 3, 3, 3, 3, 3, 3, 4, 4, 5, 5, 5, 5, 6]
e = [3, 0, 2, 3, 5, 4, 6]

Complexity

Steps 1, 2 and 4 are O(n).
Step 3 is O(max(b))

if the input array b contains only "short" numbers (max(b) is in the same order of n size) the algorithm perform in O(n) . The algorithm could be optimized creating an array of size max-min+1 and consider counter 0 for all the elements of a array lower than min(b) .

A simple java implementation:

int a[] = {5, 1, 4, 8, 17, 12, 22};
int b[] = {2, 4, 3, 4, 11, 13, 17};
int max = Arrays.stream(b).max().getAsInt();
int c[] = new int[max+1];
int d[] = new int[max+1];
int e[] = new int[a.length];
for(int i=0;i<b.length;i++){
    c[b[i]]=1;
}
for(int i=1;i<c.length;i++){
    d[i] = d[i-1] + c[i];
}
for (int i = 0; i<a.length;i++){
    e[i]=(a[i]>max)?d[d.length-1]:d[a[i]-1];
}
System.out.println(Arrays.toString(a));
System.out.println(Arrays.toString(b));
System.out.println(Arrays.toString(c));
System.out.println(Arrays.toString(d));
System.out.println(Arrays.toString(e));

For larger sorted set we need to use Divide-And-Conquer principle to fasten our search.Here is my solution which has O(logn) Time complexity and O(n) space complexity.

public static void main(String arg[]) {
    int x = 5;
    int b[] = {2, 4, 3, 4, 11, 13, 17};
    int high = b.length - 1;
    int low = 0;

    while (high >= low) {
      int mid = (high + low) / 2;
        if (b[mid] < x)
          low = mid + 1;
        else
          high = mid - 1;
    }
  System.out.println(low);

}

This should be a possible solution. The"expensive" task is the sorting of the lists. The bost list must be sorted before the for loop. Make sure you use a fast mechanism to execute the sorting. As explaned a sort on an array /array list is a very expension operation especially if there are many values you have to sort.

public static void main(String[] args) throws IOException {
    // int x = 5;
    int a[] = { 1, 2, 3, 4, 5 };
    int b[] = { 2, 4, 3, 4, 11, 13, 17 };
    List<Integer> listA = new LinkedList<>();
    for (int i : a) {
        listA.add(i);
    }
    List<Integer> listB = new LinkedList<>();
    for (int i : b) {
        listB.add(i);
    }
    Collections.sort(listA);
    Collections.sort(listB);
    int smallerValues = 0;
    int lastValue = 0;
    Iterator<Integer> iterator = listB.iterator();
    int nextValue = iterator.next();
    for (Integer x : listA) {
        while (nextValue < x && iterator.hasNext()) {
            lastValue = nextValue;
            nextValue = iterator.next();
            if (nextValue > lastValue) {
                smallerValues++;
            }
        }
        System.out.println(x + " - " + smallerValues);
    }
}

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM