简体   繁体   中英

Java fastest way to get matching range

I have a set of integer ranges, which represent lower and upper bounds of classes. For example:

0..500 xsmall
500..1000 small
1000..1500 medium
1500..2500 large

In my case there can be over 500 classes. These classes do not overlap, but they can differ in size.

I can implement finding the matching range as a simple linear search through a list, for example

class Range
{
  int lower;
  int upper;
  String category;

  boolean contains(int val)
  {
    return lower <= val && val < upper;
  }
}

public String getMatchingCategory(int val)
{
   for (Range r : listOfRanges)
   {
      if (r.contains(val))
      {
         return r.category;
      }
   }
   return null;
}

However, this seems slow; as I need on average N/2 look-ups. If the classes were equally sized, I could use division. Is there a standard technique to find the correct range faster?

What you are looking for is a SortedMap and its methods tailMap and firstKey . Check out the documentation for full details.

The advantage of this approach over plain arrays is in the ease of maintaining your ranges: you can insert/remove new boundaries at any point with almost no runtime cost; with arrays it means copying both parallel arrays in full.

Update

I've written code for both variants and benchmarked it:

@State(Scope.Thread)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public class BinarySearch
{
  static final int ARRAY_SIZE = 128, INCREMENT = 1000;
  static final int[] arrayK = new int[ARRAY_SIZE];
  static final String[] arrayV = new String[ARRAY_SIZE];
  static final SortedMap<Integer,String> map = new TreeMap<>();
  static {
    for (int i = 0, j = 0; i < arrayK.length; i++) {
      arrayK[i] = j; arrayV[i] = String.valueOf(j);
      map.put(j, String.valueOf(j));
      j += INCREMENT;
    }
  }
  final Random rnd = new Random();
  int rndInt;

  @Setup(Level.Invocation) public void nextInt() { 
    rndInt = rnd.nextInt((ARRAY_SIZE-1)*INCREMENT); 
  }

  @GenerateMicroBenchmark
  public String array() {
    final int i = Arrays.binarySearch(arrayK, rndInt);
    return arrayV[i >= 0? i : -(i+1)];
  }

  @GenerateMicroBenchmark
  public String sortedMap() {
    return map.tailMap(rndInt).values().iterator().next();
  }
}

Benchmark results:

Benchmark     Mode Thr    Cnt  Sec         Mean   Mean error    Units
array        thrpt   1      5    5       10.948        0.033 ops/usec
sortedMap    thrpt   1      5    5        5.752        0.070 ops/usec

Interpretation: array search is only twice as fast and this factor is quite stable across array sizes. In the presented code the array size is 1024 and the factor is 1.9. I've also tested with array size 128, where the factor is 2.05.

Here, Arrays.binarySearch is your friend. Simply put all the boundaries in and handle the possible cases. Assuming you ranges leave no holes between them, you only need to put the upper bounds in.

For you example

0..500 xsmall
500..1000 small
1000..1500 medium
1500..2500 large

you'd use

int[] boundaries = {500, 1000, 1500, 2500};

and look up the input. Handle the two cases (found/not found) and you're done. Forget about ranges, they're nice but they don't fit you problem.

Update

I also wrote a benchmark and no matter how I try I'd lose my bet as the ratio is about 3 rather than 5. The strange things like S001024 in my results stand for the size 1024.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM