
Java time-efficient sparse 1D array (double)

I need an efficient Java structure to manipulate very sparse vectors of doubles: basic read / write operations. I implemented it in a HashMap but the access is too slow. Should I use another data structure? Do you recommend any free library?

Looking for some peaceful advice :)

Thanks a lot,

Marie

HashMap is the way to go. It shouldn't be slow. Run your code through a profiler to see where all the time goes and then optimize accordingly. If you need tips to optimize the code, post an example here so we can help with a specific issue.

[EDIT] Depending on the size of the indexes, you can use a technique as in Integer.valueOf(int) to cache the objects for boxing. But this will only work when you create lots of maps and the indexes are in a somewhat limited range.
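A minimal sketch of that caching idea, assuming the indexes fall in a known range [0, limit). The class name IndexCache and its box method are made up for illustration; only Integer.valueOf is a real JDK call. Note that this only avoids boxing the keys; the Double values in a HashMap<Integer, Double> are still boxed.

```java
/** Hypothetical helper: pre-boxed Integer keys for indexes in [0, limit). */
final class IndexCache {
    private final Integer[] boxed;

    IndexCache(int limit) {
        boxed = new Integer[limit];
        for (int i = 0; i < limit; i++) {
            boxed[i] = Integer.valueOf(i);
        }
    }

    /** Returns the shared Integer for cached indexes, otherwise boxes normally. */
    Integer box(int index) {
        return (index >= 0 && index < boxed.length) ? boxed[index] : Integer.valueOf(index);
    }
}
```

You would then call map.put(cache.box(i), value) and map.get(cache.box(i)) instead of relying on autoboxing, sharing one cache across all maps.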

Or you can try IntHashMap from commons-lang. It's a bit hard to use (it's package-private), but you can copy the code.

Lastly, you could use your own implementation of an int-based HashMap with optimized value lookup for your case.
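To give an idea of what such a map could look like, here is a rough sketch using open addressing with linear probing. Everything in it (the name IntDoubleMap, the hash mixing, the 0.5 load factor) is an assumption made for illustration, not code from any library. It assumes keys are non-negative indexes and does not support removal.

```java
import java.util.Arrays;

/** Hypothetical int -> double map: open addressing, linear probing, no boxing. */
final class IntDoubleMap {
    private static final int EMPTY = -1; // sketch assumes keys are non-negative

    private int[] keys = new int[16];    // capacity is always a power of two
    private double[] values = new double[16];
    private int used;

    IntDoubleMap() {
        Arrays.fill(keys, EMPTY);
    }

    /** Returns the stored value, or 0.0 for an absent key (the sparse default). */
    double get(int key) {
        if (key < 0) return 0.0;
        int slot = find(key);
        return keys[slot] == key ? values[slot] : 0.0;
    }

    void put(int key, double value) {
        if (key < 0) throw new IllegalArgumentException("negative key: " + key);
        int slot = find(key);
        if (keys[slot] == EMPTY) {
            keys[slot] = key;
            used++;
        }
        values[slot] = value;
        if (used * 2 > keys.length) grow(); // keep the table at most half full
    }

    int size() {
        return used;
    }

    /** Probes from the hashed slot until it finds the key or an empty slot. */
    private int find(int key) {
        int mask = keys.length - 1;
        int h = key * 0x9E3779B9;        // cheap multiplicative mixing
        int slot = (h ^ (h >>> 16)) & mask;
        while (keys[slot] != EMPTY && keys[slot] != key) {
            slot = (slot + 1) & mask;
        }
        return slot;
    }

    /** Doubles the capacity and re-inserts every existing entry. */
    private void grow() {
        int[] oldKeys = keys;
        double[] oldValues = values;
        keys = new int[oldKeys.length * 2];
        values = new double[oldKeys.length * 2];
        Arrays.fill(keys, EMPTY);
        for (int i = 0; i < oldKeys.length; i++) {
            if (oldKeys[i] != EMPTY) {
                int slot = find(oldKeys[i]);
                keys[slot] = oldKeys[i];
                values[slot] = oldValues[i];
            }
        }
    }
}
```

Because both keys and values live in primitive arrays, a lookup touches no Integer or Double objects, which is the cost a HashMap<Integer, Double> cannot avoid.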

How big is your dataset? Is it much larger than Integer.MAX_VALUE? The problem is that HashMap is backed by an array, so collisions will slow performance. Perhaps it's not the HashMap mechanism itself that is too slow, but the fact that you have many collisions. If you partitioned your data first (e.g. using another hash function) and then stored each partition in its own HashMap, you might have more luck.

You can copy paste the sparse vector from my Hapax project: ch.akuhn.matrix.SparseVector

PS: to all those other answers and comments that don't grok why using a map is too slow: it is slow because a map boxes all indices to Integer objects!

The sparse vector presented here is fast for read access and for appending values, but not for putting values at random indices. It is optimal for a scenario where you first create the sparse vector by putting values in order of increasing indices, and later use it mostly for reading.

The important methods in the sparse vector class are:

// ...

import java.util.Arrays;

public class SparseVector {

    /*default*/ int[] keys;
    /*default*/ int size, used;
    /*default*/ double[] values;

    public SparseVector(int size, int capacity) {
        assert size >= 0;
        assert capacity >= 0;
        this.size = size;
        this.keys = new int[capacity];
        this.values = new double[capacity];
    }

    public double get(int key) {
        if (key < 0 || key >= size) throw new IndexOutOfBoundsException(Integer.toString(key));
        int spot = Arrays.binarySearch(keys, 0, used, key);
        return spot < 0 ? 0 : values[spot];
    }

    public boolean isUsed(int key) {
        return 0 <= Arrays.binarySearch(keys, 0, used, key);
    }

    public double put(int key, double value) {
        if (key < 0 || key >= size) throw new IndexOutOfBoundsException(Integer.toString(key));
        int spot = Arrays.binarySearch(keys, 0, used, key);
        // key already present: overwrite in place (no float cast, values are doubles)
        if (spot >= 0) return values[spot] = value;
        else return update(-1 - spot, key, value);
    }

    public void resizeTo(int newSize) {
        if (newSize < this.size) throw new UnsupportedOperationException();
        this.size = newSize;
    }

    public int size() {
        return size;
    }

    private double update(int spot, int key, double value) {
        // grow if reaching end of capacity
        if (used == keys.length) {
            int capacity = (keys.length * 3) / 2 + 1;
            keys = Arrays.copyOf(keys, capacity);
            values = Arrays.copyOf(values, capacity);
        }
        // shift values if not appending
        if (spot < used) {
            System.arraycopy(keys, spot, keys, spot + 1, used - spot);
            System.arraycopy(values, spot, values, spot + 1, used - spot);
        }
        used++;
        keys[spot] = key;
        return values[spot] = value;
    }

    public int used() {
        return used;
    }

    public void trim() {
        keys = Arrays.copyOf(keys, used);
        values = Arrays.copyOf(values, used);
    }

}
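A note on the -1 - spot expression in put: when the key is absent, Arrays.binarySearch returns -(insertionPoint) - 1, so -1 - spot recovers the index at which the key has to be inserted to keep the keys sorted. The small sketch below isolates that decoding step; the class and method names are invented for the demonstration.

```java
import java.util.Arrays;

/** Hypothetical demo of how a binarySearch result maps to a slot index. */
final class InsertionPointDemo {
    /**
     * Returns the index of key among the first `used` sorted keys,
     * or the index where it would have to be inserted if it is absent.
     */
    static int slotFor(int[] sortedKeys, int used, int key) {
        int spot = Arrays.binarySearch(sortedKeys, 0, used, key);
        return spot >= 0 ? spot : -1 - spot;
    }
}
```

When the returned slot equals used, the key is larger than every stored key and the put is a plain append, which is why filling the vector in increasing index order never triggers the System.arraycopy shift.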

For a 1D sparse array, a map is normally the way to go. You only need a library if the array is multi-dimensional.

If you compare access time between a map and an array,

   map.get(99);
   array[99];

the map is going to be much slower. Any library would have the same issue.

Isn't that what a sparse array is all about? You trade time for space.
