Storing a large number of configurations in Java

Question

I have a datatype (let's call it Data) that contains 2 pieces of information:

int config
byte weight

This datatype is the conversion of a series of 32 booleans. I have to make changes to these 32 booleans, convert the result back to this datatype, and store it. I only want to store unique entries, eliminating any duplicates. The problem is that there are 2^33 possible configurations for this datatype.

I have tried something like this:

static class searchedconfigs {
    Data[] searchedconfigs;
    int position;
    public searchedconfigs() {
        searchedconfigs = new Data[150000];
    }
    public void initiateposition() {
        position = 0;
    }
    public boolean searchfield(Data Key, int entries) {
        boolean exists = false;
        for (int i = 0; i <= entries; i++) {
            if (searchedconfigs[i] == Key) {
                System.out.println("break");
                exists = true;
                break;
            }
        }
        return exists;
    }
    public void add(Data config, int position) {
        searchedconfigs[position] = config;
    }
    public int getPosition() {
        return position;
    }
    public void storePosition() {
        position++;
    }
}

The position is initialised and then incremented, so each search only scans the occupied positions of the array. My problem is that, as you can see, the array is only of size 1500000, and I need it to be much bigger. However, even allocating an array of the maximum int size (I would need a long to make an array of the size I actually need) causes an out-of-memory error. Furthermore, my searchfield function does not seem to compare the key correctly with the config stored at each position.

Can anyone tell me what I can do to fix these mistakes, or suggest a different approach to storing this data?

Answer 1

Use a HashSet, and implement equals and hashCode in Data, like so:

import java.util.Objects;

class Data {
    int config;
    byte weight;

    @Override
    public int hashCode() {
        return Objects.hash(config, weight);
    }

    @Override
    public boolean equals(Object other) {
        if (other == this) return true;
        if (!(other instanceof Data)) return false;  // also rejects null

        // Cast is required: Object has no config or weight fields.
        Data that = (Data) other;
        return this.config == that.config && this.weight == that.weight;
    }
}

Sets of any kind do not contain duplicate elements. However, your Data class appears to be a value type (i.e. the member values matter more than object identity when comparing for equality), so if you do not implement these two methods, duplicates will still end up in whichever data structure you choose.
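As a rough illustration of how the set rejects duplicates once equals and hashCode are in place, here is a minimal sketch. ConfigEntry is a hypothetical stand-in for Data with a constructor added for convenience, and DedupDemo and addAll are names invented for this example.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for Data, with a constructor added for illustration.
final class ConfigEntry {
    final int config;
    final byte weight;

    ConfigEntry(int config, byte weight) {
        this.config = config;
        this.weight = weight;
    }

    @Override
    public int hashCode() {
        return Objects.hash(config, weight);
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof ConfigEntry)) return false;
        ConfigEntry that = (ConfigEntry) other;
        return config == that.config && weight == that.weight;
    }
}

class DedupDemo {
    // Adds each entry to the set and returns how many were actually new.
    static int addAll(Set<ConfigEntry> seen, ConfigEntry... entries) {
        int added = 0;
        for (ConfigEntry e : entries) {
            if (seen.add(e)) added++;   // add() returns false for a duplicate
        }
        return added;
    }

    public static void main(String[] args) {
        Set<ConfigEntry> seen = new HashSet<>();
        int added = addAll(seen,
                new ConfigEntry(42, (byte) 1),
                new ConfigEntry(42, (byte) 1),   // equal by value, so rejected
                new ConfigEntry(42, (byte) 2));
        System.out.println(added + " unique entries, set size " + seen.size());
    }
}
```

Because add() reports whether the element was new, a separate search step like searchfield is not needed.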

Answer 2

What is the space limitation you're actually running into? Arrays in Java are indexed by int, so they hold at most Integer.MAX_VALUE (2^31-1) elements. Are you overrunning:

The maximum number of elements in an array?
The heap allocated to the JVM?
The available RAM + swap space on the machine?

If it's the number of elements, then look at an alternative data structure (see below). If you're overrunning the heap, then you should allocate more memory to your application (the -Xmx argument to the JVM when running your program). If you're actually running out of memory on the box, space-saving tricks will only get you so far; eventually data growth will outpace them. At that point you need to look at either horizontal scaling (distributed computing) or vertical scaling (getting a bigger box with more RAM).
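To tell which of these limits applies, it can help to compare the heap the JVM was given against a back-of-envelope size estimate. The sketch below assumes each entry could be stored as one packed 8-byte long, which is an assumption of this example and not something the question states. HeapCheck and bytesForPackedEntries are hypothetical names.

```java
class HeapCheck {
    // Rough lower bound on the bytes needed for entryCount entries,
    // assuming each entry is stored as one 8-byte long.
    static long bytesForPackedEntries(long entryCount) {
        return entryCount * Long.BYTES;
    }

    public static void main(String[] args) {
        // Upper bound on the heap, as set by -Xmx or the JVM default.
        long maxHeap = Runtime.getRuntime().maxMemory();
        // 2^33 entries is the count given in the question.
        long needed = bytesForPackedEntries(1L << 33);

        System.out.println("Max heap: " + maxHeap / (1024 * 1024) + " MiB");
        System.out.println("Needed:   " + needed / (1024 * 1024) + " MiB");
        System.out.println(needed <= maxHeap ? "fits in heap" : "does not fit in heap");
    }
}
```

Running it as, for example, `java -Xmx4g HeapCheck` shows the effect of the flag on the first line of output. Under this assumption the full 2^33 entries come to 64 GiB even before any collection overhead, so the practical question is how many entries are actually visited.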

If you're simply overrunning an array because it can't be sized beyond max int, and space is really a concern, I'd avoid using HashSet, as it will take more space than either a straight List/array or an alternative Set implementation like a TreeSet.

For HashSets to work efficiently they need an oversized hashtable to reduce the number of hash collisions. HashSet in Java has a default load factor of 75%, which means that once it fills past that fraction of its capacity it resizes itself to stay under the load factor. In general you're trading a larger amount of space for faster insertion/removal/lookup of elements in the set, which I believe is constant time (O(1)).
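If the rough number of entries is known up front, the table can be sized once so it does not grow repeatedly. A minimal sketch using the HashSet(int initialCapacity, float loadFactor) constructor follows. PresizedSet and create are hypothetical names, and storing the entries as Long keys is an assumption made only to keep the example short.

```java
import java.util.HashSet;
import java.util.Set;

class PresizedSet {
    // Builds a HashSet whose table is large enough to hold expectedEntries
    // at the given load factor without resizing.
    static Set<Long> create(int expectedEntries, float loadFactor) {
        int initialCapacity = (int) Math.ceil(expectedEntries / (double) loadFactor);
        return new HashSet<>(initialCapacity, loadFactor);
    }

    public static void main(String[] args) {
        Set<Long> seen = create(1_000_000, 0.75f);
        for (long key = 0; key < 1_000_000; key++) {
            seen.add(key);
        }
        System.out.println("Stored " + seen.size() + " entries");
    }
}
```

A higher load factor saves table space at the cost of more collisions per bucket, which is the same space-for-time trade described above.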

A TreeSet should only require storage proportional to the number of elements (negligible overhead), but at the cost of increased search and insertion time, which is logarithmic (O(log n)). A List has a similar storage characteristic (depending on the implementation used) but has a search time of O(n) if it is unordered. (You can look up the insertion/deletion/search times of the different list implementations, ordered vs. unordered; they are very well documented.)
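A small sketch of the TreeSet option is shown below. To avoid writing a Comparator for Data, it packs config and weight into a single long key. That encoding, along with the names SortedConfigs and pack, is an assumption of this example and not part of the original question or answer.

```java
import java.util.TreeSet;

class SortedConfigs {
    // Packs the 32-bit config and the 8-bit weight into one long
    // (weight in bits 32..39, config in bits 0..31). Illustrative encoding only.
    static long pack(int config, byte weight) {
        return ((long) (weight & 0xFF) << 32) | (config & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        TreeSet<Long> seen = new TreeSet<>();

        boolean first = seen.add(pack(42, (byte) 1));   // new key
        boolean again = seen.add(pack(42, (byte) 1));   // duplicate, not stored
        boolean other = seen.add(pack(42, (byte) 2));   // differs in weight

        System.out.println(first + " " + again + " " + other + ", size " + seen.size());
    }
}
```

Lookups and insertions here are O(log n), as described above. The same packed keys could also be kept in a sorted long[] and searched with Arrays.binarySearch if the data is built once and then only queried.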

I just want to note that when using a HashSet you're trading space efficiency for faster look-up time (O(1)). You have to allocate space for the hashtable, which has to be bigger than the total number of elements in your collection. (Of course there is the caveat that you can force the number of buckets in use to basically be 1 by having a horrid hashing function, which would effectively put you right back at the performance characteristics of an unordered list ;)

Answer 1
0 2016-04-03 00:00:27

Answer 2
0 2016-04-03 01:31:08