Efficiently creating combinations in Java without running into memory problems

Question

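Some background

I am working on a problem in which I store sets in a HashMap, keyed by set name, e.g. Set1 -> a, b, c, e, g...; Set2 -> a, g, h, f...; Set3 -> b, c, e...; and so on.

The purpose of the program is to take a value from the user as a "threshold" (e.g. 2), which is the minimum number of elements the sets must have in common. If the threshold is met or exceeded, the program suggests a merge between the sets.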
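To make the threshold test concrete, here is a minimal sketch of the check for a single pair of sets. The class and method names (`CommonalityCheck`, `commonCount`, `shouldSuggestMerge`) are hypothetical, and the example sets use only the elements listed above (the originals are elided with "...").

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class CommonalityCheck {

    /** Returns the number of elements that appear in both sets. */
    static int commonCount(Set<String> a, Set<String> b) {
        Set<String> common = new HashSet<>(a); // copy so the inputs are not modified
        common.retainAll(b);                   // keep only the shared elements
        return common.size();
    }

    /** True if the two sets share at least `threshold` elements. */
    static boolean shouldSuggestMerge(Set<String> a, Set<String> b, int threshold) {
        return commonCount(a, b) >= threshold;
    }

    public static void main(String[] args) {
        // Illustrative data only: the elements shown in the question, without the elided ones
        Set<String> set1 = new HashSet<>(Arrays.asList("a", "b", "c", "e", "g"));
        Set<String> set2 = new HashSet<>(Arrays.asList("a", "g", "h", "f"));

        System.out.println(commonCount(set1, set2));           // 2 (a and g)
        System.out.println(shouldSuggestMerge(set1, set2, 2)); // true
        System.out.println(shouldSuggestMerge(set1, set2, 3)); // false
    }
}
```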

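I have written a combination creator that generates all possible combinations of the set names to compare, with order not taken into account, e.g. (Set1, Set2), (Set1, Set3), (Set2, Set3), (Set1, Set2, Set3).

This collection of combinations is then used to actually compare the sets. If the threshold is met, the combination is stored in a separate list to be output to the user as a possible merge. Before this output, there is some logic that removes sub-combinations: if (Set1, Set2, Set3) is a possible merge, the other three sub-combinations can be ignored because this super-combination already covers them. We then output the suggested merges.

The problem

Once we reach a certain number of sets to compare (for example, more than 17), we run out of memory because millions of combinations are created. I would appreciate help understanding alternative approaches or how this approach can be improved. It works, but it is not efficient :(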
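A quick count shows why this grows so fast. Assuming every combination of two or more set names is generated (as described above), N sets give 2^N - N - 1 combinations. The sketch below, with the hypothetical name `CombinationCount`, only does the arithmetic.

```java
class CombinationCount {

    /** Number of subsets of size >= 2 that can be formed from n items: 2^n - n - 1. */
    static long combinationsOfAtLeastTwo(int n) {
        if (n < 0 || n > 62) {
            throw new IllegalArgumentException("n must be between 0 and 62");
        }
        return (1L << n) - n - 1; // all subsets, minus the n singletons and the empty set
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 17, 20, 25}) {
            System.out.println(n + " sets -> " + combinationsOfAtLeastTwo(n) + " combinations");
        }
    }
}
```

For 17 sets this is already 131,054 combinations, and each additional set roughly doubles the count.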

Combination creator

/**
 * Iterates through the setsToBeCompared ArrayList and gets all the combinations
 *
 * @return - ArrayList with all the possible combinations
 */
public ArrayList<String> generateCombinations(ArrayList<String> setsToBeCompared) {
    List<List<String>> temp = new ArrayList<>();
    ArrayList<String> a = new ArrayList<>();
    for (int i = 2; i <= 3; i++) {
        temp = calculateCombinations(setsToBeCompared, i);
        for (List<String> list : temp) {
            a.add(list.toString());
        }
    }
    return a;
}

/**
 * Calculates all the combination given by the parameters
 *
 * @param values - the names of the sets to be compared
 * @param size   - the number of elements in each combination
 * @return - List of all possible calculated combinations
 */
private List<List<String>> calculateCombinations(List<String> values, int size) {

    if (0 == size) {
        return Collections.singletonList(Collections.<String>emptyList());
    }

    if (values.isEmpty()) {
        return Collections.emptyList();
    }

    List<List<String>> combination = new LinkedList<List<String>>();

    String actual = values.iterator().next();
    List<String> subSet = new LinkedList<String>(values);
    subSet.remove(actual);
    List<List<String>> subSetCombination = calculateCombinations(subSet, size - 1);
    for (List<String> set : subSetCombination) {
        List<String> newSet = new LinkedList<String>(set);
        newSet.add(0, actual);
        combination.add(newSet);
    }

    combination.addAll(calculateCombinations(subSet, size));

    return combination;
}

Answer 1

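So, to summarize the points I made in the comments.

In your case, generating all subsets of the collection of sets is definitely not an option, because the number of such subsets is about 2^N. For N = 50 that is roughly 10^15 subsets, far more than can be held in memory.

I suggest switching from subsets of sets to subsets of their items. Suppose the M sets contain N distinct items and the merge threshold is T. Then you only need to try about N^T k-combinations of size T, looking for the sets that can be merged through each such combination of items. For small T this is acceptable.

The algorithm is as follows: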

let D - collection of initial sets
let S - collection of distinct elements in sets across D

for each k-combination c over S (with k = T) {
   M = new M(c)          // merge object: stores the sets merged through c and the k-combination c itself
   for each (s in D) {
      if (s.containsAll(c))
         M.sets.add(s)
   }
   if (M.sets.size > 1)  // at least two sets were merged through c
       merges.add(M)
}
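The pseudocode above could be sketched in Java roughly as follows. This is an illustration under stated assumptions, not the answerer's code: the names `ItemCombinationMerges`, `Merge`, `findMerges` and `collect` are hypothetical, the sample data uses only the elements listed in the question, and a combination is kept only when at least two sets contain it. Combinations of items are enumerated recursively and never stored, so memory use stays proportional to the number of merges found.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class ItemCombinationMerges {

    /** A candidate merge: the shared items plus the names of all sets containing them. */
    static final class Merge {
        final List<String> items;
        final List<String> setNames = new ArrayList<>();

        Merge(List<String> items) {
            this.items = items;
        }

        @Override
        public String toString() {
            return items + " -> " + setNames;
        }
    }

    /** Tries every combination of `threshold` distinct items and records which sets contain it. */
    static List<Merge> findMerges(Map<String, Set<String>> sets, int threshold) {
        if (threshold < 1) {
            throw new IllegalArgumentException("threshold must be at least 1");
        }
        TreeSet<String> distinct = new TreeSet<>(); // S: distinct items, sorted for a stable order
        for (Set<String> s : sets.values()) {
            distinct.addAll(s);
        }
        List<Merge> merges = new ArrayList<>();
        collect(sets, new ArrayList<>(distinct), threshold, 0, new ArrayList<>(), merges);
        return merges;
    }

    private static void collect(Map<String, Set<String>> sets, List<String> items, int k,
                                int start, List<String> current, List<Merge> out) {
        if (current.size() == k) {
            Merge m = new Merge(new ArrayList<>(current));
            for (Map.Entry<String, Set<String>> e : sets.entrySet()) {
                if (e.getValue().containsAll(current)) {
                    m.setNames.add(e.getKey());
                }
            }
            if (m.setNames.size() > 1) { // assumption: a merge needs at least two sets
                out.add(m);
            }
            return;
        }
        for (int i = start; i < items.size(); i++) {
            current.add(items.get(i));
            collect(sets, items, k, i + 1, current, out);
            current.remove(current.size() - 1); // backtrack
        }
    }

    public static void main(String[] args) {
        Map<String, Set<String>> sets = new LinkedHashMap<>();
        sets.put("Set1", new HashSet<>(Arrays.asList("a", "b", "c", "e", "g")));
        sets.put("Set2", new HashSet<>(Arrays.asList("a", "g", "h", "f")));
        sets.put("Set3", new HashSet<>(Arrays.asList("b", "c", "e")));

        for (Merge m : findMerges(sets, 2)) {
            System.out.println(m);
        }
    }
}
```

With this sample data and a threshold of 2, four item pairs are shared by two sets: (a, g) by Set1 and Set2, and (b, c), (b, e), (c, e) by Set1 and Set3. The repeated Set1/Set3 group is what the pruning step is meant to collapse.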

After that, go over all possible pairs of merges and delete every merge that is completely covered by another merge:

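for each m in merges {
    if (m.isDeleted()) continue          // m is itself covered, so its coverer handles the rest
    for each m1 in merges {
        if (m1 != m && m.sets.containsAll(m1.sets))   // never compare a merge with itself
            m1.markDeleted()
    }
}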
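The pruning step can be sketched in Java as below. It is a simplified illustration: each merge is represented only by the set of names it groups, the names `CoveredMergePruner` and `prune` are hypothetical, and identical groups are de-duplicated first so that two equal merges do not eliminate each other (an assumption about the intended behaviour).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class CoveredMergePruner {

    /** Keeps only the groups that are not contained in some other, larger group. */
    static List<Set<String>> prune(List<Set<String>> merges) {
        // Drop exact duplicates while preserving the original order
        List<Set<String>> distinct = new ArrayList<>(new LinkedHashSet<>(merges));
        List<Set<String>> kept = new ArrayList<>();
        for (Set<String> candidate : distinct) {
            boolean covered = false;
            for (Set<String> other : distinct) {
                // After de-duplication, containsAll on a different group means a strict superset
                if (other != candidate && other.containsAll(candidate)) {
                    covered = true;
                    break;
                }
            }
            if (!covered) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    private static Set<String> group(String... names) {
        return new TreeSet<>(Arrays.asList(names)); // sorted so the printed form is stable
    }

    public static void main(String[] args) {
        List<Set<String>> merges = new ArrayList<>();
        merges.add(group("Set1", "Set2"));
        merges.add(group("Set1", "Set3"));
        merges.add(group("Set2", "Set3"));
        merges.add(group("Set1", "Set2", "Set3"));
        merges.add(group("Set4", "Set5"));

        System.out.println(prune(merges)); // [[Set1, Set2, Set3], [Set4, Set5]]
    }
}
```

The comparison is quadratic in the number of merges, which is acceptable as long as the merge list itself stays small.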

Answer 2

How about something like this? It uses far less memory, but you still have to examine a large number of values (2^N).

import static java.util.stream.IntStream.range;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

/** Lazily enumerates every non-empty subset of {0, ..., level - 1} as a sorted index list. */
public class Subsets implements Iterator<List<Integer>> {

    private final int level;
    // Pending subsets; each one is expanded only when next() returns it
    private final LinkedList<List<Integer>> queue = new LinkedList<>();

    public Subsets(int level) {
        this.level = level;
        range(0, level).forEach(i -> queue.add(Arrays.asList(i)));
    }

    @Override
    public boolean hasNext() {
        return !queue.isEmpty();
    }

    @Override
    public List<Integer> next() {
        List<Integer> list = queue.removeFirst();
        int maxValue = list.get(list.size() - 1);

        if (list.size() < level) {
            // Extend the current subset with every larger index
            for (int k = maxValue + 1; k < level; k++) {
                List<Integer> newList = new ArrayList<>(list);
                newList.add(k);
                queue.addFirst(newList);
            }
        }
        return list;
    }

    public static void main(String[] args) {
        Subsets s4 = new Subsets(4);
        while (s4.hasNext()) {
            System.err.println(s4.next());
        }
    }
}

To use this, you need to map the names of the sets (the keys) to integers. Sample output:

[0]
[0, 3]
[0, 2]
[0, 2, 3]
[0, 1]
[0, 1, 3]
[0, 1, 2]
[0, 1, 2, 3]
[1]
[1, 3]
[1, 2]
[1, 2, 3]
[2]
[2, 3]
[3]
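The mapping between set names and integers mentioned above can be as simple as freezing the map keys into a list and using list positions as the integers. A minimal sketch, with the hypothetical names `IndexToNameMapping` and `toNames`, and empty placeholder sets since only the keys matter here:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class IndexToNameMapping {

    /** Translates a list of indices (one generated subset) back into set names. */
    static List<String> toNames(List<Integer> indices, List<String> names) {
        List<String> result = new ArrayList<>(indices.size());
        for (int index : indices) {
            result.add(names.get(index));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> sets = new LinkedHashMap<>();
        sets.put("Set1", new HashSet<>());
        sets.put("Set2", new HashSet<>());
        sets.put("Set3", new HashSet<>());

        // Freeze the key order once: position i in this list is the integer for that set name
        List<String> names = new ArrayList<>(sets.keySet());

        System.out.println(toNames(Arrays.asList(0, 2), names));    // [Set1, Set3]
        System.out.println(toNames(Arrays.asList(0, 1, 2), names)); // [Set1, Set2, Set3]
    }
}
```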
