Java: How to aggregate a list with support for min, max, avg, and last-per-group aggregation

Question

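I used to do this in MySQL itself, since that seemed the right place for it, but I now have to run some business-logic calculations first and then apply a group by to the resulting list. Any suggestions for doing this in Java without hurting performance? (I looked at lambdaj; it appears to be slowed down by its heavy use of proxies, but I have not tried it yet.)

The List<Item> contains name, value, and unixtimestamp as properties and is returned by the database. The records are 5 minutes apart.

I should be able to group by a dynamic sample time (for example, 1 hour), which means every 12 records must be grouped into one record, and then min, max, and avg applied to each group.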
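The grouping described above can be sketched with the standard streams API, assuming Java 8 or later is available. The `Item` class here is a hypothetical stand-in with `name`, `value`, and `timestamp` fields, and the timestamp is assumed to be in Unix seconds; the real class in the question may differ.

```java
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class BucketStats {
    // Hypothetical stand-in for the question's Item; timestamp is assumed to be Unix seconds.
    static class Item {
        final String name;
        final double value;
        final long timestamp;

        Item(String name, double value, long timestamp) {
            this.name = name;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // Groups items into buckets of samplePeriodSeconds and collects
    // count, min, max, sum, and average of the values in each bucket.
    static Map<Long, DoubleSummaryStatistics> summarize(List<Item> items, long samplePeriodSeconds) {
        return items.stream().collect(Collectors.groupingBy(
                item -> Math.floorDiv(item.timestamp, samplePeriodSeconds), // bucket key
                TreeMap::new,                                               // keeps buckets in key order
                Collectors.summarizingDouble(item -> item.value)));
    }
}
```

Calling `BucketStats.summarize(items, 3600)` would give one `DoubleSummaryStatistics` per hour, from which `getMin()`, `getMax()`, and `getAverage()` can be read. This makes a single pass over the list.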

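Any suggestions are appreciated.

[Update] I got the following working, but it does not yet aggregate the list elements stored under each key of the indexed map. As you can see, I create a map of lists whose key is an integer representation of the requested sample time (30 minutes is the requested sample here).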

private List<Item> performConsolidation(List<Item> items) {
        ListMultimap<Integer, Item> groupByTimestamp = ArrayListMultimap.create();
        List<Item> consolidatedItems = new ArrayList<>();
        for (Item item : items) {
            groupByTimestamp.put((int)floor(((Double.valueOf(item.getItem()[2])) / 1000) / (60 * 30)), item);
        }
        return consolidatedItems;
    }

Answer 1

Here is a suggestion:

public Map<Long,List<Item>> group_items(List<Item> items,long sample_period) {
  Map<Long,List<Item>> grouped_result = new HashMap<Long,List<Item>>();
  long group_key;

  for (Item item: items) {
    group_key = item.timestamp / sample_period;
    if (grouped_result.containsKey(group_key)) {  
      grouped_result.get(group_key).add(item);
    }
    else {
      grouped_result.put(group_key, new ArrayList<Item>());
      grouped_result.get(group_key).add(item);
    }
  }
  return grouped_result;
}

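sample_period is the group size in seconds: 3600 = one hour, 900 = 15 minutes.

The keys in the map can of course be very large numbers (depending on the sample period), but this grouping preserves the chronological order of the groups: lower keys are earlier in time. If we assume the data in the original list is sorted chronologically, we can take the key of the first item and subtract it from every key. That gives keys 0, 1, and so on. In that case, before the for loop starts we need:

long subtract = items.get(0).timestamp / sample_period; // note: both operands are integers, so this is integer division

And then inside the for loop:

group_key = item.timestamp / sample_period - subtract;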
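To make the offset concrete, here is a small sketch with made-up timestamps (assumed to be Unix seconds) and a one-hour sample period. The class and method names are illustrative and not part of the answer.

```java
class BucketKeys {
    // Returns the zero-based bucket index of timestamp, relative to the bucket of firstTimestamp.
    static long relativeKey(long timestamp, long firstTimestamp, long samplePeriod) {
        long subtract = firstTimestamp / samplePeriod; // integer division: bucket of the first item
        return timestamp / samplePeriod - subtract;
    }

    public static void main(String[] args) {
        long samplePeriod = 3600;  // one hour, in seconds
        long first = 1404900000L;  // hypothetical first timestamp; 1404900000 / 3600 = 390250

        System.out.println(relativeKey(first + 300, first, samplePeriod));  // prints 0
        System.out.println(relativeKey(first + 3600, first, samplePeriod)); // prints 1
        System.out.println(relativeKey(first + 7500, first, samplePeriod)); // prints 2
    }
}
```

Without the subtraction the same three items would land under keys 390250, 390251, and 390252.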

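Something along these lines will work, that is, it groups the data set the way you describe. You can then apply min, max, avg, and so on to the resulting lists. But since those functions have to iterate over each group's list again, it is probably better to fold those calculations into this solution and have the function return something like Map<Long,Aggregate>, where Aggregate is a type containing avg, min, max, and the list of items in the group. As for performance, I think this is acceptable: it is a simple O(N) solution.

Edit: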

OK, I just wanted to add a more complete solution/suggestion that also calculates min, max, and avg:

public class Aggregate {
  public double avg;
  public double min;
  public double max;

  public List<Item> items = new ArrayList<Item>();

  public Aggregate(Item item) {
    min = item.value;
    max = item.value;
    avg = item.value;
    items.add(item);
  }

  public void addItem(Item item) {
    items.add(item);
    if (item.value < this.min) {
      this.min = item.value;
    }
    else if (item.value > this.max) {
      this.max = item.value;
    }
    this.avg = (this.avg * (this.items.size() - 1) + item.value) / this.items.size(); 
  }
}

public Map<Long,Aggregate> group_items(List<Item> items,long sample_period) {

  Map<Long,Aggregate> grouped_result = new HashMap<Long,Aggregate>();
  long group_key;

  long subtract = items.get(0).timestamp / sample_period;
  for (Item item: items) {
    group_key = item.timestamp / sample_period - subtract;
    if (grouped_result.containsKey(group_key)) {  
      grouped_result.get(group_key).addItem(item);
    }
    else {
      grouped_result.put(group_key, new Aggregate(item));
    }
  }
  return grouped_result;
}

That is just a rough solution. We may want to add more properties to Aggregate, and so on.
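As one possible extension, the sketch below adds a `last` value (the "last per group" aggregation from the question title) and derives the average from a running sum and count, so the item list does not have to be kept. It assumes items are added in chronological order, so the most recently added value is the last one in the group. `RunningAggregate` is a hypothetical name, not part of the answer above.

```java
class RunningAggregate {
    double min;
    double max;
    double sum;
    double last;  // most recently added value; the group's last value if input is time-ordered
    long count;

    RunningAggregate(double firstValue) {
        min = firstValue;
        max = firstValue;
        sum = firstValue;
        last = firstValue;
        count = 1;
    }

    // Folds one more value into the running min, max, sum, last, and count.
    void add(double value) {
        min = Math.min(min, value);
        max = Math.max(max, value);
        sum += value;
        last = value;
        count++;
    }

    // Average computed on demand from the running sum and count.
    double avg() {
        return sum / count;
    }
}
```

In the grouping loop this would be used the same way as Aggregate: create it with the first item's value for a new key and call `add` for every later item with that key.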

Answer 2

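Apart from the min/max/etc. calculation, I noticed that your performConsolidation method looks like it could use Multimaps.index. Just pass it the items and a Function<Item, Integer> that computes the value you want: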

return (int) floor(((Double.valueOf(item.getItem()[2])) / 1000) / (60 * 30));

That will not save a lot of code, but it may make it easier to see at a glance what is going on: index(items, timeBucketer).

Answer 3

If you can use my xpresso project, you can do the following:

Let your input list be:

list<tuple> items = x.list(x.tuple("name1",1d,100),x.tuple("name2",3d,105),x.tuple("name1",4d,210));

First unzip the list of tuples to get a tuple of lists:

tuple3<list<String>,list<Double>,list<Integer>> unzipped = x.unzip(items, String.class, Double.class, Integer.class);

Then you can aggregate however you like:

x.print(x.tuple(x.last(unzipped.value0), x.avg(unzipped.value1), x.max(unzipped.value2)));

The above will produce:

(name1,2.67,210)

Answer 1
1 2014-07-09 12:09:06

Answer 2
0 2014-07-16 14:38:15

Answer 3
0 2015-06-11 02:27:51
