
Best data structure to "group by" and aggregate values in Java?

I created an ArrayList of arrays, as shown below:

ArrayList<Object[]> csvArray = new ArrayList<Object[]>();

As you can see, each element of the ArrayList is an array of the form {Country, City, Name, Age}.

Now I want to do a "group by" on Country and City (combined), and then take the average Age of the people in each Country+City group.

What is the easiest way to achieve this? Or do you have suggestions for a data structure better suited than ArrayList to this kind of "group by" and aggregation requirement?

Your answers are much appreciated.

You have a lot of options in Java 8.

Example:

 Stream<Person> people = Stream.of(new Person("Paul", 24), new Person("Mark", 30), new Person("Will", 28));
 Map<Integer, List<String>> peopleByAge = people
         .collect(groupingBy(p -> p.age, mapping((Person p) -> p.name, toList())));
 System.out.println(peopleByAge);

If you can use Java 8 and have no specific reason to use a particular data structure, you can go through the tutorial below:

http://java.dzone.com/articles/java-8-group-collections

You could use Java 8 streams and Collectors.groupingBy for this. For example:

final List<Object[]> data = new ArrayList<>();
data.add(new Object[]{"NL", "Rotterdam", "Kees", 38});
data.add(new Object[]{"NL", "Rotterdam", "Peter", 54});
data.add(new Object[]{"NL", "Amsterdam", "Suzanne", 51});
data.add(new Object[]{"NL", "Rotterdam", "Tom", 17});

final Map<String, List<Object[]>> map = data.stream().collect(
        Collectors.groupingBy(row -> row[0].toString() + ":" + row[1].toString()));

for (final Map.Entry<String, List<Object[]>> entry : map.entrySet()) {
    final double average = entry.getValue().stream()
                                .mapToInt(row -> (int) row[3]).average().getAsDouble();
    System.out.println("Average age for " + entry.getKey() + " is " + average);
}
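
The two steps above (group first, then average each list) can also be folded into a single pass by handing groupingBy a downstream collector. The following is only a sketch under a few assumptions: the class and method names are made up for illustration, the rows keep the {country, city, name, age} layout from the question, and a List<String> is used as the composite key instead of a concatenated string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper class; the name is not from the original answer.
class AverageAgeByCountryCity {

    // Groups rows of {country, city, name, age} by (country, city)
    // and averages the age within each group in one pass.
    static Map<List<String>, Double> averageAge(List<Object[]> rows) {
        return rows.stream().collect(Collectors.groupingBy(
                // A List has value-based equals/hashCode, so it works as a composite key.
                row -> Arrays.asList(row[0].toString(), row[1].toString()),
                // Downstream collector: average the age column of each group.
                Collectors.averagingInt(row -> (Integer) row[3])));
    }

    public static void main(String[] args) {
        List<Object[]> data = new ArrayList<>();
        data.add(new Object[]{"NL", "Rotterdam", "Kees", 38});
        data.add(new Object[]{"NL", "Rotterdam", "Peter", 54});
        data.add(new Object[]{"NL", "Amsterdam", "Suzanne", 51});
        data.add(new Object[]{"NL", "Rotterdam", "Tom", 17});

        averageAge(data).forEach((key, average) ->
                System.out.println("Average age for " + key + " is " + average));
    }
}
```

Using a list as the key avoids accidental collisions that a delimiter-joined string could produce if a country or city name ever contained the delimiter.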

You can check the collections recommended by @duffy356. I can give you a standard solution based on java.util.

I'd use a common Map<Key,Value>, specifically a HashMap.
For the keys, as far as I can see, you'll need an extra plain object that combines country and city. The point is to implement working equals(Object) and hashCode() methods. I'd use the Eclipse auto-generator; it gives me the following:

class CountryCityKey {
  // package visibility
  String country;
  String city;

  @Override
  public int hashCode() {
    final int prime = 31;
    int result = 1;
    result = prime * result + ((country == null) ? 0 : country.hashCode());
    result = prime * result + ((city == null) ? 0 : city.hashCode());
    return result;
  }

  @Override
  public boolean equals(Object obj) {
    if (this == obj)
      return true;
    if (obj == null)
      return false;
    if (getClass() != obj.getClass())
      return false;
    CountryCityKey other = (CountryCityKey) obj;
    if (country == null) {
      if (other.country != null)
        return false;
    } else if (!country.equals(other.country))
      return false;
    if (city == null) {
      if (other.city != null)
        return false;
    } else if (!city.equals(other.city))
      return false;
    return true;
  }

}


Now we can group our objects in a HashMap<CountryCityKey, List<MySuperObject>>.

The code for that could be:

Map<CountryCityKey, List<MySuperObject>> group(List<MySuperObject> list) {
  Map<CountryCityKey, List<MySuperObject>> response = new HashMap<>(list.size());
  for (MySuperObject o : list) {
     CountryCityKey key = o.getKey(); // I consider this done, so simply call it
     List<MySuperObject> l;
     if (response.containsKey(key)) {
        // reuse the list already stored for this key
        l = response.get(key);
     } else {
        // first object for this key: start a new list
        l = new ArrayList<MySuperObject>();
     }
     l.add(o);
     response.put(key, l);
  }
  return response;
}

And there you have it :)
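
On Java 8 or later, the containsKey/get/put sequence above can be shortened with Map.computeIfAbsent, and the key class can lean on java.util.Objects instead of generated null checks. The sketch below is illustrative only: PlaceKey and GroupingSketch are hypothetical names, and plain String[] rows of {country, city, name} stand in for MySuperObject.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

// Hypothetical composite key; equals and hashCode are delegated to java.util.Objects.
final class PlaceKey {
    final String country;
    final String city;

    PlaceKey(String country, String city) {
        this.country = country;
        this.city = city;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof PlaceKey)) return false;
        PlaceKey other = (PlaceKey) obj;
        return Objects.equals(country, other.country) && Objects.equals(city, other.city);
    }

    @Override
    public int hashCode() {
        return Objects.hash(country, city);
    }
}

class GroupingSketch {

    // Generic "group by": keyOf extracts the grouping key from each item.
    static <K, V> Map<K, List<V>> group(List<V> items, Function<? super V, ? extends K> keyOf) {
        Map<K, List<V>> result = new HashMap<>();
        for (V item : items) {
            // Creates the list the first time a key is seen, then appends to it.
            result.computeIfAbsent(keyOf.apply(item), k -> new ArrayList<>()).add(item);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[]{"NL", "Rotterdam", "Kees"},
                new String[]{"NL", "Amsterdam", "Suzanne"},
                new String[]{"NL", "Rotterdam", "Tom"});
        Map<PlaceKey, List<String[]>> grouped = group(rows, r -> new PlaceKey(r[0], r[1]));
        grouped.forEach((key, list) ->
                System.out.println(key.country + "/" + key.city + ": " + list.size() + " row(s)"));
    }
}
```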

You could use the brownies-collections library from magicwerk.org (http://www.magicwerk.org/page-collections-overview.html).

It offers key lists, which fit your requirements (http://www.magicwerk.org/page-collections-examples.html).

I would recommend an additional step. You gather your data from the CSV in Object[] arrays. If you wrap each row in a class that holds these fields, the Java 8 collection APIs will help you easily. (It also works without the wrapper class, but the code is more readable and understandable with it.)

Here is an example. It introduces a class Information that contains your data (country, city, name, age). The class has a constructor that initializes these fields from a given Object[] array, which might help you; however, the field order has to be fixed (which is usual for CSV):

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class CSVExample {

  public static void main(String[] args) {
    ArrayList<Information> csvArray = new ArrayList<>();

    csvArray.add(new Information(new Object[] {"France", "Paris", "Pierre", 34}));
    csvArray.add(new Information(new Object[] {"France", "Paris", "Madeleine", 26}));
    csvArray.add(new Information(new Object[] {"France", "Toulouse", "Sam", 34}));
    csvArray.add(new Information(new Object[] {"Italy", "Rom", "Paul", 44}));

    // combine country and city with a whitespace delimiter to use as the map key
    Map<String, List<Information>> collect = csvArray.stream()
        .collect(Collectors.groupingBy(s -> (s.getCountry() + " " + s.getCity())));
    // for each key (country and city), print the key and the average age
    collect.forEach((k, v) -> System.out.println(
        k + " " + v.stream().collect(Collectors.averagingInt(Information::getAge))));
  }
}

class Information {
  private String country;
  private String city;
  private String name;
  private int age;

  public Information(Object[] information) {
    this.country = (String) information[0];
    this.city = (String) information[1];
    this.name = (String) information[2];
    this.age = (Integer) information[3];

  }

  public Information(String country, String city, String name, int age) {
    super();
    this.country = country;
    this.city = city;
    this.name = name;
    this.age = age;
  }

  public String getCountry() {
    return country;
  }

  public String getCity() {
    return city;
  }

  public String getName() {
    return name;
  }

  public int getAge() {
    return age;
  }

  @Override
  public String toString() {
    return "Information [country=" + country + ", city=" + city + ", name=" + name + ", age=" + age + "]";
  }

}

The main method prints a simple result for your question.
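
If more than the average is needed per group, one option is Collectors.summarizingInt, which yields count, min, max, sum and average together. This is a sketch rather than part of the answer above: Resident is a hypothetical, trimmed-down stand-in for the Information class, and the key is built with the same whitespace-joined convention.

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical, minimal stand-in for the Information class.
class Resident {
    final String country;
    final String city;
    final String name;
    final int age;

    Resident(String country, String city, String name, int age) {
        this.country = country;
        this.city = city;
        this.name = name;
        this.age = age;
    }
}

class AgeStatsSketch {

    // Groups by "country city" and collects count/min/max/sum/average of the ages.
    static Map<String, IntSummaryStatistics> ageStats(List<Resident> residents) {
        return residents.stream().collect(Collectors.groupingBy(
                r -> r.country + " " + r.city,
                Collectors.summarizingInt(r -> r.age)));
    }

    public static void main(String[] args) {
        List<Resident> residents = Arrays.asList(
                new Resident("France", "Paris", "Pierre", 34),
                new Resident("France", "Paris", "Madeleine", 26),
                new Resident("France", "Toulouse", "Sam", 34),
                new Resident("Italy", "Rome", "Paul", 44));

        ageStats(residents).forEach((key, stats) -> System.out.println(
                key + ": count=" + stats.getCount()
                        + ", min=" + stats.getMin()
                        + ", max=" + stats.getMax()
                        + ", average=" + stats.getAverage()));
    }
}
```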

In Java 8, grouping the objects in a collection by the values of one or more of their properties is simplified by using a Collector.

First, I suggest you add a new class as follows:

class Info {

    private String country;
    private String city;
    private String name;
    private int age;

    public Info(String country,String city,String name,int age){
        this.country=country;
        this.city=city;
        this.name=name;
        this.age=age;
    }

    public String toString() {
         return "("+country+","+city+","+name+","+age+")";
    }

   // getters and setters       

}

Setting up infos:

   ArrayList<Info> infos = new ArrayList<>();


   infos.add(new Info("USA", "Florida", "John", 26));
   infos.add(new Info("USA", "Florida", "James", 18));
   infos.add(new Info("USA", "California", "Alan", 30));

Group by Country+City:

  Map<String, Map<String, List<Info>>> groupByCountryAndCity = infos.stream()
          .collect(Collectors.groupingBy(
                  Info::getCountry,
                  Collectors.groupingBy(Info::getCity)));


    System.out.println(groupByCountryAndCity.get("USA").get("California"));

Output:

[(USA,California,Alan,30)]

The average age of the people in each Country+City:

    Map<String, Map<String, Double>> averageAgeByCountryAndCity = infos.stream()
            .collect(Collectors.groupingBy(
                    Info::getCountry,
                    Collectors.groupingBy(
                            Info::getCity,
                            Collectors.averagingDouble(Info::getAge))));

     System.out.println(averageAgeByCountryAndCity.get("USA").get("Florida"));

Output:

22.0
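
To read every Country+City average out of the nested map, rather than looking up one pair at a time, the two levels can be walked with nested forEach calls. The helper below is a hypothetical sketch (NestedResultSketch and flatten are not from the answer); it takes any Map<String, Map<String, Double>> of the shape produced above and returns one line per pair.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for walking a country -> city -> average map.
class NestedResultSketch {

    // Produces one "country/city: average" line per inner entry,
    // in the iteration order of the supplied maps.
    static List<String> flatten(Map<String, Map<String, Double>> nested) {
        List<String> lines = new ArrayList<>();
        nested.forEach((country, cities) ->
                cities.forEach((city, average) ->
                        lines.add(country + "/" + city + ": " + average)));
        return lines;
    }

    public static void main(String[] args) {
        // TreeMap is used here only so that the iteration order is sorted by key.
        Map<String, Double> usa = new TreeMap<>();
        usa.put("Florida", 22.0);
        usa.put("California", 30.0);
        Map<String, Map<String, Double>> nested = new TreeMap<>();
        nested.put("USA", usa);

        flatten(nested).forEach(System.out::println);
    }
}
```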

Please use the code below; I pasted it from my sample app. Happy coding!

/* category, list of cars */
Map<String, List<JmCarDistance>> map = new HashMap<String, List<JmCarDistance>>();

for (JmCarDistance jmCarDistance : carDistanceArrayList) {
    String key = jmCarDistance.cartype;
    if (map.containsKey(key)) {
        // key already present: append to its list
        List<JmCarDistance> list = map.get(key);
        list.add(jmCarDistance);
    } else {
        // first car of this type: create the list and register it
        List<JmCarDistance> list = new ArrayList<JmCarDistance>();
        list.add(jmCarDistance);
        map.put(key, list);
    }
}

The best data structure is a Map<Tuple, List>.

Tuple is the key, i.e. your group-by columns. List is used to store the row data.

Once you have your data in this structure, you can iterate through each key and perform the aggregation on that subset of the data.
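
When only the aggregate is needed, a variation on this idea is to keep running totals per key instead of the full row list. The sketch below assumes the {country, city, name, age} row layout from the question, uses a List<String> as the tuple key, and stores {sum, count} in a long[]; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: aggregate while grouping, without storing the rows.
class RunningAverageSketch {

    static Map<List<String>, Double> averageAge(List<Object[]> rows) {
        // Value layout: totals[0] = sum of ages, totals[1] = number of rows.
        Map<List<String>, long[]> totalsByKey = new HashMap<>();
        for (Object[] row : rows) {
            List<String> key = Arrays.asList((String) row[0], (String) row[1]);
            long[] totals = totalsByKey.computeIfAbsent(key, k -> new long[2]);
            totals[0] += ((Integer) row[3]).intValue();
            totals[1]++;
        }

        // Second step: turn each {sum, count} pair into an average.
        Map<List<String>, Double> averages = new HashMap<>();
        totalsByKey.forEach((key, totals) ->
                averages.put(key, (double) totals[0] / totals[1]));
        return averages;
    }

    public static void main(String[] args) {
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[]{"USA", "Florida", "John", 26});
        rows.add(new Object[]{"USA", "Florida", "James", 18});
        rows.add(new Object[]{"USA", "California", "Alan", 30});

        averageAge(rows).forEach((key, average) ->
                System.out.println(key + " -> " + average));
    }
}
```

This trades the flexibility of having the rows available afterwards for lower memory use, so it fits best when the set of aggregates is known up front.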
