
Code optimization to convert a list of integers to a list of objects in Java

I have around 50,000 to 500,000 employee IDs, and I want to convert them into detail objects.

I have done something like this to achieve that:

private Set<Detail> setDetail(List<Integer> employees, Group group) {
  Set<Detail> details = employees.stream().parallel().map(id -> new Detail(id, group)).collect(Collectors.toSet());
  return details ;
}

But this is very slow, and it gets slower as the number of employee IDs increases. How can I optimize this code? What optimization techniques or algorithms can I use?

You should try to avoid creating that many objects. No matter which algorithm you pick, if your DB keeps growing, at some point you won't be able to fit it all in memory. Also, the bottleneck is likely to be getting the data from the DB, not creating objects.

So try to re-architect your app so that the data is already pre-calculated and stored in the DB when the operation in question is performed.

If, after careful consideration, you decide that you do need to work with that many objects, a better option is to keep working with primitives:

class EmployeesInGroup {
   private final int[] ids;
   private final Group group;
   ...

   Detail get(int idx) {
      return new Detail(ids[idx], group);
   }

   int size() {
     return ids.length;
   }
}

Then you can iterate over this list and work with one object at a time without keeping many of them in memory:

EmployeesInGroup list = new EmployeesInGroup(ids, group);
for(int i = 0; i < list.size(); i++) {
  Detail d = list.get(i);
  ...
}

You can also make it implement Iterable and use a for-each loop.
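As a rough sketch of the Iterable suggestion, assuming the same EmployeesInGroup shape as above: the Group and Detail classes here are empty placeholders for the question's real types, and IterableDemo is a hypothetical name used only for illustration. The iterator creates each Detail on demand, so only one short-lived object is reachable at a time.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical demo class; runs a for-each loop over the wrapper.
class IterableDemo {
    public static void main(String[] args) {
        EmployeesInGroup list = new EmployeesInGroup(new int[]{7, 8, 9}, new Group());
        long sum = 0;
        for (Detail d : list) {
            sum += d.getId(); // each Detail is created lazily by the iterator
        }
        System.out.println("Sum of ids: " + sum);
    }
}

// Placeholder standing in for the question's Group type.
class Group {}

// Placeholder standing in for the question's Detail type.
class Detail {
    private final int id;
    private final Group group;

    Detail(int id, Group group) {
        this.id = id;
        this.group = group;
    }

    int getId() { return id; }
    Group getGroup() { return group; }
}

// Keeps the ids as primitives and materializes Detail objects only on access.
class EmployeesInGroup implements Iterable<Detail> {
    private final int[] ids;
    private final Group group;

    EmployeesInGroup(int[] ids, Group group) {
        this.ids = ids;
        this.group = group;
    }

    Detail get(int idx) { return new Detail(ids[idx], group); }

    int size() { return ids.length; }

    @Override
    public Iterator<Detail> iterator() {
        return new Iterator<Detail>() {
            private int cursor = 0; // index of the next id to hand out

            @Override
            public boolean hasNext() { return cursor < ids.length; }

            @Override
            public Detail next() {
                if (!hasNext()) throw new NoSuchElementException();
                return get(cursor++);
            }
        };
    }
}
```

Note that this only helps if the caller does not store the Detail objects it receives; collecting them into a list or set brings back the original memory cost.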

Benchmarks

The approach I listed above is at least 20x faster than creating an array of Detail objects. Working with streams and lists slows it all down even more. I didn't check with Integer, but I'd predict it would slow everything down by roughly another factor of 2.

Benchmark                                        Mode  Cnt     Score    Error  Units
EmployeeConversionBenchmark.objectArray         thrpt   20   368.702 ±  3.483  ops/s
EmployeeConversionBenchmark.primitiveArray      thrpt   20  7595.080 ± 68.841  ops/s
EmployeeConversionBenchmark.streamsWithObjects  thrpt   20   197.923 ±  1.616  ops/s

Here's the code that I used:

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

import java.util.Arrays;
import java.util.List;
import java.util.Random;

import static java.util.stream.Collectors.toList;

public class EmployeeConversionBenchmark {
    public static void main(String[] args) throws Exception {
        org.openjdk.jmh.Main.main(new String[]{EmployeeConversionBenchmark.class.getSimpleName()});
    }

    @Benchmark @Fork(value = 1, warmups = 0)
    public int primitiveArray(Data data) {
        EmployeesInGroup e = new EmployeesInGroup(data.ids, data.group);
        int sum = 0;
        for (int i = 0; i < e.size(); i++)
            sum += e.get(i).getId();
        return sum;
    }
    @Benchmark @Fork(value = 1, warmups = 0)
    public int objectArray(Data data) {
        EmployeesInGroup.Detail[] e = new EmployeesInGroup.Detail[data.ids.length];
        for (int i = 0; i < data.ids.length; i++)
            e[i] = new EmployeesInGroup.Detail(data.ids[i], data.group);

        int sum = 0;
        for (EmployeesInGroup.Detail detail : e)
            sum += detail.getId();
        return sum;
    }
    @Benchmark @Fork(value = 1, warmups = 0)
    public int streamsWithObjects(Data data) {
        List<EmployeesInGroup.Detail> e = Arrays.stream(data.ids).mapToObj(id -> new EmployeesInGroup.Detail(id, data.group)).collect(toList());
        int sum = 0;
        for (EmployeesInGroup.Detail detail : e)
            sum += detail.getId();
        return sum;
    }

    @State(Scope.Benchmark)
    public static class Data {
        private final int[] ids = new int[500_000];
        private final EmployeesInGroup.Group group = new EmployeesInGroup.Group();
        public Data() {
            for (int i = 0; i < ids.length; i++)
                ids[i] = new Random().nextInt();
        }
    }

    public static class EmployeesInGroup {
        private final int[] ids;
        private final Group group;

        public EmployeesInGroup(int[] ids, Group group) {
            this.ids = ids;
            this.group = group;
        }
        public Detail get(int idx) {
            return new Detail(ids[idx], group);
        }

        public int size() {
            return ids.length;
        }

        public static class Group {
        }

        public static class Detail {
            private final int id;
            private final Group group;

            public Detail(int id, Group group) {
                this.id = id;
                this.group = group;
            }
            public int getId() {
                return id;
            }
        }
    }
}

In general, a conversation about performance and optimization should include numbers, e.g. the latency you are trying to achieve, the latency you actually observe, some details about your hardware, etc.

Your problem (if there is one) is not in the method you posted. On my laptop, the code below executes in ~250 ms, and even faster (~200 ms) if you remove parallel():

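import java.util.List;
import java.util.Random;
import java.util.Set;

import static java.util.stream.Collectors.toList;
import static java.util.stream.Collectors.toSet;

class Detail {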
    private Integer id;
    private String group;

    public Detail(Integer id, String group) {
        this.id = id;
        this.group = group;
    }
}

public class Main {
    public Set<Detail> setDetail(List<Integer> employees, String group) {
        return employees.stream().parallel().map(id -> new Detail(id, group)).collect(toSet());
    }

    public static void main(String[] args) {
        List<Integer> idsList = new Random().ints().boxed().limit(500_000).collect(toList());

        long start = System.currentTimeMillis();

        Set<Detail> details = new Main().setDetail(idsList, "group");

        long duration = (System.currentTimeMillis() - start);
        System.out.println("Done in " + duration + " ms. Size was " + details.size());
    }
}
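To compare the two variants side by side, a minimal sketch along the following lines could be used. It assumes a simplified Detail class like the one above; SequentialVsParallel and its method names are hypothetical. Wall-clock timing with System.nanoTime is only a rough indication, and a harness such as JMH would give more trustworthy numbers.

```java
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical comparison of the sequential and parallel stream variants.
class SequentialVsParallel {

    // Simplified stand-in for the Detail class; no equals/hashCode,
    // so every instance counts as a distinct set element.
    static final class Detail {
        private final Integer id;
        private final String group;

        Detail(Integer id, String group) {
            this.id = id;
            this.group = group;
        }
    }

    static Set<Detail> sequential(List<Integer> employees, String group) {
        return employees.stream()
                .map(id -> new Detail(id, group))
                .collect(Collectors.toSet());
    }

    static Set<Detail> parallel(List<Integer> employees, String group) {
        return employees.parallelStream()
                .map(id -> new Detail(id, group))
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        List<Integer> ids = new Random(42).ints(500_000)
                .boxed()
                .collect(Collectors.toList());

        // A few untimed rounds so the JIT has a chance to compile both paths.
        for (int i = 0; i < 5; i++) {
            sequential(ids, "group");
            parallel(ids, "group");
        }

        long t0 = System.nanoTime();
        int seqSize = sequential(ids, "group").size();
        long t1 = System.nanoTime();
        int parSize = parallel(ids, "group").size();
        long t2 = System.nanoTime();

        System.out.println("sequential: " + (t1 - t0) / 1_000_000 + " ms, size " + seqSize);
        System.out.println("parallel:   " + (t2 - t1) / 1_000_000 + " ms, size " + parSize);
    }
}
```

The results will vary with hardware and JVM version, so treat any numbers it prints as a starting point for measurement rather than a conclusion.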
