
Run a For-each loop on a Filtered HashMap

I am new to Java, and here is my problem.

I have a map of type Map<Integer, List<MyObject>> that I call myMap.

Since myMap has a lot of entries (about 100000), I don't think a for loop is a good idea, so I want to filter my Map<Integer, List<MyObject>>, removing the entries where the condition below holds:

myMap.get(i).get(every_one_of_them).a_special_attribute_of_my_MyObject == null;

Here every_one_of_them means that I want to delete the entries of myMap whose whole list (all of its objects) is null in that attribute (for convenience, let's call it myAttribute).

One of my incomplete ideas was something like this:

Map<Integer, List<toHandle>> collect = myMap.entrySet().stream()
.filter(x -> x.getValue.HERE_IS_WHERE_I_DO_NOT_KNOW_HOW_TO)
.collect(Collectors.toMap(x -> x.getKey(), x -> x.getValue()));

Any help will be highly appreciated. Thanks.

You can

  • iterate over the map's values() and remove the elements you don't want. For that you can use removeIf(Predicate condition).
  • To check whether all elements in a list fulfill some condition, you can use list.stream().allMatch(Predicate condition).

For instance, say we have a Map<Integer, List<String>> and we want to remove the lists in which all strings start with b or B. You can do it via

myMap.values()
     .removeIf(list -> list.stream()
                           .allMatch(str -> str.toLowerCase().startsWith("b"))
// but in real application for better performance use 
//                         .allMatch(str -> str.regionMatches(true, 0, "b", 0, 1))

     );

DEMO:

Map<Integer , List<String>> myMap = new HashMap<>(Map.of(
        1, List.of("Abc", "Ab"),
        2, List.of("Bb", "Bc"),
        3, List.of("Cc")
));

myMap.values()
     .removeIf(list -> list.stream()
                           .allMatch(str -> str.toLowerCase().startsWith("b"))
     );
System.out.println(myMap);

Output:

{1=[Abc, Ab], 3=[Cc]}
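The same pattern carries over to the case in the question. The following is only a sketch: MyObject and its myAttribute field are hypothetical stand-ins for the asker's real class, and the class and helper names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RemoveAllNullLists {
    // Hypothetical stand-in for the question's MyObject.
    static class MyObject {
        final String myAttribute;
        MyObject(String myAttribute) { this.myAttribute = myAttribute; }
    }

    // Removes, in place, every entry whose list holds only objects
    // with a null myAttribute.
    static void removeAllNullLists(Map<Integer, List<MyObject>> map) {
        map.values().removeIf(
            list -> list.stream().allMatch(o -> o.myAttribute == null));
    }

    // Small sample: key 2 is the only entry whose objects are all null.
    static Map<Integer, List<MyObject>> sample() {
        Map<Integer, List<MyObject>> map = new HashMap<>();
        map.put(1, new ArrayList<>(List.of(new MyObject(null), new MyObject("x"))));
        map.put(2, new ArrayList<>(List.of(new MyObject(null), new MyObject(null))));
        map.put(3, new ArrayList<>(List.of(new MyObject("y"))));
        return map;
    }

    public static void main(String[] args) {
        Map<Integer, List<MyObject>> map = sample();
        removeAllNullLists(map);
        System.out.println(map.keySet()); // key 2 is gone, keys 1 and 3 remain
    }
}
```

Note that allMatch returns true for an empty stream, so an entry with an empty list would be removed as well. If that is not what you want, add an explicit !list.isEmpty() check.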

As myMap has a lot of members (About 100000) , I don't think the for loop to be such a good idea so I wanna filter

That sounds like you think stream.filter is somehow faster than foreach. It's not; it's either slower or about as fast.

SPOILER: All the way at the end I do some basic performance tests, but I invite anyone to take that test, upgrade it to a full JMH test suite, and run it on a variety of hardware. However, it says you're in fact exactly wrong: foreach is considerably faster than anything involving streams.

Also, it sounds like you feel 100000 is a lot of entries. It mostly isn't. A foreach loop (or rather, an iterator) will be faster. Removing with the iterator will be considerably faster.

Parallelism can help you out here, and is simpler with streams, but you can't just slap a parallel() in there and trust that it'll just work out. It depends on the underlying types. For example, your plain jane java.util.HashMap isn't very good at this; something like a ConcurrentHashMap is far more capable. But if you take the time to copy all the data over to a more suitable map type, well, in that timespan you could have done the entire job, and probably faster to boot! (Depends on how large those lists are.)

Step 1: Make an oracle

But, first things first, we need an oracle function: one that determines whether a given entry ought to be deleted. No matter what solution you go with, this is required:

public boolean keep(List<MyObject> mo) {
    for (MyObject obj : mo) if (obj.specialProperty != null) return true;
    return false;
}

You could 'streamify' it:

public boolean keep(List<MyObject> mo) {
    return mo.stream().anyMatch(o -> o.specialProperty != null);
}
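As a quick sanity check, here is a small sketch that runs both variants on three lists, including an empty one. The MyObject class with a specialProperty field is a hypothetical stand-in, and the names keepLoop and keepStream are invented so that both versions can live in one class.

```java
import java.util.List;

public class KeepOracle {
    // Hypothetical stand-in for MyObject.
    static class MyObject {
        final String specialProperty;
        MyObject(String specialProperty) { this.specialProperty = specialProperty; }
    }

    // Loop version: keep the list as soon as one non-null property is found.
    static boolean keepLoop(List<MyObject> mo) {
        for (MyObject obj : mo) if (obj.specialProperty != null) return true;
        return false;
    }

    // Stream version of the same check.
    static boolean keepStream(List<MyObject> mo) {
        return mo.stream().anyMatch(o -> o.specialProperty != null);
    }

    public static void main(String[] args) {
        List<MyObject> allNull = List.of(new MyObject(null), new MyObject(null));
        List<MyObject> mixed = List.of(new MyObject(null), new MyObject("x"));
        List<MyObject> empty = List.of();

        System.out.println(keepLoop(allNull) + " " + keepStream(allNull)); // false false
        System.out.println(keepLoop(mixed) + " " + keepStream(mixed));     // true true
        System.out.println(keepLoop(empty) + " " + keepStream(empty));     // false false
    }
}
```

Both versions agree, and both report an empty list as 'do not keep', which is worth knowing before you run this against real data.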

Step 2: Filter the list

Once we have that, the task becomes easier:

var it = map.values().iterator();
while (it.hasNext()) if (!keep(it.next())) it.remove();

is now all you need. We can streamify that if you prefer, but note that you can't use streams to change a map 'in place', and copying over is usually considerably slower, so this is likely slower and certainly takes more memory:

Map<Integer, List<MyObject>> result =
    map.entrySet().stream()
    .filter(e -> keep(e.getValue()))
    .collect(Collectors.toMap(e -> e.getKey(), e -> e.getValue()));
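To make the 'not in place' point concrete, here is a minimal sketch with a simplified oracle (keep non-empty lists) and a hypothetical helper named nonEmptyCopy. The source map keeps all of its entries and the collected result is a separate map.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class StreamCopy {
    // Returns a NEW map holding only the entries with a non-empty list.
    // The source map is not modified.
    static Map<Integer, List<String>> nonEmptyCopy(Map<Integer, List<String>> map) {
        return map.entrySet().stream()
            .filter(e -> !e.getValue().isEmpty())
            .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> map = new HashMap<>();
        map.put(1, List.of("a"));
        map.put(2, List.of());

        Map<Integer, List<String>> result = nonEmptyCopy(map);
        System.out.println(map.size());    // 2, the source still has both entries
        System.out.println(result.size()); // 1, only key 1 was copied over
    }
}
```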

Note also how the stream option doesn't generally result in significantly shorter code either. Don't make the decision between stream or non-stream based on the notion that streams are inherently better or lead to more readable code. Programming just isn't that simple, I'm afraid.

We can also use some of the more functional methods in map itself:

map.values().removeIf(v -> !keep(v));

That seems like the clear winner here, although it's a bit bizarre that we have to 'bounce' through values(); map itself has no removeIf method, but the collections returned by keySet, values, entrySet etc. reflect any changes back to the map, so that works out.
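A small sketch of that write-through behaviour, using all three views on a toy map. The class name, the prune helper, and the removal conditions are made up purely for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MapViews {
    static Map<Integer, List<String>> sample() {
        Map<Integer, List<String>> map = new HashMap<>();
        map.put(1, List.of("a"));
        map.put(2, List.of());
        map.put(3, List.of("b", "c"));
        map.put(4, List.of("d"));
        return map;
    }

    // Each removeIf call goes through a view, yet the map itself shrinks.
    static void prune(Map<Integer, List<String>> map) {
        map.values().removeIf(List::isEmpty);                  // removes key 2
        map.keySet().removeIf(k -> k > 3);                     // removes key 4
        map.entrySet().removeIf(e -> e.getKey() == 1
                                  && e.getValue().size() == 1); // removes key 1
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> map = sample();
        prune(map);
        System.out.println(map.size()); // 1, only key 3 is left
    }
}
```

Use entrySet() when the decision needs both the key and the value; values() is enough for the question at hand.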

Let's performance test!

Performance testing is tricky and really requires using JMH for good results. By all means, as an exercise, do just that. But, let's just do a real quick scan:

import java.util.*;
import java.util.stream.*;

public class Test {
    static class MyObj {
        String foo;
    }

    public static MyObj hit() {
        MyObj o = new MyObj();
        o.foo = "";
        return o;
    }

    public static MyObj miss() {
        return new MyObj();
    }

    private static final int MAP_ELEMS = 100000;
    private static final int LIST_ELEMS = 50;
    private static final double HIT_OR_MISS = 0.01;
    private static final Random rnd = new Random();

    public static void main(String[] args) {
        var map = construct();
        long now = System.currentTimeMillis();
        filter_seq(map);
        long delta = System.currentTimeMillis() - now;
        System.out.printf("Sequential: %.3f\n", 0.001 * delta);
        map = construct();
        now = System.currentTimeMillis();
        filter_stream(map);
        delta = System.currentTimeMillis() - now;
        System.out.printf("Stream: %.3f\n", 0.001 * delta);
        map = construct();
        now = System.currentTimeMillis();
        filter_removeIf(map);
        delta = System.currentTimeMillis() - now;
        System.out.printf("RemoveIf: %.3f\n", 0.001 * delta);
    }

    private static Map<Integer, List<MyObj>> construct() {
        var m = new HashMap<Integer, List<MyObj>>();
        for (int i = 0; i < MAP_ELEMS; i++) {
            var list = new ArrayList<MyObj>();
            for (int j = 0; j < LIST_ELEMS; j++) {
                list.add(rnd.nextDouble() < HIT_OR_MISS ? hit() : miss());
            }
            m.put(i, list);
        }
        return m;
    }

    static boolean keep_seq(List<MyObj> list) {
        for (MyObj o : list) if (o.foo != null) return true;
        return false;
    }

    static boolean keep_stream(List<MyObj> list) {
        return list.stream().anyMatch(o -> o.foo != null);
    }

    static void filter_seq(Map<Integer, List<MyObj>> map) {
        var it = map.values().iterator();
        while (it.hasNext()) if (!keep_seq(it.next())) it.remove();
    }

    static void filter_stream(Map<Integer, List<MyObj>> map) {
        Map<Integer, List<MyObj>> result =
            map.entrySet().stream()
            .filter(e -> keep_stream(e.getValue()))
            .collect(Collectors.toMap(e -> e.getKey(), e -> e.getValue()));
    }

    static void filter_removeIf(Map<Integer, List<MyObj>> map) {
        map.values().removeIf(v -> !keep_stream(v));
    }
}

This, reliably, on my hardware anyway, shows that the stream route is by far the slowest, and the sequential option beats the removeIf variant by a few percent. Which just goes to show that your initial line (if I can take that as 'I think foreach is too slow') was entirely off the mark, fortunately.

For fun I replaced the map with a ConcurrentHashMap and made the stream parallel(). This did not change the timing significantly, and I wasn't really expecting it to.
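The exact code for that experiment isn't shown above, so what follows is only a sketch of roughly what such a parallel variant could look like, not the code that was actually timed. To keep it short, the lists hold plain strings, with a null string standing in for a null attribute; the class name, the filterParallel helper, and the choice of Collectors.toConcurrentMap are all assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

public class ParallelFilter {
    // Oracle: keep a list if at least one element is non-null.
    static boolean keep(List<String> list) {
        for (String s : list) if (s != null) return true;
        return false;
    }

    // Filters in parallel and collects into a new concurrent map.
    static Map<Integer, List<String>> filterParallel(Map<Integer, List<String>> source) {
        return source.entrySet().parallelStream()
            .filter(e -> keep(e.getValue()))
            .collect(Collectors.toConcurrentMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> map = new ConcurrentHashMap<>();
        map.put(1, Arrays.asList("x", null));
        map.put(2, Arrays.asList((String) null, (String) null));
        map.put(3, Arrays.asList("y"));

        Map<Integer, List<String>> result = filterParallel(map);
        System.out.println(result.size()); // 2, key 2 is filtered out
    }
}
```

Whether this buys you anything depends on the size of the lists and on your hardware, so measure before committing to it.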

A note about style

In various snippets, I omit braces for loops and if statements. If you add them, the non-stream-based code occupies considerably more lines, and if you include the indent whitespace for the insides of these constructs, considerably more 'surface area' of paste. However, that is a ridiculous thing to clue off of; it is tantamount to saying: "Actually, the commonly followed style guides for Java are incredibly obtuse and badly considered. However, I dare not break them. Fortunately, lambdas came along and gave me an excuse to toss the entire principle of those style guides right out the window and now pile it all into a single, braceless line, and oh look, lambdas lead to shorter code!". I would assume any reader, armed with this knowledge, can easily pierce through such baloney argumentation. The reasons for those braces primarily involve easier debug breakpointing and easy ways to add additional actions to a given 'code node', and those needs are exactly as important, if not more so, when using streams. If it's okay to one-liner and go brace-free for lambdas, then surely it is okay to do the same to if and for bodies.
