
Is it faster to use Predicates to filter a Concurrent Map or a List, using parallelStream?

I have multiple FileMap objects stored in a List<FileMap>, currently about 500,000 of them.

I am using Predicates to filter the List using parallelStream. While reading the documentation I saw that there is a function called Collectors.toConcurrentMap(). I am familiar with ConcurrentHashMap and know it is faster because multiple threads divide the map.

Would converting the plain ArrayList to a ConcurrentMap with toConcurrentMap and then filtering with Predicates via parallelStream be faster? Currently, filtering that List with parallelStream runs at the same speed as filtering it with a sequential stream.

A Map is a collection of key-value pairs in which the keys are unique. Your data is a list, not a map, so converting it raises a number of problems:

  1. Transforming the list into a map requires you to provide key and value mapping functions.
  2. You will end up with a bigger structure than the one you started with.
  3. The key mapping function must return unique keys: without a merge function, Collectors.toConcurrentMap throws an IllegalStateException on a duplicate key. Guaranteeing uniqueness gets in the way of parallelization (you can use synchronization, but it will greatly reduce performance).
  4. A map is a more complex structure than a list (which is effectively an array), and constructing it takes much more time.
  5. ConcurrentMap carries extra complexity to ensure thread safety. Although this is done in smarter ways than just making all methods synchronized, it still affects performance.
  6. Iterating over the map has little to do with how the data is stored: to filter the elements you still need to get the values collection.
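To make points 1, 3 and 6 concrete, here is a minimal sketch of what the conversion would look like. The `FileMap` class below is a hypothetical stand-in with made-up `path` and `size` fields, since the question does not show the real class, and using `path` as the key is likewise only an assumption.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class ConcurrentMapSketch {
    // Hypothetical stand-in for the asker's FileMap; the real fields are unknown.
    static final class FileMap {
        final String path;
        final long size;

        FileMap(String path, long size) {
            this.path = path;
            this.size = size;
        }
    }

    // Point 1: a key mapper and a value mapper are mandatory.
    // Point 3: without a merge function, a duplicate path throws IllegalStateException.
    static ConcurrentMap<String, FileMap> toMap(List<FileMap> files) {
        return files.parallelStream()
                .collect(Collectors.toConcurrentMap(f -> f.path, f -> f));
    }

    // Point 6: filtering still streams over the values, just as it would over the list.
    static long countMatching(ConcurrentMap<String, FileMap> map, Predicate<FileMap> predicate) {
        return map.values().parallelStream().filter(predicate).count();
    }

    public static void main(String[] args) {
        List<FileMap> files = Arrays.asList(
                new FileMap("a.txt", 10),
                new FileMap("b.txt", 2000));
        ConcurrentMap<String, FileMap> map = toMap(files);
        System.out.println(countMatching(map, f -> f.size > 1000));
    }
}
```

The map only pays off if you later look elements up by key; for a pure filter pass it adds the construction cost without changing the work done per element.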

Filtering the elements of a list can be heavily (and easily) parallelized. With n cores, where n is the length of the list, the running time can in theory be as low as O(log n). That of course requires specialized parallel algorithms and graphics cards instead of a CPU: their cores are individually less powerful, but there are thousands of them.
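On an ordinary CPU, filtering the list directly is a one-liner either way. The sketch below is only an illustration: the class and method names are made up, and Integer elements stand in for the FileMap objects from the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ParallelFilterSketch {
    // Filters the list with the given predicate, sequentially or in parallel.
    // Collectors.toList() keeps the encounter order in both cases.
    static <T> List<T> filter(List<T> items, Predicate<T> predicate, boolean parallel) {
        Stream<T> stream = parallel ? items.parallelStream() : items.stream();
        return stream.filter(predicate).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 12, 14, 15);

        // Predicates can be combined before they are handed to the stream.
        Predicate<Integer> isEven = n -> n % 2 == 0;
        Predicate<Integer> isLarge = n -> n > 10;

        System.out.println(filter(numbers, isEven.and(isLarge), true));
    }
}
```

Because each element is tested independently, the stream can split the list into chunks and filter them on separate threads without any coordination between them.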

I ran a few tests on a list of 100 million integers: processing it sequentially took about 700 ms and using a parallel stream took about 350 ms (I guess Java used only 2 threads), while trying to convert the list into a ConcurrentMap threw an out-of-memory error after a few minutes.
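A rough way to repeat this kind of measurement is sketched below. It is not the code behind the numbers above: the class name, the list size and the even-number predicate are all assumptions, and wall-clock timing with System.nanoTime is only indicative (a harness such as JMH would give more trustworthy results).

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class FilterTimingSketch {
    // Builds a list holding 0 .. size-1.
    static List<Integer> buildList(int size) {
        return IntStream.range(0, size).boxed().collect(Collectors.toList());
    }

    // Counts the even numbers, sequentially or in parallel.
    static long countEven(List<Integer> data, boolean parallel) {
        Stream<Integer> stream = parallel ? data.parallelStream() : data.stream();
        return stream.filter(n -> n % 2 == 0).count();
    }

    // Returns the elapsed wall-clock time of the task in milliseconds.
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Smaller than the 100 million used above so it fits in a default heap.
        List<Integer> data = buildList(5_000_000);

        // Warm-up runs so the JIT compiler has seen both code paths.
        countEven(data, false);
        countEven(data, true);

        System.out.println("sequential ms: " + timeMillis(() -> countEven(data, false)));
        System.out.println("parallel ms:   " + timeMillis(() -> countEven(data, true)));
    }
}
```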

You mentioned that switching between stream() and parallelStream() didn't change the performance. I would recommend investigating how Java chooses the number of threads to use in a parallel stream (and how to change it). This is also affected by your resources: running more CPU-intensive threads than your CPU has cores will decrease performance due to context switching. I would advise using only as many threads as you have cores, or one fewer, so that one core remains free for all other OS work.
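As a starting point for that investigation: parallel streams run on the common ForkJoinPool, whose parallelism defaults to one less than the number of available processors and can be overridden with the `java.util.concurrent.ForkJoinPool.common.parallelism` system property. The sketch below prints both values. It also shows a widely used trick of running the stream inside a custom ForkJoinPool to cap the thread count; note that this relies on implementation behavior rather than a documented guarantee, and the class and method names are made up for illustration.

```java
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ParallelismSketch {
    // Number of processors the JVM reports.
    static int cores() {
        return Runtime.getRuntime().availableProcessors();
    }

    // Target parallelism of the pool that parallel streams use by default.
    static int commonPoolParallelism() {
        return ForkJoinPool.commonPool().getParallelism();
    }

    // Assumption: a parallel stream started from inside a ForkJoinPool task
    // runs in that pool. This is observed behavior, not a documented contract.
    static long countEvenInPool(List<Integer> data, int threads) {
        ForkJoinPool pool = new ForkJoinPool(threads);
        try {
            return pool.submit(
                    () -> data.parallelStream().filter(n -> n % 2 == 0).count()).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("available processors:    " + cores());
        System.out.println("common pool parallelism: " + commonPoolParallelism());

        List<Integer> data = IntStream.range(0, 1_000).boxed().collect(Collectors.toList());
        int threads = Math.max(1, cores() - 1); // leave one core for the OS
        System.out.println("even numbers: " + countEvenInPool(data, threads));
    }
}
```

If the common pool parallelism comes out as 1, a parallel stream has very little room to beat a sequential one, which would be consistent with the identical timings described in the question.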
