
How to store multiple JSON objects in a concurrent Java List while Spark task executors do work

I am trying to populate a List-type data structure from multiple Spark task executors, so I need atomicity.

Say I have 10 rows, and each row has m key-value pairs: key1-val1, ..., keym-valm.

My task executors ingest these rows into a database such as DynamoDB, and my DB ingestor has OnSuccess and OnFailure handlers. I want to know how I can ensure I end up with a "concurrent" List of 10 items, where each item points to one row, i.e. holds that row's m key-value pairs.

Which data structure should I use? Since this is invoked by task executors, I thought of using a LinkedBlockingQueue, but what would the exact collection be?

Does a BlockingQueue look OK here? And how would each element in the blocking queue contain a list of key-value pairs?

If you are looking to accumulate the results of a task in Spark, you should use Spark's accumulator framework. You can read about it here: https://spark.apache.org/docs/2.2.0/rdd-programming-guide.html#accumulators
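As a rough sketch of that approach (assuming Spark 2.x's Java API with a local master; the row IDs and key-value contents below are made up for illustration), a `CollectionAccumulator` lets each executor append a row safely and the driver read the combined result after the action completes:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.util.CollectionAccumulator;

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class AccumulatorExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("accumulator-demo")
                .setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // One accumulated entry per processed row; Spark takes care of
            // merging the per-executor contributions on the driver.
            CollectionAccumulator<Map<String, String>> ingested =
                    sc.sc().collectionAccumulator("ingestedRows");

            sc.parallelize(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))
              .foreach(rowId -> {
                  // Simulated row of key-value pairs; in the real job this
                  // would be added from the DynamoDB OnSuccess handler.
                  Map<String, String> row = new HashMap<>();
                  row.put("key" + rowId, "val" + rowId);
                  ingested.add(row);
              });

            // Read the accumulator on the driver only, after the action.
            System.out.println(ingested.value().size());
        }
    }
}
```

Note that accumulator values are only reliable when read on the driver after an action has finished; tasks that are retried may contribute more than once for transformations, so prefer adding to accumulators inside actions such as `foreach`.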

In the case of plain Java concurrency, if you just want to store values coming from different threads, then instead of a blocking queue you can simply use a ConcurrentHashMap, where the key is your row number (1 to 10) and the value is a ConcurrentLinkedQueue holding that row's key-value pairs.
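A minimal sketch of that layout using only the JDK (the row IDs, pair counts, and thread-pool size below are illustrative):

```java
import java.util.AbstractMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ConcurrentRowStore {
    // rowId -> queue of key-value pairs belonging to that row
    static final ConcurrentHashMap<Integer, ConcurrentLinkedQueue<Map.Entry<String, String>>> rows =
            new ConcurrentHashMap<>();

    static void addPair(int rowId, String key, String value) {
        // computeIfAbsent is atomic: exactly one queue is created per row,
        // even when many threads hit the same rowId simultaneously.
        rows.computeIfAbsent(rowId, id -> new ConcurrentLinkedQueue<>())
            .add(new AbstractMap.SimpleImmutableEntry<>(key, value));
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        // Simulate 10 rows of 3 key-value pairs each, written concurrently.
        for (int rowId = 1; rowId <= 10; rowId++) {
            final int id = rowId;
            pool.submit(() -> {
                for (int k = 1; k <= 3; k++) {
                    addPair(id, "key" + k, "val" + k);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);

        System.out.println(rows.size());        // number of rows stored
        System.out.println(rows.get(1).size()); // pairs in row 1
    }
}
```

ConcurrentLinkedQueue is lock-free and safe for concurrent appends, so each DB callback can add its pair without any external synchronization.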
