
Sequential processing of multi-threaded results

I am setting up a Spring Boot application (DAO pattern with @Repository classes) in which I am trying to write a @Service that pulls data from a database asynchronously in multiple threads and merge-processes the incoming payloads sequentially, preferably on arrival.

The goal is to use parallel database access for requests where multiple non-overlapping sets of filter conditions need to be queried individually, but post-processed (transformed, e.g. aggregated) into a combined result.

Being rather new to Java, and coming from Golang with its comparatively trivial syntax for multi-threading and task communication, I struggle to identify a preferable API in Java and Spring Boot, or to determine whether this approach is even favorable to begin with.


Question:

Given

  • a Controller:

     @RestController
     @RequestMapping("/api")
     public class MyController {

       private final MyService myService;

       @Autowired
       public MyController(MyService myService) {
         this.myService = myService;
       }

       @PostMapping("/processing")
       public DeferredResult<MyResult> myHandler(@RequestBody MyRequest myRequest) {
         DeferredResult<MyResult> myDeferredResult = new DeferredResult<>();
         myService.myProcessing(myRequest, myDeferredResult);
         return myDeferredResult;
       }
     }

  • a Service:

     import com.acme.parallel.util.MyDataTransformer;

     @Service
     public class MyServiceImpl implements MyService {

       private final MyRepository myRepository;

       @Autowired
       public MyServiceImpl(MyRepository myRepository) {
         this.myRepository = myRepository;
       }

       public void myProcessing(MyRequest myRequest, DeferredResult<MyResult> myDeferredResult) {
         MyDataTransformer myDataTransformer = new MyDataTransformer();

         /* PLACEHOLDER CODE
         for (MyFilter myFilter : myRequest.getMyFilterList()) {
           // MyPartialResult myPartialResult = myRepository.myAsyncQuery(myFilter);
           // myDataTransformer.transformMyPartialResult(myPartialResult);
         }
         */

         myDeferredResult.setResult(myDataTransformer.getMyResult());
       }
     }

  • a Repository:

     @Repository
     public class MyRepository {

       public MyPartialResult myAsyncQuery(MyFilter myFilter) {
         // for the sake of an example
         return new MyPartialResult(myFilter, TakesSomeAmountOfTimeToQUery.TRUE);
       }
     }

  • as well as a MyDataTransformer helper class:

     public class MyDataTransformer {

       private final MyResult myResult = new MyResult(); // e.g. a Map

       public void transformMyPartialResult(MyPartialResult myPartialResult) {
         /* PLACEHOLDER CODE
         this.myResult.transformAndMergeIntoMe(myPartialResult);
         */
       }

       // accessor used by the service once all partial results are merged
       public MyResult getMyResult() {
         return this.myResult;
       }
     }

how can I implement

  • the MyService.myProcessing method asynchronously and multi-threaded, and

  • the MyDataTransformer.transformMyPartialResult method sequentially/thread-safely

  • (or redesign the above)

most performantly, so that the incoming MyPartialResult objects are merged into one single MyResult?


Attempts:

The easiest solution seems to be to skip the "on arrival" part; a commonly preferred implementation might be, for example:

public void myProcessing(MyRequest myRequest, DeferredResult<MyResult> myDeferredResult) {
  MyDataTransformer myDataTransformer = new MyDataTransformer();

  List<CompletableFuture<MyPartialResult>> myPartialResultFutures = new ArrayList<>();

  for (MyFilter myFilter : myRequest.getMyFilterList()) {     // Stream is the way they say, but I like for
    myPartialResultFutures.add(CompletableFuture.supplyAsync(() -> myRepository.myAsyncQuery(myFilter)));
  }

  // join() blocks until each query has finished; forEach is the terminal
  // operation that actually runs the pipeline (map alone would do nothing)
  myPartialResultFutures.stream()
    .map(CompletableFuture::join)
    .forEach(myDataTransformer::transformMyPartialResult);

  myDeferredResult.setResult(myDataTransformer.getMyResult());
}

However, if feasible, I'd like to benefit from sequentially processing incoming payloads as they arrive, so I am currently experimenting with something like this:

public void myProcessing(MyRequest myRequest, DeferredResult<MyResult> myDeferredResult) {
  MyDataTransformer myDataTransformer = new MyDataTransformer();

  // thenAccept yields CompletableFuture<Void>: each future now covers query + merge
  List<CompletableFuture<Void>> myPartialResultFutures = new ArrayList<>();

  for (MyFilter myFilter : myRequest.getMyFilterList()) {
    myPartialResultFutures.add(
      CompletableFuture.supplyAsync(() -> myRepository.myAsyncQuery(myFilter))
        .thenAccept(myDataTransformer::transformMyPartialResult));
  }

  // wait until every query has been fetched and merged
  myPartialResultFutures.forEach(CompletableFuture::join);

  myDeferredResult.setResult(myDataTransformer.getMyResult());
}

but I don't understand whether I need to implement any thread-safety measures when calling myDataTransformer.transformMyPartialResult, and if so how, or whether this even makes sense performance-wise.
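On the thread-safety point: a non-async thenAccept callback runs on whichever thread completes the preceding stage, so with supplyAsync on the common pool several callbacks can execute at the same time. If the transformer wraps a plain (non-concurrent) structure, the merge therefore needs guarding. A minimal sketch, assuming a hypothetical SynchronizedTransformer that sums values per key and a fakeQuery method standing in for the repository call:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in for MyDataTransformer: sums values per key.
final class SynchronizedTransformer {
    private final Map<String, Integer> result = new HashMap<>();

    // synchronized: at most one callback thread merges at any time
    synchronized void transform(Map<String, Integer> partial) {
        partial.forEach((k, v) -> result.merge(k, v, Integer::sum));
    }

    // returns a copy so callers never see the guarded map directly
    synchronized Map<String, Integer> getResult() {
        return new HashMap<>(result);
    }

    // Simulated repository call (an assumption, not the real MyRepository)
    static Map<String, Integer> fakeQuery(String filter) {
        Map<String, Integer> rows = new HashMap<>();
        rows.put("total", 1);
        rows.put(filter, filter.length());
        return rows;
    }

    // Starts one query per filter and merges each payload as it arrives.
    static Map<String, Integer> process(List<String> filters) {
        SynchronizedTransformer transformer = new SynchronizedTransformer();
        List<CompletableFuture<Void>> futures = new ArrayList<>();
        for (String filter : filters) {
            futures.add(CompletableFuture
                .supplyAsync(() -> fakeQuery(filter))
                .thenAccept(transformer::transform));
        }
        // wait for every query + merge stage to finish
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        return transformer.getResult();
    }
}
```

The lock is only held during the merge, so the queries themselves still run in parallel; whether that pays off depends on how expensive the merge is relative to the query.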


Update:

Based on the assumption that

  • myRepository.myAsyncQuery takes slightly varying amounts of time, and
  • myDataTransformer.transformMyPartialResult takes an ever-increasing amount of time with each call,

implementing a thread-safe/atomic type, e.g. a ConcurrentHashMap:

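public class MyDataTransformer {

  // K and V are placeholders; merging with Integer::sum requires V to be Integer
  private final ConcurrentMap<K, V> myResult = new ConcurrentHashMap<K, V>();

  public void transformMyPartialResult(MyPartialResult myPartialResult) {
    // forEach is a terminal operation; merge() is atomic per key on a ConcurrentHashMap
    myPartialResult.myRows.stream()
      .forEach((row) -> this.myResult.merge(row[0], row[1], Integer::sum));
  }
}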

into the latter attempt (processing "on arrival"):

public void myProcessing(MyRequest myRequest, DeferredResult<MyResult> myDeferredResult) {
  MyDataTransformer myDataTransformer = new MyDataTransformer();

  List<CompletableFuture<Void>> myPartialResultFutures = new ArrayList<>();

  for (MyFilter myFilter : myRequest.getMyFilterList()) {
    myPartialResultFutures.add(
      CompletableFuture.supplyAsync(() -> myRepository.myAsyncQuery(myFilter))
        .thenAccept(myDataTransformer::transformMyPartialResult));
  }

  myPartialResultFutures.forEach(CompletableFuture::join);

  myDeferredResult.setResult(myDataTransformer.getMyResult());
}

is up to an order of magnitude faster than waiting on all threads first, even with the synchronization overhead.
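One thing the listing above still does is block the calling thread in join() until every stage has finished, which works against returning a DeferredResult from the controller. A possible variation is to chain the final step onto CompletableFuture.allOf instead. The sketch below is plain Java under stated assumptions: the returned future plays the role of the DeferredResult, and fakeQuery is a hypothetical stand-in for the repository call. In the Spring service, the last stage would presumably call myDeferredResult.setResult (or setErrorResult on failure) rather than return a value.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class NonBlockingMerge {

    // Simulated repository call (an assumption, not the real MyRepository)
    static Map<String, Integer> fakeQuery(String filter) {
        Map<String, Integer> rows = new HashMap<>();
        rows.put("total", 1);
        rows.put(filter, filter.length());
        return rows;
    }

    // Returns immediately; the future completes once all payloads are merged.
    static CompletableFuture<Map<String, Integer>> processAsync(List<String> filters) {
        ConcurrentMap<String, Integer> merged = new ConcurrentHashMap<>();

        CompletableFuture<?>[] stages = filters.stream()
            .map(filter -> CompletableFuture
                .supplyAsync(() -> fakeQuery(filter))
                // merge() on a ConcurrentHashMap is atomic per key
                .thenAccept(partial ->
                    partial.forEach((k, v) -> merged.merge(k, v, Integer::sum))))
            .toArray(CompletableFuture[]::new);

        // no join() here: the caller decides when (or whether) to wait
        return CompletableFuture.allOf(stages).thenApply(ignored -> merged);
    }
}
```

A caller that does want the value synchronously can still write processAsync(filters).join().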


Now, this may have been obvious (though not entirely, since asynchronous/multi-threaded processing is far from always the better choice), and I am glad this approach is a valid one.

What remains is what looks to me like a hacky, inflexible solution, or at least an ugly one. Is there a better approach?
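One alternative that may feel closer to reading from a Go channel is java.util.concurrent.ExecutorCompletionService: worker threads run the queries, while a single consumer takes finished results in completion order and merges them, so the merge itself needs no locking. This is only a sketch; the pool size of 4 and the fakeQuery method are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class CompletionOrderMerge {

    // Simulated repository call (an assumption, not the real MyRepository)
    static Map<String, Integer> fakeQuery(String filter) {
        Map<String, Integer> rows = new HashMap<>();
        rows.put("total", 1);
        rows.put(filter, filter.length());
        return rows;
    }

    static Map<String, Integer> process(List<String> filters) {
        ExecutorService pool = Executors.newFixedThreadPool(4); // size is arbitrary
        try {
            ExecutorCompletionService<Map<String, Integer>> ecs =
                new ExecutorCompletionService<>(pool);
            for (String filter : filters) {
                ecs.submit(() -> fakeQuery(filter)); // queries run in parallel
            }
            // Only this thread touches 'merged', so a plain HashMap is enough.
            Map<String, Integer> merged = new HashMap<>();
            for (int i = 0; i < filters.size(); i++) {
                // take() blocks until the next query, in completion order, is done
                Map<String, Integer> partial = ecs.take().get();
                partial.forEach((k, v) -> merged.merge(k, v, Integer::sum));
            }
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

The trade-off is that the consuming thread blocks in take(), so in a Spring service this loop would itself have to run off the request thread.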

Try the @Async annotation and resolve dependencies through autowiring. For thread-safe code, use a synchronized block or method, or a more modern Java construct such as a Lock.
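The lock-based option mentioned in this answer could look roughly like the following. It is a sketch with hypothetical names (LockedTransformer, demo), using java.util.concurrent.locks.ReentrantLock around the merge; the Spring @Async part is left out because it needs a running application context.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical transformer that guards a plain HashMap with an explicit lock.
final class LockedTransformer {
    private final ReentrantLock lock = new ReentrantLock();
    private final Map<String, Integer> result = new HashMap<>();

    void transform(Map<String, Integer> partial) {
        lock.lock();
        try {
            partial.forEach((k, v) -> result.merge(k, v, Integer::sum));
        } finally {
            lock.unlock(); // always release, even if the merge throws
        }
    }

    Map<String, Integer> getResult() {
        lock.lock();
        try {
            return new HashMap<>(result);
        } finally {
            lock.unlock();
        }
    }

    // Starts threadCount threads that each merge {"hits": 1}, then waits for all.
    static Map<String, Integer> demo(int threadCount) {
        LockedTransformer transformer = new LockedTransformer();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < threadCount; i++) {
            Thread t = new Thread(
                () -> transformer.transform(Collections.singletonMap("hits", 1)));
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return transformer.getResult();
    }
}
```

Compared with a synchronized method, an explicit lock mainly adds options such as tryLock or timed acquisition; for a simple merge the two behave much the same.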
