
How to parallelize database queries in Spring Flux?

I want to expose aggregated results from a MySQL database as a Flux<JSONObject> stream in Spring.

@RestController
public class FluxController {
    @GetMapping(path = "/", produces = TEXT_EVENT_STREAM_VALUE)
    public Flux<JSONObject> stream() {
        return service.getJson();
    }
}

@Service
public class DatabaseService {
    public List<JSONObject> getJson() {
        List<Long> refs = jdbc.queryForList(...);
        MapSqlParameterSource params = new MapSqlParameterSource();
        params.addValue("refs", refs);

        //of course real world sql is much more complex; row mapping omitted for brevity
        Map<Long, Product> products = jdbc.query("SELECT * FROM products WHERE ref IN (:refs)", params, ...);
        Map<Long, Item> items = jdbc.query("SELECT * FROM items WHERE ref IN (:refs)", params, ...);
        Map<Long, Warehouse> warehouses = jdbc.query("SELECT * FROM warehouses WHERE ref IN (:refs)", params, ...);

        List<JSONObject> results = new ArrayList<>();
        for (Long ref : refs) {
            JSONObject json = new JSONObject();
            json.put("ref", ref);
            json.put("product", products.get(ref));
            json.put("item", items.get(ref));
            json.put("warehouse", warehouses.get(ref));
            results.add(json);
        }

        return results;
    }
}

Now I want to convert this to a flux, to expose it as an event stream. But how can I parallelize the db lookups and chain them together into a flux?

    public Flux<JSONObject> getJsonFlux() {
        //I need this as source
        List<Long> refs = jdbc.queryForList(...);

        return Flux.fromIterable(refs).map(ref -> {
            //TODO how to aggregate the different database calls concurrently,
            //and then emit each JSONObject into the stream as soon as it is built?
        });
    }

Sidenote: I know this will still be blocking. But in my real application, I'm applying pagination and chunking, so each chunk will get exposed to the stream when ready.
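That chunking idea can be sketched with Reactor's `buffer` operator, assuming reactor-core is on the classpath (the chunk size of 2 and the `queryChunk` method are illustrative stand-ins, and `Schedulers.boundedElastic()` is the replacement newer Reactor versions offer for the deprecated `elastic()`):

```java
import java.util.List;
import reactor.core.publisher.Flux;
import reactor.core.scheduler.Schedulers;

public class ChunkSketch {
    // Stand-in for a blocking per-chunk aggregation query.
    static List<String> queryChunk(List<Long> chunk) {
        return chunk.stream().map(r -> "row-" + r).toList();
    }

    static List<String> run() {
        return Flux.range(1, 5).map(Long::valueOf)
                .buffer(2) // chunks of 2 refs; real code would use the page size
                .concatMap(chunk -> Flux.fromIterable(queryChunk(chunk))
                        .subscribeOn(Schedulers.boundedElastic())) // blocking query off the event loop
                .collectList()
                .block();
    }

    public static void main(String[] args) {
        System.out.println(run()); // each chunk's rows are emitted as soon as its query returns
    }
}
```

`concatMap` keeps the chunks in order; `flatMap` would let faster chunks overtake slower ones.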

The main problem is that I don't know how to parallelize the queries and then aggregate/merge the results, e.g. in a final flux step.

If I understand correctly, you would like to execute the queries by passing all refs as a parameter.

It will not really be an event stream this way, since it waits until all queries have finished and all JSON objects are in memory, and only starts streaming them after that.

public Flux<JSONObject> getJsonFlux()
{
    return Mono.fromCallable(jdbc::queryForList)
               .subscribeOn(Schedulers.elastic()) // elastic thread pool meant for blocking IO, you can use a custom one
               .flatMap(this::queryEntities)
               .map(this::createJsonObjects)
               .flatMapMany(Flux::fromIterable);
}

private Mono<Tuple4<List<Long>, List<Product>, List<Item>, List<Warehouse>>> queryEntities(List<Long> refs)
{
    Mono<List<Product>> products = Mono.fromCallable(() -> jdbc.queryProducts(refs)).subscribeOn(Schedulers.elastic());
    Mono<List<Item>> items = Mono.fromCallable(() -> jdbc.queryItems(refs)).subscribeOn(Schedulers.elastic());
    Mono<List<Warehouse>> warehouses = Mono.fromCallable(() -> jdbc.queryWarehouses(refs)).subscribeOn(Schedulers.elastic());

    return Mono.zip(Mono.just(refs), products, items, warehouses); // query calls will be concurrent
}

private List<JSONObject> createJsonObjects(Tuple4<List<Long>, List<Product>, List<Item>, List<Warehouse>> tuple)
{
    List<Long> refs = tuple.getT1();
    List<Product> products = tuple.getT2();
    List<Item> items = tuple.getT3();
    List<Warehouse> warehouses = tuple.getT4();

    List<JSONObject> jsonObjects = new ArrayList<>();

    for (Long ref : refs)
    {
        JSONObject json = new JSONObject();
        // build json object here

        jsonObjects.add(json);
    }

    return jsonObjects;
}

The alternative way is to query the entities for each ref separately. This way each JSONObject is queried individually, and they can interleave in the stream. I'm not sure how the database will handle that kind of load; that's something you should consider.

public Flux<JSONObject> getJsonFlux()
{
    return Mono.fromCallable(jdbc::queryForList)
               .flatMapMany(Flux::fromIterable)
               .subscribeOn(Schedulers.elastic()) // elastic thread pool meant for blocking IO, you can use a custom one
               .flatMap(this::queryEntities)
               .map(this::createJsonObject);
}

private Mono<Tuple4<Long, Product, Item, Warehouse>> queryEntities(Long ref)
{
    Mono<Product> product = Mono.fromCallable(() -> jdbc.queryProduct(ref)).subscribeOn(Schedulers.elastic());
    Mono<Item> item = Mono.fromCallable(() -> jdbc.queryItem(ref)).subscribeOn(Schedulers.elastic());
    Mono<Warehouse> warehouse = Mono.fromCallable(() -> jdbc.queryWarehouse(ref))
                                     .subscribeOn(Schedulers.elastic());

    return Mono.zip(Mono.just(ref), product, item, warehouse); // query calls will be concurrent
}

private JSONObject createJsonObject(Tuple4<Long, Product, Item, Warehouse> tuple)
{
    Long ref = tuple.getT1();
    Product product = tuple.getT2();
    Item item = tuple.getT3();
    Warehouse warehouse = tuple.getT4();

    JSONObject json = new JSONObject();
    // build json object here

    return json;
}

The idea is to first fetch the complete list of refs, and then simultaneously fetch the Products, Items, and Warehouses - I called this Tuple3 lookups. Then combine each ref with the lookups and convert them to JSONObjects one by one.

return Mono.fromCallable(jdbc::queryForList) //fetches refs
                .subscribeOn(Schedulers.elastic())
                .flatMapMany(refList -> { //flatMapMany allows to convert Mono to Flux in flatMap operation
                            Flux<Tuple3<Map<Long, Product>, Map<Long, Item>, Map<Long, Warehouse>>> lookups = Mono.zip(fetchProducts(refList), fetchItems(refList), fetchWarehouses(refList))
                                    .cache().repeat(); //notice cache - it makes sure that Mono.zip is executed only once, not for each zipWith call

                            return Flux.fromIterable(refList)
                                    .zipWith(lookups);
                        }
                )
                .map(t -> {
                    Long ref = t.getT1();
                    Tuple3<Map<Long, Product>, Map<Long, Item>, Map<Long, Warehouse>> lookups = t.getT2();
                    JSONObject json = new JSONObject();
                    json.put("ref", ref);
                    json.put("product", lookups.getT1().get(ref));
                    json.put("item", lookups.getT2().get(ref));
                    json.put("warehouse", lookups.getT3().get(ref));
                    return json;
                });

Methods for each database call:

Mono<Map<Long, Product>> fetchProducts(List<Long> refs) {
    return Mono.fromCallable(() -> jdbc.query(SELECT * from products where ref IN(:refs),params))
        .subscribeOn(Schedulers.elastic());
}

Mono<Map<Long, Item>> fetchItems(List<Long> refs) {
    return Mono.fromCallable(() -> jdbc.query(SELECT * from items where ref IN(:refs),params))
        .subscribeOn(Schedulers.elastic());
}

Mono<Map<Long, Warehouse>> fetchWarehouses(List<Long> refs) {
    return Mono.fromCallable(() -> jdbc.query(SELECT * from warehouses where ref IN(:refs),params))
        .subscribeOn(Schedulers.elastic());
}
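The effect of `.cache().repeat()` can be seen in isolation, assuming reactor-core is on the classpath (the `AtomicInteger` here is a hypothetical stand-in for the expensive zipped query): `cache()` memoizes the first emission, and `repeat()` replays the cached value for every `zipWith` pairing instead of re-subscribing to the source.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

public class CacheRepeatSketch {
    static List<String> run() {
        AtomicInteger dbCalls = new AtomicInteger();
        // Without cache(), repeat() would re-subscribe and "query" again per element.
        Flux<Integer> lookups = Mono.fromCallable(dbCalls::incrementAndGet)
                                    .cache()
                                    .repeat();
        return Flux.fromIterable(List.of("a", "b", "c"))
                .zipWith(lookups, (ref, lookup) -> ref + lookup) // every ref sees the same cached value
                .collectList()
                .block();
    }

    public static void main(String[] args) {
        System.out.println(run()); // [a1, b1, c1] - the callable ran exactly once
    }
}
```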

Why do I need subscribeOn?

I put it there for two reasons:

  1. It allows the database query to execute on a thread from a dedicated thread pool, which prevents blocking the main thread: https://projectreactor.io/docs/core/release/reference/#faq.wrap-blocking

  2. It allows Mono.zip to be truly parallel. See this question; it is about flatMap, but it also applies to zip: When FlatMap will listen to multiple sources concurrently?
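Outside Reactor, the same fan-out/join that `Mono.zip` plus `subscribeOn` performs can be sketched with plain `CompletableFuture` from the JDK (the two query methods and the fixed pool are illustrative stand-ins, not code from the question):

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ZipSketch {
    // Simulated blocking queries; the real service would hit the database here.
    static Map<Long, String> products(List<Long> refs) { return Map.of(1L, "p1"); }
    static Map<Long, String> items(List<Long> refs)    { return Map.of(1L, "i1"); }

    // Run both lookups on a dedicated pool and join, like Mono.zip + subscribeOn.
    static List<String> fetch(List<Long> refs) {
        ExecutorService pool = Executors.newFixedThreadPool(2); // plays the role of the elastic scheduler
        try {
            CompletableFuture<Map<Long, String>> p = CompletableFuture.supplyAsync(() -> products(refs), pool);
            CompletableFuture<Map<Long, String>> i = CompletableFuture.supplyAsync(() -> items(refs), pool);
            CompletableFuture.allOf(p, i).join(); // both queries run concurrently
            return refs.stream()
                       .map(r -> r + ":" + p.join().get(r) + "/" + i.join().get(r))
                       .toList();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(fetch(List.of(1L))); // [1:p1/i1]
    }
}
```

Submitting both futures before joining is what makes the calls concurrent; calling `join()` on the first future before submitting the second would serialize them, which is exactly the pitfall the `subscribeOn` placement avoids in the Reactor version.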


For completeness, the same is possible when using .flatMap() on the zip result. Though I'm not sure if .cache() is still necessary here.

    .flatMapMany(refList ->
        Mono.zip(fetchProducts(refList), fetchItems(refList), fetchWarehouses(refList)).cache()
            .flatMapMany(tuple -> Flux.fromIterable(refList)
                .map(ref -> Tuples.of(ref, tuple))))
    .map(tuple -> {
        Long ref = tuple.getT1();
        Tuple3<Map<Long, Product>, Map<Long, Item>, Map<Long, Warehouse>> lookups = tuple.getT2();
        // build the JSONObject from ref and lookups as in the previous snippet
    });
