Why is the DB called multiple times?

Question

I am playing with R2DBC using PostgreSQL. The use case I am trying is to get a Film by ID along with its Language, Actors and Category. Below is the schema.

This is the corresponding piece of code in ServiceImpl:

@Override
public Mono<FilmModel> getById(Long id) {
    Mono<Film> filmMono = filmRepository.findById(id)
            .switchIfEmpty(Mono.error(DataFormatException::new))
            .subscribeOn(Schedulers.boundedElastic());
    Flux<Actor> actorFlux = filmMono
            .flatMapMany(this::getByActorId)
            .subscribeOn(Schedulers.boundedElastic());
    Mono<String> language = filmMono
            .flatMap(film -> languageRepository.findById(film.getLanguageId()))
            .map(Language::getName)
            .subscribeOn(Schedulers.boundedElastic());
    Mono<String> category = filmMono
            .flatMap(film -> filmCategoryRepository.findFirstByFilmId(film.getFilmId()))
            .flatMap(filmCategory -> categoryRepository.findById(filmCategory.getCategoryId()))
            .map(Category::getName)
            .subscribeOn(Schedulers.boundedElastic());

    return Mono.zip(filmMono, actorFlux.collectList(), language, category)
            .map(tuple -> {
                FilmModel filmModel = GenericMapper.INSTANCE.filmToFilmModel(tuple.getT1());
                List<ActorModel> actors = tuple
                        .getT2()
                        .stream()
                        .map(act -> GenericMapper.INSTANCE.actorToActorModel(act))
                        .collect(Collectors.toList());
                filmModel.setActorModelList(actors);
                filmModel.setLanguage(tuple.getT3());
                filmModel.setCategory(tuple.getT4());
                return filmModel;
            });
}

The logs show 4 calls to film:

2021-12-16 21:21:20.026 DEBUG 32493 --- [ctor-tcp-nio-10] o.s.r2dbc.core.DefaultDatabaseClient     : Executing SQL statement [SELECT film.* FROM film WHERE film.film_id = $1 LIMIT 2]
2021-12-16 21:21:20.026 DEBUG 32493 --- [actor-tcp-nio-9] o.s.r2dbc.core.DefaultDatabaseClient     : Executing SQL statement [SELECT film.* FROM film WHERE film.film_id = $1 LIMIT 2]
2021-12-16 21:21:20.026 DEBUG 32493 --- [ctor-tcp-nio-12] o.s.r2dbc.core.DefaultDatabaseClient     : Executing SQL statement [SELECT film.* FROM film WHERE film.film_id = $1 LIMIT 2]
2021-12-16 21:21:20.026 DEBUG 32493 --- [actor-tcp-nio-7] o.s.r2dbc.core.DefaultDatabaseClient     : Executing SQL statement [SELECT film.* FROM film WHERE film.film_id = $1 LIMIT 2]
2021-12-16 21:21:20.162 DEBUG 32493 --- [actor-tcp-nio-9] o.s.r2dbc.core.DefaultDatabaseClient     : Executing SQL statement [SELECT language.* FROM language WHERE language.language_id = $1 LIMIT 2]
2021-12-16 21:21:20.188 DEBUG 32493 --- [actor-tcp-nio-7] o.s.r2dbc.core.DefaultDatabaseClient     : Executing SQL statement [SELECT film_actor.actor_id, film_actor.film_id, film_actor.last_update FROM film_actor WHERE film_actor.film_id = $1]
2021-12-16 21:21:20.188 DEBUG 32493 --- [ctor-tcp-nio-10] o.s.r2dbc.core.DefaultDatabaseClient     : Executing SQL statement [SELECT film_category.film_id, film_category.category_id, film_category.last_update FROM film_category WHERE film_category.film_id = $1 LIMIT 1]
2021-12-16 21:21:20.313 DEBUG 32493 --- [ctor-tcp-nio-10] o.s.r2dbc.core.DefaultDatabaseClient     : Executing SQL statement [SELECT category.* FROM category WHERE category.category_id = $1 LIMIT 2]
2021-12-16 21:21:20.563 DEBUG 32493 --- [actor-tcp-nio-7] o.s.r2dbc.core.DefaultDatabaseClient     : Executing SQL statement [SELECT actor.* FROM actor WHERE actor.actor_id = $1 LIMIT 2]

I am not looking for SQL optimizations (joins etc.); I can definitely make it more performant. The question is why I see 4 SQL queries to the Film table. Just to add, I have already fixed the code, but I am not able to understand the core reason. Thanks in advance.

Answer 1

I'm not terribly familiar with your stack, so this is a high-level answer to hit on your "Why". There WILL be a more specific answer for you, somewhere down the pipe (e.g. from someone who can confirm whether this thread is relevant).

While I'm no Spring Daisy (or Spring dev), you bind an expression to filmMono that resolves as the query select film.* from film.... You reference that expression four times, and it's resolved four times, in separate contexts. The ordering of the statements is likely a partially successful attempt by the lib author to lazily evaluate the expression that you bound locally, such that it's able to batch the four accidentally identical queries. You most likely resolved this by collecting into an actual container, and then mapping on that container instead of the expression bound to filmMono.

In general, this situation arises because the options available to library authors aren't good when the language doesn't natively support lazy evaluation. Because any operation might alter the dataset, the library author has to choose between:

A, construct just enough scaffolding to fully record all resources needed, copy the dataset for any operations that need to mutate records in some way, and hope that they can detect any edge cases that might leak the scaffolding when the resolved dataset was expected (getting this right is... hard).
B, resolve each level of mapping as a query, for each context it appears in, lest any operations mutate the dataset in ways that might surprise the integrator (e.g. you).
C, as above, except instead of duplicating the original request, just duplicate the data... at every step. Pass-by-copy gets real painful real fast on the JVM, and languages like Clojure and Scala handle this by just making the dev be very specific about whether they want to mutate in place, or copy then mutate.

In your case, B made the most sense to the folks that wrote that lib. In fact, they apparently got close enough to A that they were able to batch all the queries that were produced by resolving the expression bound to filmMono (which are only accidentally identical), so color me a bit impressed.

Many access patterns can be rewritten to optimize for the resulting queries instead. Your mileage may vary... wildly. Getting familiar with raw SQL, or else a special-purpose language like GraphQL, can give much more consistent results than relational mappers, but I'm ever more appreciative of good IDE support, and mixing domains like that often means giving up auto-complete, context highlighting, lang-server solution-proofs and linting.

Given that the scope of the question was "why did this happen?", even noting my lack of familiarity with your stack, the answer is "lazy evaluation in a language that doesn't natively support it is really hard."

Answer 2

Why do I see 4 SQL queries to the Film table?

The reason is quite simple. You are subscribing to the Mono<Film> 4 times:

Mono<Film> filmMono = filmRepository.findById(id);

Flux<Actor> actorFlux = filmMono.flatMapMany(...); // (1)
Mono<String> language = filmMono.flatMap(...); // (2)
Mono<String> category = filmMono.flatMap(...); // (3)
Mono.zip(filmMono, actorFlux.collectList(), language, category); // (4)

Each subscription to filmMono triggers a new query. Note that you can change that by using the Mono#cache operator to turn filmMono into a hot source and cache the result for all four subscribers.
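To make the difference concrete, here is a minimal plain-JDK sketch of the same idea. It is only an analogy and does not use Reactor: a Supplier stands in for the cold Mono<Film>, each get() call stands in for one subscription, and the hypothetical memoize helper plays roughly the role that Mono#cache plays in the real code. The class and method names (ColdVersusCached, memoize, countQueries) are made up for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class ColdVersusCached {

    /** Wraps a supplier so the underlying work runs at most once (a rough stand-in for Mono#cache). */
    static <T> Supplier<T> memoize(Supplier<T> source) {
        return new Supplier<T>() {
            private boolean resolved;
            private T value;

            @Override
            public synchronized T get() {
                if (!resolved) {
                    value = source.get();   // the only place the real work happens
                    resolved = true;
                }
                return value;
            }
        };
    }

    /** Consumes the source four times, like the four subscriptions, and returns how many "queries" ran. */
    static int countQueries(boolean cached) {
        AtomicInteger queries = new AtomicInteger();

        // Stand-in for filmRepository.findById(id): every evaluation counts as one SQL query.
        Supplier<String> findFilm = () -> {
            queries.incrementAndGet();
            return "film";
        };

        Supplier<String> filmSource = cached ? memoize(findFilm) : findFilm;

        filmSource.get(); // (1) used to load the actors
        filmSource.get(); // (2) used to load the language
        filmSource.get(); // (3) used to load the category
        filmSource.get(); // (4) passed to zip directly

        return queries.get();
    }

    public static void main(String[] args) {
        System.out.println("cold source:   " + countQueries(false) + " queries");
        System.out.println("cached source: " + countQueries(true) + " queries");
    }
}
```

In the question's code, the equivalent change would presumably be to append .cache() to the filmMono definition, so that the three derived pipelines and the zip all share one result instead of each re-running findById.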
