MarkLogic Optic API: order-by performance

We have two types of documents: books and book sections. We use TDE to define views for the two types.

Schema (relevant part):

  • view books: id, title
  • view booksections: id, bookid

The use case is to list the books with 5000 or more sections. For each book, the title and the number of sections should be returned. Using the Optic API, the query with group-by looks like this:

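xquery version "1.0-ml";
import module namespace op = "http://marklogic.com/optic" at "/MarkLogic/optic.xqy";

(: Join each book to its sections, count the sections per title,
   keep titles with at least 5000 sections, highest count first. :)
op:from-view("myschema", "books") =>
    op:join-inner(op:from-view("myschema", "booksections"), op:on(
        op:view-col("books", "id"),
        op:view-col("booksections", "bookid"))) =>
    op:group-by(
        (op:view-col("books", "title")),
        (op:count("count", op:view-col("booksections", "id")))) =>
    op:where(op:ge(op:col("count"), 5000)) =>
    op:select((op:view-col("books", "title"), "count")) =>
    op:order-by(op:desc("count")) =>
    op:result()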

The query returns a small result set: 4 books. The interesting thing is that this query takes 5 seconds to complete, but only 3 seconds if I remove the op:order-by call. Apparently, 2 seconds are spent ordering the 4 books in the result.

Is there anything I can do to speed up the ordering (other than doing the ordering as a post-processing step)?
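For reference, the post-processing alternative mentioned above could look roughly like this minimal sketch. It assumes that op:result() returns one map-like row per book (its default object output in XQuery) and that the aggregate column is named "count", as in the question. The helper local:order-by-count-desc is hypothetical and exists only for this illustration. With just four rows, sorting them in XQuery should cost next to nothing, which also makes it a useful baseline when timing the plan with and without op:order-by().

```xquery
xquery version "1.0-ml";

import module namespace op = "http://marklogic.com/optic"
  at "/MarkLogic/optic.xqy";

(: Hypothetical helper: sorts already-materialized result rows by their
   "count" column, highest first. Rows are typed loosely as item()* and
   read with map:get(), assuming op:result() yields map-like rows. :)
declare function local:order-by-count-desc($rows as item()*) as item()*
{
  for $row in $rows
  order by xs:integer(map:get($row, "count")) descending
  return $row
};

(: The plan from the question with op:order-by() removed;
   the ordering is applied to the small result afterwards. :)
let $rows :=
  op:from-view("myschema", "books")
  => op:join-inner(
       op:from-view("myschema", "booksections"),
       op:on(op:view-col("books", "id"), op:view-col("booksections", "bookid")))
  => op:group-by(
       op:view-col("books", "title"),
       op:count("count", op:view-col("booksections", "id")))
  => op:where(op:ge(op:col("count"), 5000))
  => op:select((op:view-col("books", "title"), "count"))
  => op:result()
return local:order-by-count-desc($rows)
```

The helper only touches the rows that survive op:where(), so its cost grows with the size of the final result, not with the number of sections scanned.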

The times were measured with warm triple caches. op:explain shows the order-by operation as the outermost operation, indicating that the ordering is applied to the small set of 4 books. Using SQL gives the same run times, and the same speed-up when the ORDER BY is removed.

Upgrading to a newer MarkLogic version might help solve the problem. Even so, spending two seconds to sort 4 result rows is not a convincing explanation. Much more happens during query execution when you add the op:order-by() clause, and that should explain the increase in time.

To better understand what actually happens during the execution of the two queries, we should look at the query plans returned by the server (using op:explain()). Based on the statistics of the underlying data and the order-by() clause added to the query, the query optimizer might choose a different query plan. Sharing the query plans for both queries would point us in the right direction for helping the optimizer choose a better plan.
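A minimal sketch for producing the two plans to compare, assuming the same myschema views as in the question. The helper local:books-plan is hypothetical; it builds the plan fresh on each call so that the only difference between the two outputs is the op:order-by() step. op:explain() receives the unexecuted plan (no op:result()), and "xml" is passed as the output format here as an assumption; check the op:explain() documentation for your MarkLogic version.

```xquery
xquery version "1.0-ml";

import module namespace op = "http://marklogic.com/optic"
  at "/MarkLogic/optic.xqy";

(: Hypothetical helper: builds the grouped plan from the question,
   optionally with op:order-by() appended. op:result() is deliberately
   omitted because op:explain() takes the plan, not its rows. :)
declare function local:books-plan($with-order-by as xs:boolean)
{
  let $plan :=
    op:from-view("myschema", "books")
    => op:join-inner(
         op:from-view("myschema", "booksections"),
         op:on(op:view-col("books", "id"), op:view-col("booksections", "bookid")))
    => op:group-by(
         op:view-col("books", "title"),
         op:count("count", op:view-col("booksections", "id")))
    => op:where(op:ge(op:col("count"), 5000))
    => op:select((op:view-col("books", "title"), "count"))
  return
    if ($with-order-by)
    then $plan => op:order-by(op:desc("count"))
    else $plan
};

(: Return both execution plans side by side: without, then with, order-by. :)
(
  op:explain(local:books-plan(fn:false()), "xml"),
  op:explain(local:books-plan(fn:true()), "xml")
)
```

Diffing the two outputs shows whether adding op:order-by() changes more than the outermost step, for example the join strategy or the order in which the views are scanned.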

You will probably be better off contacting MarkLogic Support with your test case. I'd say that Ramesh is right, and that the query optimizer is picking a sub-optimal query plan for your query.

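Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site or the original URL. For any questions, contact yoyou2525@163.com.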

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM