MarkLogic Optic API

Question

I've been testing a migration of one of our systems to MarkLogic 9 using the Optic API.

One of our functions groups claims by member_id and member_name and computes the sums and counts, so I did something like this:

var results = op.fromView('test', 'claims')
  .groupBy(['member_id', 'member_name'], [
         op.count('num_claims', 'claim_no'),
         op.sum('total_amount', 'claim_amount')
         ])
  .orderBy(op.desc('total_amount'))
  .limit(200)
  .result()
  .toArray();

The above works fine. The results are of the form:

[
  { 
    member_id: 1, 
    member_name: 'Bob', 
    num_claims: 10, 
    total_amount: 500
  }, 
  ...
]

However, we also have a field "company", and each claim is filed under a specific company. Basically, the relevant view columns are claim_no, member_id, member_name, company, and claim_amount.

I would like to show a column that lists the different companies for which the member_id/member_name has filed claims, and how many claims were filed for each company.

That is, I want my results to look something like this:

[
  { 
    member_id: 1, 
    member_name: 'Bob', 
    num_claims: 10, 
    total_amount: 500,
    companies: [
      {
        company: 'Ajax Co',
        num_claims: 8
      },
      {
        company: 'Side Gig',
        num_claims: 2
      }
    ]
  }, 
  ...
]

I tried something like this:

results = results.map((member, index, array) => {
  var companies = op.fromView('test', 'claims')
    .where(op.eq(op.col('member_id'), member.member_id))
    .groupBy('company', [
      op.count('num_claims', 'claim_no')      
    ])
    .result()
    .toArray();
  member.companies = companies;
  return member;
});

The output seems correct, but it executes quite slowly: almost a minute (the total number of claim documents is around 120k).

In our previous ML8 implementation, we pre-generated summary documents for each member, so retrieval was reasonably fast. The downside was that whenever we got a batch of new data, all of the summary documents had to be regenerated. I was hoping that ML9's Optic API would make it easier to do the retrieval, grouping, and aggregation on the fly so we wouldn't have to do that.

In theory, I could just add company to the groupBy fields, then merge the rows in the query result as needed. But the problem with that approach is that I can't guarantee I'll get the top 200 by total amount (as in my original query), because the limit would apply to member/company rows rather than to members.
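A rough sketch of that merge step in plain JavaScript, assuming the flat query is run without a limit and returns one row per member and company. The column names `company_claims` and `company_amount`, the helper name `mergeByMember`, and the sample rows are all made up for illustration. The top-N cut happens only after merging, which is why every flat row has to be fetched first.

```javascript
// Hypothetical helper: merges rows grouped by (member_id, company)
// into one summary per member, then keeps the top N by total amount.
function mergeByMember(rows, topN) {
  const members = new Map();
  for (const row of rows) {
    let m = members.get(row.member_id);
    if (!m) {
      m = {
        member_id: row.member_id,
        member_name: row.member_name,
        num_claims: 0,
        total_amount: 0,
        companies: []
      };
      members.set(row.member_id, m);
    }
    // Roll the per-company figures up into the member totals.
    m.num_claims += row.company_claims;
    m.total_amount += row.company_amount;
    m.companies.push({ company: row.company, num_claims: row.company_claims });
  }
  // Sort by total amount, descending, and cut to the top N members.
  return Array.from(members.values())
    .sort((a, b) => b.total_amount - a.total_amount)
    .slice(0, topN);
}

// Invented sample rows, shaped as the flat query is assumed to return them.
const sampleRows = [
  { member_id: 1, member_name: 'Bob', company: 'Ajax Co', company_claims: 8, company_amount: 400 },
  { member_id: 1, member_name: 'Bob', company: 'Side Gig', company_claims: 2, company_amount: 100 },
  { member_id: 2, member_name: 'Ann', company: 'Ajax Co', company_claims: 1, company_amount: 50 }
];
const merged = mergeByMember(sampleRows, 200);
```

This keeps the database query simple but moves the sorting and limiting into application code, so its cost grows with the number of member/company pairs rather than with the 200 rows actually needed.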

So, the question is: is there a better way of doing this with a reasonable execution time? Or should I just stick to pre-generating the summary documents?

Answer 1

If I'm understanding correctly, you should be able to implement that with a single Optic query that groups twice.

The first group should aggregate to the company level (one row per member and company).
The second group should aggregate to the member level, collecting the company detail with the array aggregate.

The query would probably look something like the following:

const results =
  op.fromView('test', 'claims')
    .groupBy(['member_id', 'company'], [
        'member_name',
        op.count('company_claims', 'claim_no'),
        op.sum('company_amount', 'claim_amount')
        ])
    .select(['member_id',
        'member_name',
        'company_claims',
        'company_amount',
        op.as('company_desc', op.jsonObject([
                op.prop('company',    op.col('company')),
                op.prop('num_claims', op.col('company_claims'))
                ]))
        ])
    .groupBy(['member_id'], [
        'member_name',
        op.sum('num_claims',   'company_claims'),
        op.sum('total_amount', 'company_amount'),
        op.arrayAggregate('companies', 'company_desc')
        ])
    .orderBy(op.desc('total_amount'))
    .limit(200)
    .result()
    .toArray();
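
To see what the two groupings produce at each stage, here is a plain JavaScript sketch that emulates them in memory on a few invented claims. It does not use the Optic API and says nothing about how MarkLogic executes the plan; `groupRows` is a hypothetical helper written only for this illustration.

```javascript
// Hypothetical in-memory helper: groups rows by the given key columns
// and builds one output row per group with the supplied function.
function groupRows(rows, keys, makeRow) {
  const groups = new Map();
  for (const row of rows) {
    const id = JSON.stringify(keys.map(k => row[k]));
    if (!groups.has(id)) groups.set(id, []);
    groups.get(id).push(row);
  }
  return Array.from(groups.values()).map(makeRow);
}

// Invented sample data with the columns named in the question.
const claims = [
  { claim_no: 'c1', member_id: 1, member_name: 'Bob', company: 'Ajax Co', claim_amount: 100 },
  { claim_no: 'c2', member_id: 1, member_name: 'Bob', company: 'Ajax Co', claim_amount: 150 },
  { claim_no: 'c3', member_id: 1, member_name: 'Bob', company: 'Side Gig', claim_amount: 50 },
  { claim_no: 'c4', member_id: 2, member_name: 'Ann', company: 'Ajax Co', claim_amount: 75 }
];

// Step 1: one row per (member, company).
const perCompany = groupRows(claims, ['member_id', 'company'], g => ({
  member_id: g[0].member_id,
  member_name: g[0].member_name, // identical within the group, so any row will do
  company: g[0].company,
  company_claims: g.length,
  company_amount: g.reduce((sum, r) => sum + r.claim_amount, 0)
}));

// Step 2: one row per member, with the company detail collected into an array.
const perMember = groupRows(perCompany, ['member_id'], g => ({
  member_id: g[0].member_id,
  member_name: g[0].member_name,
  num_claims: g.reduce((sum, r) => sum + r.company_claims, 0),
  total_amount: g.reduce((sum, r) => sum + r.company_amount, 0),
  companies: g.map(r => ({ company: r.company, num_claims: r.company_claims }))
})).sort((a, b) => b.total_amount - a.total_amount);
```

Because the ordering and the limit come after the second grouping, they apply to members rather than to member/company rows, which is what keeps the top 200 by total amount correct.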

By the way, if you specify a column in the aggregates list, it is sampled. Where the column has the same value for the entire group (which I presume is the case with "member_name"), you can sample it instead of specifying it as an additional grouping key.

Also, in modern JavaScript, var is usually avoided in favor of const or let.

Hoping that helps,
