MarkLogic Optic API

I've been testing a migration of one of our systems to MarkLogic 9, using the Optic API.

One of our functions groups claims by member_id and member_name and gets the sums and counts, so I did something like this:

const op = require('/MarkLogic/optic');

var results = op.fromView('test', 'claims')
  .groupBy(['member_id', 'member_name'], [
         op.count('num_claims', 'claim_no'),
         op.sum('total_amount', 'claim_amount')
         ])
  .orderBy(op.desc('total_amount'))
  .limit(200)
  .result()
  .toArray();

The above works fine. The results are of the form:

[
  { 
    member_id: 1, 
    member_name: 'Bob', 
    num_claims: 10, 
    total_amount: 500
  }, 
  ...
]

However, we also have a field "company", and each claim may be filed under a different company. The relevant view columns are claim_no, member_id, member_name, company and claim_amount.

I would like to show a column that lists the different companies for which the member_id/member_name has filed claims, and how many claims were filed for each company.

That is, I want my results to be something like:

[
  { 
    member_id: 1, 
    member_name: 'Bob', 
    num_claims: 10, 
    total_amount: 500,
    companies: [
      {
        company: 'Ajax Co',
        num_claims: 8
      },
      {
        company: 'Side Gig',
        num_claims: 2
      }
    ]
  }, 
  ...
]

I tried something like this:

results = results.map((member, index, array) => {
  var companies = op.fromView('test', 'claims')
    .where(op.eq(op.col('member_id'), member.member_id))
    .groupBy('company', [
      op.count('num_claims', 'claim_no')      
    ])
    .result()
    .toArray();
  member.companies = companies;
  return member;
});

The output seems correct, but it executes quite slowly: almost a minute (the total number of claim documents is around 120k).

In our previous ML8 implementation, we pre-generated summary documents for each member, so retrieval was reasonably fast. The downside was that whenever we got a batch of new data, all of the summary documents had to be regenerated. I was hoping that ML9's Optic API would make it easier to do the retrieval, grouping and aggregation on the fly so we wouldn't have to do that.

In theory, I could just add company to the groupBy fields, then merge the rows in the query result as needed. The problem with that approach is that I can't guarantee I'll get the top 200 members by total amount (as in my original query), because the limit would apply to member/company rows rather than to members.
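To make that concern concrete, here is a minimal plain-JavaScript sketch. It does not use Optic at all; the rows, the amounts and the totalsByMember helper are made up for illustration. It compares limiting the member/company rows before merging with merging first and limiting afterwards.

```javascript
// Hypothetical rows, as if already grouped by (member_id, company).
const companyRows = [
  { member_id: 1, company: 'Ajax Co',  company_amount: 300 },
  { member_id: 1, company: 'Side Gig', company_amount: 300 },
  { member_id: 2, company: 'Ajax Co',  company_amount: 400 },
];

// Merge member/company rows into one total per member, highest total first.
function totalsByMember(rows) {
  const totals = new Map();
  for (const row of rows) {
    totals.set(row.member_id, (totals.get(row.member_id) || 0) + row.company_amount);
  }
  return [...totals.entries()]
    .map(([member_id, total_amount]) => ({ member_id, total_amount }))
    .sort((a, b) => b.total_amount - a.total_amount);
}

// Limit first (top 1 member/company row), then merge.
const limitedFirst = totalsByMember(
  [...companyRows].sort((a, b) => b.company_amount - a.company_amount).slice(0, 1)
);

// Merge first, then limit to the top 1 member.
const mergedFirst = totalsByMember(companyRows).slice(0, 1);

console.log(limitedFirst); // member 2, because its single row is the largest
console.log(mergedFirst);  // member 1, because its two rows add up to more
```

With a limit of 1, the two orders pick different members, which is the same effect I would expect with a limit of 200 on the real data.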

So, the question is: Is there a better way of doing this with a reasonable execution time? Or should I just stick to pre-generating the summary documents?

If I'm understanding correctly, you should be able to implement that with a single Optic query that groups twice.

  • The first group should aggregate to the company level (one row per member and company)
  • The second group should aggregate to the member level, collecting the company detail with the array aggregate

The query would probably look something like the following:

const op = require('/MarkLogic/optic');

const results =
  op.fromView('test', 'claims')
    .groupBy(['member_id', 'company'], [
        'member_name',
        op.count('company_claims', 'claim_no'),
        op.sum('company_amount', 'claim_amount')
        ])
    .select(['member_id',
        'member_name',
        'company_claims',
        'company_amount',
        op.as('company_desc', op.jsonObject([
                op.prop('company',    op.col('company')),
                op.prop('num_claims', op.col('company_claims'))
                ]))
        ])
    .groupBy(['member_id'], [
        'member_name',
        op.sum('num_claims',   'company_claims'),
        op.sum('total_amount', 'company_amount'),
        op.arrayAggregate('companies', 'company_desc')
        ])
    .orderBy(op.desc('total_amount'))
    .limit(200)
    .result()
    .toArray();
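To see what the two grouping passes compute, here is a rough plain-JavaScript emulation over an in-memory array. It is only a sketch of the logic, not the Optic API: the sample rows and the summarizeMembers helper are hypothetical, and it assumes claim_no is never null (so counting rows matches counting claim_no values).

```javascript
// Hypothetical sample rows with the view's columns.
const claims = [
  { claim_no: 1, member_id: 1, member_name: 'Bob', company: 'Ajax Co',  claim_amount: 100 },
  { claim_no: 2, member_id: 1, member_name: 'Bob', company: 'Ajax Co',  claim_amount: 150 },
  { claim_no: 3, member_id: 1, member_name: 'Bob', company: 'Side Gig', claim_amount: 50 },
  { claim_no: 4, member_id: 2, member_name: 'Ann', company: 'Ajax Co',  claim_amount: 400 },
];

function summarizeMembers(rows, limit) {
  // Pass 1: one entry per (member_id, company), with a count and a sum.
  const byCompany = new Map();
  for (const row of rows) {
    const key = JSON.stringify([row.member_id, row.company]);
    if (!byCompany.has(key)) {
      byCompany.set(key, {
        member_id: row.member_id,
        member_name: row.member_name, // taken from the first row of the group
        company: row.company,
        company_claims: 0,
        company_amount: 0,
      });
    }
    const group = byCompany.get(key);
    group.company_claims += 1;
    group.company_amount += row.claim_amount;
  }

  // Pass 2: one entry per member_id, summing the pass-1 results and
  // collecting the per-company detail into an array.
  const byMember = new Map();
  for (const group of byCompany.values()) {
    if (!byMember.has(group.member_id)) {
      byMember.set(group.member_id, {
        member_id: group.member_id,
        member_name: group.member_name,
        num_claims: 0,
        total_amount: 0,
        companies: [],
      });
    }
    const member = byMember.get(group.member_id);
    member.num_claims += group.company_claims;
    member.total_amount += group.company_amount;
    member.companies.push({ company: group.company, num_claims: group.company_claims });
  }

  // Order and limit only after the member totals are complete.
  return [...byMember.values()]
    .sort((a, b) => b.total_amount - a.total_amount)
    .slice(0, limit);
}

const summary = summarizeMembers(claims, 200);
console.log(JSON.stringify(summary, null, 2));
```

Because the ordering and the limit come after the second pass, the cut-off applies to member totals rather than to member/company rows.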

By the way, if you specify a column in the aggregates list, it is sampled. Where the column has the same value for the entire group (which I presume is the case with "member_name"), you can sample it instead of specifying it as an additional grouping key.

Also, in modern JavaScript var is usually avoided in favor of const or let.

Hoping that helps,
