Cross JOIN collections and GroupBy CosmosDB Javascript API

I am searching for a solution in the Javascript API for CosmosDB that lets you perform an INNER/OUTER JOIN between two document collections.

I have been unsuccessful.

From my understanding, Javascript Stored Procedures run within a collection, and cannot access/reference data in another collection.

If the above is true, where does this leave our application's data source, which has been designed in a relational way? Suppose the business requires an immediate query to collect the following data: all agreements/contracts that have been migrated to a new product offering, within a specific region, for a given time frame. How would I go about this query if there are about 5 collections containing all the information related to it?

Any guidance?

UPDATE

Customer

{
    "id": "d02e6668-ce24-455d-b241-32835bb2dcb5",
    "Name": "Test User One",
    "Surname": "Test"
}

Agreement

{
    "id": "ee1094bd-16f4-45ec-9f5e-7ecd91d4e729",
    "CustomerId": "d02e6668-ce24-455d-b241-32835bb2dcb5",
    "RetailProductVersionInstance": [
        {
            "id": "8ce31e7c-7b1a-4221-89a3-449ae4fd6622",
            "RetailProductVersionId": "ce7a44a4-7e49-434b-8a51-840599fbbfbb",
            "AgreementInstanceUser": {
                "FirstName": "Luke",
                "LastName": "Pothier",
                "AgreementUserTypeId": ""
            },
            "AgreementInstanceMSISDN": {
                "IsoCountryDialingCode": null,
                "PhoneNumber": "0839263922",
                "NetworkOperatorId": "30303728-9983-47f9-a494-1de853d66254"
            },
            "RetailProductVersionInstanceState": "IN USE",
            "IsPrimaryRetailProduct": true,
            "RetailProductVersionInstancePhysicalItems": [
                {
                    "id": "f8090aba-f06b-4233-9f9e-eb2567a20afe",
                    "PhysicalItemId": "75f64ab3-81d2-f600-6acb-d37da216846f",
                    "RetailProductVersionInstancePhysicalItemNumbers": [
                        {
                            "id": "9905058b-8369-4a64-b9a5-e17e28750fba",
                            "PhysicalItemNumberTypeId": "39226b5a-429b-4634-bbce-2213974e5bab",
                            "PhysicalItemNumberValue": "KJDS959405"
                        },
                        {
                            "id": "1fe09dd2-fb8a-49b3-99e6-8c51df10adb1",
                            "PhysicalItemNumberTypeId": "960a1750-64be-4333-9a7f-c8da419d670a",
                            "PhysicalItemNumberValue": "DJDJ94943"
                        }
                    ],
                    "RetailProductVersionInstancePhysicalItemState": "IN USE",
                    "DateCreatedUtc": "2018-11-21T13:55:00Z",
                    "DateUpdatedUtc": "2020-11-21T13:55:00Z"
                }
            ]
        }
    ]
}

RetailProduct

{
    "id": "ce7a44a4-7e49-434b-8a51-840599fbbfbb",
    "FriendlyName": "Data-Package 100GB",
    "WholeSaleProductId": "d054dae5-173d-478b-bb0e-7516e6a24476"
}

WholeSaleProduct

{
    "id": "d054dae5-173d-478b-bb0e-7516e6a24476",
    "ProductName": "Data 100",
    "ProviderLiabilities": []
}

Above, I have added some sample documents.

Relationships:

  • Agreement.CustomerId links to Customer.id
  • Agreement.RetailProductVersionInstance.RetailProductVersionId links to RetailProduct.id
  • RetailProduct.WholeSaleProductId links to WholeSaleProduct.id

How would I write a Javascript Stored Procedure in CosmosDB to perform joins between these 4 collections?

The short answer is that you cannot perform joins between different collections via SQL in Cosmos DB.
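For context, the JOIN keyword in Cosmos DB SQL is a self-join: it pairs each document with the elements of an array nested inside that same document, never with documents from another collection. Here is a minimal sketch of what that looks like against the sample Agreement document. The alias names, the `@state` parameter, and the `flattenInstances` helper are assumptions made for illustration and are not from the question.

```javascript
// Query spec for an intra-document JOIN over the Agreement collection.
// "a" and "i" are arbitrary aliases; "@state" is an illustrative parameter.
const instanceQuery = {
  query:
    "SELECT a.id AS agreementId, i.RetailProductVersionId " +
    "FROM a JOIN i IN a.RetailProductVersionInstance " +
    "WHERE i.RetailProductVersionInstanceState = @state",
  parameters: [{ name: "@state", value: "IN USE" }],
};

// Rough in-memory equivalent (hypothetical helper), only to show the shape
// of the rows such a query produces: one row per matching array element.
function flattenInstances(agreements, state) {
  return agreements.flatMap((a) =>
    (a.RetailProductVersionInstance || [])
      .filter((i) => i.RetailProductVersionInstanceState === state)
      .map((i) => ({
        agreementId: a.id,
        RetailProductVersionId: i.RetailProductVersionId,
      }))
  );
}
```

Nothing in that query can name a second collection, which is why the RetailProduct and Customer lookups have to happen elsewhere.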

Generally, the solution to this type of question is multiple queries or a different schema. In your scenario, if you can denormalize your schema into one collection without duplicating data, then it is easy.

If you provide your schemas, it'd be possible to provide a more comprehensive answer.

-- Edit 1 --

Stored Procedures are only good candidates for work that requires multiple operations on the same collection + partition key. This makes them good for bulk insert/delete/update, transactions (which need at least a read and a write), and a few other things. They aren't good for CPU-intensive things, but rather for things that would normally be IO bound by network latency. They can't be used for cross-partition or cross-collection scenarios. In those cases, you must perform the operations exclusively from the remote client.
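To make the scoping concrete, here is a rough sketch of a stored procedure using the server-side `getContext()` API. The procedure name and the query on `CustomerId` are assumptions for illustration; the point is that the only data handle available is the collection the procedure is registered in, and the query runs within the partition key value supplied at execution time.

```javascript
// Stored procedure sketch: runs server-side, inside ONE collection and ONE
// partition key value (the one supplied when the procedure is executed).
// The name "agreementsForCustomer" and the CustomerId filter are illustrative.
function agreementsForCustomer(customerId) {
  var context = getContext();
  // The only collection reachable from here is the one hosting the procedure.
  var collection = context.getCollection();

  var accepted = collection.queryDocuments(
    collection.getSelfLink(),
    {
      query: "SELECT * FROM c WHERE c.CustomerId = @customerId",
      parameters: [{ name: "@customerId", value: customerId }]
    },
    function (err, docs) {
      if (err) throw err;
      // Return the matching documents to the caller.
      context.getResponse().setBody(docs);
    }
  );

  // queryDocuments returns false when the request could not be queued.
  if (!accepted) throw new Error("Query was not accepted by the server.");
}
```

There is no call here that could take a link to the Customer or RetailProduct collections, so the join itself still has to be assembled by the client.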

In your case, it's a fairly straightforward 2 + 2N separate reads, where N is the number of products: one for the agreement, one for the customer, N for the retail products, and N for the wholesale products. You need to read the agreement first. Then you can look up the customer and the product records in parallel, and then you can look up the wholesale records last, so you should have a latency of about 3s + C, where s is the average duration of a given read request and C is some constant CPU time to perform the join, issue the requests, etc.
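The three stages could be sketched on the client roughly as follows. `readDoc(collection, id)` is a hypothetical async point-read helper that the caller supplies (with the `@azure/cosmos` package it might wrap `container.item(id, partitionKey).read()`), and `loadAgreementView` and the shape of its result are likewise assumptions, not an established API.

```javascript
// Client-side join sketch. `readDoc(collection, id)` is a hypothetical async
// point-read helper supplied by the caller; it is not part of any SDK.
async function loadAgreementView(readDoc, agreementId) {
  // Stage 1: one read for the agreement itself.
  const agreement = await readDoc("Agreement", agreementId);
  const instances = agreement.RetailProductVersionInstance || [];

  // Stage 2: the customer and all N retail products in parallel (1 + N reads).
  const [customer, retailProducts] = await Promise.all([
    readDoc("Customer", agreement.CustomerId),
    Promise.all(
      instances.map((i) => readDoc("RetailProduct", i.RetailProductVersionId))
    ),
  ]);

  // Stage 3: N wholesale reads, again issued in parallel.
  const wholesaleProducts = await Promise.all(
    retailProducts.map((p) => readDoc("WholeSaleProduct", p.WholeSaleProductId))
  );

  // The "join" is plain in-memory stitching of the fetched documents.
  return {
    agreementId: agreement.id,
    customer,
    products: retailProducts.map((retail, idx) => ({
      retail,
      wholesale: wholesaleProducts[idx],
    })),
  };
}
```

Because the reads inside each stage are issued together, the wall-clock cost is roughly three round trips regardless of N, while the total number of reads stays at 2 + 2N.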

It's worth considering whether you can consolidate RetailProduct and WholeSaleProduct, either into a single record where the wholesale product contains all of its RetailProducts in an array, or as separate documents partitioned by the wholesale id, with the wholesale product info stored in a separate document under a well-known id. That would reduce your latency by one third. If you go with the partition-by-wholesale-id idea, you could write 1 query for any records that share a wholesale id, so you'd get 2 + log(N) reads, but the same effective latency. For that strategy, you'd store a composite "wholesaleid+productid" key in the agreement. One issue to be aware of is that it duplicates the wholesale+product relationship, but as long as that relationship doesn't change, I don't think there is anything to worry about, and it provides a good optimization for info lookup.
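One way the composite reference might be handled on the client is sketched below. The "+" separator, the function names, and the `wholesaleId` property used in the query text are all assumptions for illustration; the original only names the "wholesaleid+productid" idea.

```javascript
// Hypothetical composite reference format: "<wholesaleId>+<productId>".
// The sample ids are UUIDs, which never contain "+", so it works as a separator.
function makeProductRef(wholesaleId, productId) {
  return `${wholesaleId}+${productId}`;
}

function parseProductRef(ref) {
  const idx = ref.indexOf("+");
  if (idx < 0) throw new Error(`Malformed product reference: ${ref}`);
  return { wholesaleId: ref.slice(0, idx), productId: ref.slice(idx + 1) };
}

// Group the references stored in an agreement by wholesale id, so that each
// group can be fetched with one single-partition query.
function groupByWholesale(refs) {
  const groups = new Map();
  for (const ref of refs) {
    const { wholesaleId, productId } = parseProductRef(ref);
    if (!groups.has(wholesaleId)) groups.set(wholesaleId, new Set());
    groups.get(wholesaleId).add(productId);
  }
  return groups;
}

// Query spec for everything stored under one wholesale id. The property name
// "wholesaleId" is an assumed partition key path, not taken from the question.
function buildPartitionQuery(wholesaleId) {
  return {
    query: "SELECT * FROM c WHERE c.wholesaleId = @wholesaleId",
    parameters: [{ name: "@wholesaleId", value: wholesaleId }],
  };
}
```

With this layout the client issues one query per distinct wholesale id rather than two reads per product, which pays off most when many products in an agreement share a wholesale product.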
