How to aggregate data from another cube in cubejs?

I have the following cubes (I'm only showing the data necessary to reproduce the problem):

SentMessages:

cube(`SentMessages`, {
    sql: `Select * from messages_sent`,
    dimensions: {
        campaignId: {
            sql: `campaign_id`,
            type: `number`
        },
        phone: {
            sql: `phone_number`,
            type: `number`
        }
    }
});

Campaigns:

cube(`Campaign`, {
  sql: `SELECT * FROM campaign`,
  joins: {
    SentMessages: {
      sql: `${Campaign}.id = ${SentMessages}.campaign_id`,
      relationship: `hasMany`
    }
  },
  measures: {
    messageSentCount: {
      sql: `${SentMessages}.phone`,
      type: `count`
    }
  },
  dimensions: {
    name: {
      sql: `name`,
      type: `string`
    },
  }
});

The query being sent looks like this:

  "query": {
    "dimensions": ["Campaign.name"],
    "timeDimensions": [
      {
        "dimension": "Campaign.createdOn",
        "granularity": "day"
      }
    ],
    "measures": [
      "Campaign.messageSentCount"
    ],
    "filters": []
  },
  "authInfo": {
    "iat": 1578961890,
    "exp": 1579048290
  },
  "requestId": "da7bf907-90de-4ba0-80f8-1a802dd442f6"

For some reason this is resulting in the following error:

Error: 'Campaign.messageSentCount' references cubes that lead to row multiplication. Please rewrite it using sub query.

I've searched quite a bit for this error and can't find anything. Can someone please help or provide some insight into the problem? It would be really nice if the framework could show the erroneous SQL it generated, just for troubleshooting purposes.

Campaign has many SentMessages, so joining the two cubes to calculate Campaign.messageSentCount can multiply rows and distort the result. There's a simple check that ensures no hasMany cube is referenced inside an aggregate function. This sanity check is required to avoid situations that lead to incorrect calculation results. For example, if ReceivedMessages were also added as a join to Campaign, then Campaign.messageSentCount would produce incorrect results whenever ReceivedMessages and SentMessages are selected simultaneously.
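To make the row multiplication concrete, here is a small plain-JavaScript sketch that mimics what the joins do. It does not use Cube.js at all, and the rows, the campaign name, and the `messages_received` table are made-up assumptions for illustration only.

```javascript
// Hypothetical in-memory rows standing in for the campaign, messages_sent,
// and messages_received tables (all data is invented for illustration).
const campaigns = [{ id: 1, name: 'Spring' }];
const sentMessages = [
  { campaign_id: 1, phone_number: 111 },
  { campaign_id: 1, phone_number: 222 },
];
const receivedMessages = [
  { campaign_id: 1, phone_number: 111 },
  { campaign_id: 1, phone_number: 222 },
  { campaign_id: 1, phone_number: 333 },
];

// Mimics: campaign JOIN messages_sent JOIN messages_received, both on
// campaign_id. Every sent row pairs with every received row of the campaign.
const joinedRows = campaigns.flatMap((c) =>
  sentMessages
    .filter((s) => s.campaign_id === c.id)
    .flatMap((s) =>
      receivedMessages
        .filter((r) => r.campaign_id === c.id)
        .map((r) => ({
          campaign: c.name,
          sentPhone: s.phone_number,
          receivedPhone: r.phone_number,
        }))
    )
);

// COUNT(sent phone) over the joined rows counts each sent message once per
// matching received message.
const naiveSentCount = joinedRows.filter((row) => row.sentPhone != null).length;

// Counting inside messages_sent on its own, which is roughly what a sub query
// achieves, is not affected by the other join.
const actualSentCount = sentMessages.filter((s) => s.campaign_id === 1).length;

console.log({ naiveSentCount, actualSentCount });
```

Two sent messages joined against three received messages yield six rows, so the naive count reports 6 where the real number of sent messages is 2.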

To get past this sanity check, the measure is expected to be rewritten with a sub query, as follows:

SentMessages:

cube(`SentMessages`, {
  sql: `Select * from messages_sent`,

  measures: {
    count: {
      type: `count`
    }
  },

  dimensions: {
    campaignId: {
      sql: `campaign_id`,
      type: `number` 
    },
    phone: {
      sql: `phone_number`,
      type: `number`
    }
  }
});

Campaigns:

cube(`Campaign`, {
  sql: `SELECT * FROM campaign`,
  joins: {
    SentMessages: {
      sql: `${Campaign}.id = ${SentMessages}.campaign_id`,
      relationship: `hasMany`
    }
  },
  measures: {
    totalMessageSendCount: {
      sql: `${messageSentCount}`,
      type: `sum`
    }
  },
  dimensions: {
    messageSentCount: {
      sql: `${SentMessages.count}`,
      type: `number`,
      subQuery: true
    },
    name: {
      sql: `name`,
      type: `string`
    },
  }
});

If Campaign.messageSentCount doesn't make sense as a dimension, the schema can be simplified and SentMessages.count can be used directly.
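Either variant would presumably be queried the same way as in the question, with only the measure name changing. The sketch below builds both request bodies. `buildQuery` is a hypothetical helper, not part of Cube.js, and the time dimension from the question is left out because `createdOn` is not defined in the cubes shown here.

```javascript
// Hypothetical helper (not a Cube.js API): builds a query object shaped like
// the one in the question, grouped by campaign name.
function buildQuery(measure) {
  return {
    dimensions: ['Campaign.name'],
    measures: [measure],
    filters: [],
  };
}

// Variant 1: the measure defined on Campaign via the sub query dimension.
const viaSubQuery = buildQuery('Campaign.totalMessageSendCount');

// Variant 2: the count measure defined directly on SentMessages.
const direct = buildQuery('SentMessages.count');

console.log(JSON.stringify({ viaSubQuery, direct }, null, 2));
```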

I figured part of this out on my own (at least the solution part), and figured I'd post it in case anyone else is having difficulty:

It appears that this definition is problematic (and unnecessary):

    messageSentCount: {
        sql: `${SentMessages}.phone`,
        type: `count`
    }

I believe the correct way to do this is to add a measure to the cube you want the COUNT to be applied to. In this query I want a count of SentMessages.phone (as shown above), so the following should be added to the measures of the SentMessages cube.

    count: {
        // phone_number is the column behind the phone dimension
        sql: `phone_number`,
        type: `count`
    },

Then the query works simply as follows:

  "query": {
    "dimensions": [
      "Campaign.name"
    ],
    "timeDimensions": [
      {
        "dimension": "SentMessages.createdOn",
        "granularity": "day"
      }
    ],
    "measures": [
      "SentMessages.count"
    ],
    "filters": []
  },
  "authInfo": {
    "iat": 1578964732,
    "exp": 1579051132
  },
  "requestId": "c84b4596-2ee8-48e7-8e0a-974eb284dde3"

And it works as expected. I still don't understand the row multiplication error or why this measure doesn't work when placed on the Campaign cube. I will wait to accept this answer, as I found it experimentally and am still unclear about the underlying problem.
