
Having trouble joining properly in BigQuery

I am trying to grab some information from a first table and then link it to some demographic information I have.

SELECT
  colA,
  colB,
  DATE(serverTimeStamp) AS newDate,
  eventType,
  pgSource,
  COUNT(*) FROM (
  SELECT
    *,
    MAX(IF(LOWER(parameters.name)="pagesource", parameters.value, NULL)) WITHIN RECORD AS pgSource
  FROM
    TABLE_DATE_RANGE(mytableA, TIMESTAMP('2016-02-02 00:00:00'), TIMESTAMP('2016-02-02 23:59:59')) )
WHERE
  LOWER(parameters.name)="allison"
GROUP BY
  parameters.name,
  parameters.value,
  newDate,
  eventType,
  pgSource

However, joining in a second table changes my results (the count should be the same). The results of the first query are the ones with the right data.

SELECT
  colA,
  colB,
  DATE(serverTimeStamp) AS newDate,
  eventType,
  UD.gender,
  UD.locationKey,
  pgSource,
  COUNT(DISTINCT instanceId) FROM (
  SELECT
    *,
    MAX(IF(LOWER(parameters.name)="allison", parameters.value, NULL)) WITHIN RECORD AS pgSource
  FROM
    TABLE_DATE_RANGE(myTableA, TIMESTAMP('2016-02-02 00:00:00'), TIMESTAMP('2016-02-02 23:59:59')) ) EV
JOIN EACH
  replicated.UserDimension AS UD
ON
  UD.userId = EV.userId
WHERE
  LOWER(parameters.name)="isfirstcontact"
GROUP EACH BY
  colA,
  colB,
  newDate,
  eventType,
  pgSource,
  UD.gender,
  UD.locationKey

Any tips on how I can approach this?

Edit: Mikhail has kindly reminded me that having multiple rows for the same userId in the first table will throw my count off. How do I adjust for this?

Just to give you an idea, run the query below and compare the four results:

SELECT
  COUNT(*),
  COUNT(1),
  COUNT(instanceId),
  COUNT(DISTINCT instanceId)
FROM
  (SELECT NULL AS instanceId),
  (SELECT 1 AS instanceId),
  (SELECT 2 AS instanceId),
  (SELECT 1 AS instanceId)
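
If you do not have BigQuery at hand, the same comparison can be sketched locally. This is only an illustration using Python's built-in sqlite3 module, not BigQuery itself: SQLite has no comma-as-union syntax, so the four one-row subqueries are combined with UNION ALL, which is what the comma in a legacy SQL FROM clause does.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for BigQuery.
conn = sqlite3.connect(":memory:")

row = conn.execute("""
    SELECT
      COUNT(*),                    -- counts every row
      COUNT(1),                    -- also counts every row
      COUNT(instanceId),           -- skips rows where instanceId IS NULL
      COUNT(DISTINCT instanceId)   -- skips NULLs and collapses duplicates
    FROM (
      SELECT NULL AS instanceId
      UNION ALL SELECT 1
      UNION ALL SELECT 2
      UNION ALL SELECT 1
    )
""").fetchone()

print(row)  # (4, 4, 3, 2)
```

So switching from COUNT(*) to COUNT(DISTINCT instanceId) between two queries can change the number on its own, before any join is involved.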

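I also recommend checking the difference between COUNT([DISTINCT] ...) and EXACT_COUNT_DISTINCT(): in legacy SQL, COUNT(DISTINCT ...) returns a statistical approximation once the number of distinct values gets large, whereas EXACT_COUNT_DISTINCT() is always exact.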

Another direction to look into: check whether you have duplicate userIds (multiple rows for the same userId) on either side of the join. This can also be a source of the count mismatch.
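
To make the duplicate-userId effect concrete, here is a minimal sketch, again in Python with sqlite3 rather than BigQuery. The tables `events` and `user_dim` and their contents are hypothetical stand-ins for the event table and `replicated.UserDimension`. It shows the join inflating COUNT(*), a query that finds the duplicated keys, and one possible fix: collapsing the dimension table to one row per userId before joining.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for the event table and the user dimension table.
    CREATE TABLE events (userId INTEGER, instanceId INTEGER);
    CREATE TABLE user_dim (userId INTEGER, gender TEXT);
    INSERT INTO events VALUES (1, 100), (1, 101), (2, 102);
    -- userId 1 appears twice in the dimension table.
    INSERT INTO user_dim VALUES (1, 'F'), (1, 'F'), (2, 'M');
""")

def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]

# Baseline: three event rows.
events_only = scalar("SELECT COUNT(*) FROM events")

# Each event of userId 1 matches two dimension rows, so rows multiply.
joined = scalar("""
    SELECT COUNT(*)
    FROM events e JOIN user_dim u ON u.userId = e.userId
""")

# Find the keys that cause the fan-out.
dup_users = conn.execute("""
    SELECT userId, COUNT(*) FROM user_dim
    GROUP BY userId HAVING COUNT(*) > 1
""").fetchall()

# One possible fix: reduce the dimension table to one row per userId first.
deduped = scalar("""
    SELECT COUNT(*)
    FROM events e
    JOIN (SELECT userId, MAX(gender) AS gender
          FROM user_dim GROUP BY userId) u
      ON u.userId = e.userId
""")

print(events_only, joined, dup_users, deduped)  # 3 5 [(1, 2)] 3
```

The same check applies in the other direction: if the event side has repeated rows per userId, aggregate or deduplicate it in a subquery before joining. Picking MAX(gender) is an arbitrary choice for the sketch; which row to keep depends on your data.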
