简体   繁体   中英

How To Cross Reference Two Big Data Tables With Google BigQuery SQL?

I have two medical data tables that I'm looking to find the hospitals that are common in each table and use two of the fields associated with each hospital in their respective tables. But keep in mind they are in two different tables.

This is what I have so far, but it always comes back with no results.

SELECT
  Spending / Rating AS Ratio,
  Provider_ID_Info AS ID
FROM (
  SELECT
    Period,
    Provider_ID_Spend,
    Avg_Spending_Per_Episode_Hospital as Spending,
    Provider_ID_Info,
    Hospital_overall_rating as Rating
  FROM
    [OmniHealth.HospitalSpending],
    [OmniHealth.HospitalGeneralInfo]
  WHERE
    Provider_ID_Info = Provider_ID_Spend  
  GROUP BY
    Rating,
    Spending,
    Provider_ID_Spend,
    Avg_Spending_Per_Episode_Hospital,
    Provider_ID_Info,
    Hospital_overall_rating,
    Period 
    )

  GROUP BY
    Ratio,
    ID,
    Spending,
    Rating

  ORDER BY
    Ratio

The problem with your query was in fact that in BigQuery legacy SQL - comma is used as a UNION ALL - not as JOIN!!!

So, try below for BigQuery Legacy SQL (as it is a sql dialect of query in your question )

#legacySQL
SELECT
  Spending / Rating AS Ratio,
  Provider_ID_Info AS ID
FROM (
  SELECT
    Period,
    Provider_ID_Spend,
    Avg_Spending_Per_Episode_Hospital AS Spending,
    Provider_ID_Info,
    Hospital_overall_rating AS Rating
  FROM
    [OmniHealth.HospitalSpending] s
  JOIN
    [OmniHealth.HospitalGeneralInfo] i
  ON
    i.Provider_ID_Info = s.Provider_ID_Spend  
  GROUP BY
    Rating,
    Spending,
    Provider_ID_Spend,
    Provider_ID_Info,
    Period 
)
GROUP BY
  Ratio,
  ID,
  Spending,
  Rating
ORDER BY
  Ratio   

It is recommended to work with BigQuery Standard SQL

#standardSQL
SELECT
  Spending / Rating AS Ratio,
  Provider_ID_Info AS ID
FROM (
  SELECT
    Period,
    Provider_ID_Spend,
    Avg_Spending_Per_Episode_Hospital AS Spending,
    Provider_ID_Info,
    Hospital_overall_rating AS Rating
  FROM
    `OmniHealth.HospitalSpending` s
  JOIN
    `OmniHealth.HospitalGeneralInfo` i
  ON
    i.Provider_ID_Info = s.Provider_ID_Spend  
  GROUP BY
    Rating,
    Spending,
    Provider_ID_Spend,
    Provider_ID_Info,
    Period 
)
GROUP BY
  Ratio,
  ID,
  Spending,
  Rating
ORDER BY
  Ratio    

Please note: you might be having issues with ambiguous fields if you have any as result of JOIN - so you will need to use respective aliases - s or i

Update:
also removed extra columns in GROUP BY

Yet another update for run with real data

Ratio   ID
   0.0  10019
   9.0  10019
  39.5  10019
  86.0  10019
 236.5  10019
 458.5  10019
 485.0  10019
 531.0  10019
1259.0  10019
1772.0  10019
8834.0  10019

It is totally another story if this makes sense or not - but it is your query/logic - so up to you

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM