Running multiple SQL queries in Hive/Impala to test pass or fail

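I am running 100 queries (test cases) to check data quality in Hive/Impala. Most of the queries check for NULL values based on some condition. I am using conditional aggregation to evaluate the trivial test cases, as shown below. I want to add more complex query conditions to this kind of check, and if there are NULL values, I also want to see the count.

How can I incorporate the more complex queries and add the count when NULL values exist? The expected output is below.

What I have so far: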

SELECT (CASE WHEN COUNT(*) = COUNT(car_type) THEN 'PASS' ELSE 'FAIL' END) as car_type_test,
       (CASE WHEN COUNT(*) = COUNT(car_color) THEN 'PASS' ELSE 'FAIL' END) as car_color_test,
       (CASE WHEN COUNT(*) = COUNT(car_sale) THEN 'PASS' ELSE 'FAIL' END) as car_sale_test       
FROM car_data;

The more complex type of query I want to add:

SELECT Count(*), 
       car_job 
FROM   car_data 
WHERE  car_job NOT IN ( "car_type", "car_license", "car_cancellation", 
                        "car_color", "car_contract", "car_metal", "car_number" ) 
        OR car_job IS NULL 
GROUP  BY car_job

Sample expected output:

car_type_test  car_color_test  car_sale_test  car_job_test
PASS           PASS             PASS           FAIL
                                               102

I would suggest putting the result on one row rather than two:

SELECT (CASE WHEN COUNT(*) = COUNT(car_type) THEN 'PASS'
             ELSE REPLACE('FAIL ([n])', '[n]', CAST(COUNT(*) - COUNT(car_type) AS STRING))
        END) as car_type_test,
       (CASE WHEN COUNT(*) = COUNT(car_color) THEN 'PASS'
             ELSE REPLACE('FAIL ([n])', '[n]', CAST(COUNT(*) - COUNT(car_color) AS STRING))
        END) as car_color_test,
       (CASE WHEN COUNT(*) = COUNT(car_sale) THEN 'PASS'
             ELSE REPLACE('FAIL ([n])', '[n]', CAST(COUNT(*) - COUNT(car_sale) AS STRING))
        END) as car_sale_test
FROM car_data;
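
The same single-row layout could also absorb the more complex car_job check by counting the offending rows with a conditional SUM. The following is only a sketch, assuming the car_data table and the allowed-value list from the question; the names bad_jobs and t, and the use of CONCAT in place of REPLACE, are choices made here for illustration.

```sql
SELECT CASE WHEN bad_jobs = 0 THEN 'PASS'
            ELSE CONCAT('FAIL (', CAST(bad_jobs AS STRING), ')')
       END AS car_job_test
FROM (
  -- Count rows whose car_job is NULL or outside the allowed list.
  -- COALESCE turns the NULL that SUM returns on an empty table into 0.
  SELECT COALESCE(SUM(CASE WHEN car_job IS NULL
                             OR car_job NOT IN ('car_type', 'car_license', 'car_cancellation',
                                                'car_color', 'car_contract', 'car_metal', 'car_number')
                           THEN 1 ELSE 0 END), 0) AS bad_jobs
  FROM car_data
) t;
```

Because the condition lives inside the aggregate rather than in a WHERE clause, the inner expression could sit next to the COUNT(*) = COUNT(column) checks in one SELECT, so the table is scanned only once.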

If it is an option to make the output look more like a table, rather than something with 100 columns (one per test case?), the solution might be easier.

Test name         Test results    Extra info
car_type_test     PASS               
car_color_test    PASS
car_sale_test     PASS
car_job_test      FAIL            102

For example, you could build a UNION of all the queries, provided they conform to the same schema.

SELECT * FROM (
  SELECT 
    'car_type_test' `Test name`,
    CASE WHEN COUNT(*) = COUNT(car_type) THEN 'PASS'
         ELSE 'FAIL' 
    END  `Test result`,
    '' `Extra info`
  FROM car_data
  UNION ALL
  ...
  UNION ALL
  SELECT 
     'car_job_test' `Test name`,
     CASE WHEN count(*) > 0 THEN 'FAIL'
          ELSE 'PASS' 
     END `Test result`,
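     -- Report the number of offending rows only when the test fails
     CASE WHEN count(*) > 0 THEN cast(count(*) as string)
          ELSE ''
     END `Extra info`
  FROM   car_data 
  WHERE  car_job NOT IN ( "car_type", "car_license", "car_cancellation", 
                        "car_color", "car_contract", "car_metal", "car_number" ) 
        OR car_job IS NULL 
  -- No GROUP BY here, so the test yields exactly one row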
) TESTS;
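
For reference, a complete two-test version of this pattern might look like the sketch below. It assumes the car_data table and allowed-value list from the question. The underscore aliases (test_name, test_result, extra_info) replace the backtick-quoted ones only to avoid quoting, and reporting the NULL count for the car_type check is an addition made here, not part of the original answer.

```sql
SELECT * FROM (
  -- NULL check: COUNT(col) skips NULLs, so COUNT(*) - COUNT(col) is the NULL count
  SELECT
    'car_type_test' AS test_name,
    CASE WHEN COUNT(*) = COUNT(car_type) THEN 'PASS' ELSE 'FAIL' END AS test_result,
    CASE WHEN COUNT(*) = COUNT(car_type) THEN ''
         ELSE CAST(COUNT(*) - COUNT(car_type) AS STRING)
    END AS extra_info
  FROM car_data
  UNION ALL
  -- Allowed-values check: any row surviving the WHERE clause is a violation
  SELECT
    'car_job_test' AS test_name,
    CASE WHEN COUNT(*) > 0 THEN 'FAIL' ELSE 'PASS' END AS test_result,
    CASE WHEN COUNT(*) > 0 THEN CAST(COUNT(*) AS STRING) ELSE '' END AS extra_info
  FROM car_data
  WHERE car_job IS NULL
     OR car_job NOT IN ('car_type', 'car_license', 'car_cancellation',
                        'car_color', 'car_contract', 'car_metal', 'car_number')
) TESTS;
```

Further test cases would be appended as additional UNION ALL branches with the same three columns. Adding WHERE test_result = 'FAIL' after the TESTS alias would list only the failing tests.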

