Running multiple SQL queries in Hive/Impala to test for pass or fail

Question

I am running 100 queries (test cases) to check data quality in Hive/Impala. Most of the queries check for NULL values under some conditions. I am using conditional aggregation for the trivial test cases, as shown below. I want to add a more complex query condition to this type of check, and I would also like to see the counts when NULLs are present.

I want to know how to incorporate the more complex query into this check and also report a count when NULLs are present. The expected output is below.

What I have so far:

SELECT (CASE WHEN COUNT(*) = COUNT(car_type) THEN 'PASS' ELSE 'FAIL' END) as car_type_test,
       (CASE WHEN COUNT(*) = COUNT(car_color) THEN 'PASS' ELSE 'FAIL' END) as car_color_test,
       (CASE WHEN COUNT(*) = COUNT(car_sale) THEN 'PASS' ELSE 'FAIL' END) as car_sale_test       
FROM car_data;
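The check above works because `COUNT(*)` counts every row while `COUNT(column)` skips NULLs, so the two are equal only when the column contains no NULLs, and their difference is the NULL count. A minimal sketch of that behaviour, assuming SQLite (through Python's built-in `sqlite3` module) as a stand-in for Hive/Impala, since `COUNT` follows the same standard SQL semantics there; the sample rows are made up:

```python
import sqlite3

# In-memory SQLite table standing in for the Hive/Impala table car_data
# (assumption: same COUNT(*) / COUNT(col) semantics).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE car_data (car_type TEXT, car_color TEXT, car_sale INTEGER)")
conn.executemany(
    "INSERT INTO car_data VALUES (?, ?, ?)",
    [
        ("sedan", "red", 100),
        ("coupe", None, 200),     # NULL car_color
        ("truck", "blue", None),  # NULL car_sale
        ("van", None, 300),       # second NULL car_color
    ],
)

# COUNT(*) counts all rows; COUNT(col) counts only non-NULL values,
# so equality means "no NULLs" and the difference is the NULL count.
row = conn.execute(
    """
    SELECT CASE WHEN COUNT(*) = COUNT(car_type)  THEN 'PASS' ELSE 'FAIL' END,
           CASE WHEN COUNT(*) = COUNT(car_color) THEN 'PASS' ELSE 'FAIL' END,
           CASE WHEN COUNT(*) = COUNT(car_sale)  THEN 'PASS' ELSE 'FAIL' END,
           COUNT(*) - COUNT(car_color) AS car_color_nulls
    FROM car_data
    """
).fetchone()
print(row)  # ('PASS', 'FAIL', 'FAIL', 2)
```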

More complex query to add:

SELECT Count(*), 
       car_job 
FROM   car_data 
WHERE  car_job NOT IN ( "car_type", "car_license", "car_cancellation", 
                        "car_color", "car_contract", "car_metal", "car_number" ) 
        OR car_job IS NULL 
GROUP  BY car_job
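The `OR car_job IS NULL` in that query is not redundant: `NULL NOT IN (...)` evaluates to NULL rather than TRUE, so rows with a NULL `car_job` would otherwise be dropped silently. A small sketch demonstrating this, again with SQLite standing in for Hive/Impala and hypothetical sample rows (the value `car_wash` is made up). It binds the allowed values as query parameters instead of the double-quoted literals that Hive accepts:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE car_data (car_job TEXT)")
conn.executemany(
    "INSERT INTO car_data VALUES (?)",
    [("car_type",), ("car_color",), ("car_wash",), ("car_wash",), (None,)],
)

allowed = ("car_type", "car_license", "car_cancellation",
           "car_color", "car_contract", "car_metal", "car_number")
placeholders = ", ".join("?" * len(allowed))  # "?, ?, ..., ?"

# NULL NOT IN (...) is NULL, not TRUE, so the NULL row is filtered out here.
without_null_check = conn.execute(
    f"SELECT COUNT(*) FROM car_data WHERE car_job NOT IN ({placeholders})",
    allowed,
).fetchone()[0]

# Adding OR car_job IS NULL catches it.
with_null_check = conn.execute(
    f"SELECT COUNT(*) FROM car_data "
    f"WHERE car_job NOT IN ({placeholders}) OR car_job IS NULL",
    allowed,
).fetchone()[0]

# Same shape as the question's query: one row per offending car_job value.
grouped = conn.execute(
    f"SELECT COUNT(*), car_job FROM car_data "
    f"WHERE car_job NOT IN ({placeholders}) OR car_job IS NULL "
    f"GROUP BY car_job ORDER BY car_job",
    allowed,
).fetchall()

print(without_null_check, with_null_check)  # 2 3
print(grouped)  # [(1, None), (2, 'car_wash')]
```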

Example of the expected output:

car_type_test  car_color_test  car_sale_test  car_job_test
PASS           PASS            PASS           FAIL
                                              102

Answer 1

I would recommend putting the counts on the same row instead of a second row, by including the number of NULLs in the FAIL value:

SELECT (CASE WHEN COUNT(*) = COUNT(car_type) THEN 'PASS'
             ELSE REPLACE('FAIL ([n])', '[n]', CAST(COUNT(*) - COUNT(car_type) AS STRING))
        END) as car_type_test,
       (CASE WHEN COUNT(*) = COUNT(car_color) THEN 'PASS'
             ELSE REPLACE('FAIL ([n])', '[n]', CAST(COUNT(*) - COUNT(car_color) AS STRING))
        END) as car_color_test,
       (CASE WHEN COUNT(*) = COUNT(car_sale) THEN 'PASS'
             ELSE REPLACE('FAIL ([n])', '[n]', CAST(COUNT(*) - COUNT(car_sale) AS STRING))
        END) as car_sale_test
FROM car_data;
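This answer covers only the NULL checks. One way to fold the `car_job` query into the same single row is to replace its `WHERE`/`GROUP BY` with a conditional `SUM`. The sketch below is an illustration, not part of the answer: it runs on SQLite as a stand-in, so the cast is spelled `AS TEXT` where Hive/Impala use `AS STRING`, and `COALESCE` keeps an empty table from yielding a NULL count. The sample rows and the `counts` alias are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE car_data (car_type TEXT, car_color TEXT, car_job TEXT)")
conn.executemany(
    "INSERT INTO car_data VALUES (?, ?, ?)",
    [
        ("sedan", "red", "car_type"),
        ("coupe", None, "car_wash"),  # NULL colour, unexpected job
        ("truck", "blue", None),      # NULL job
        ("van", "green", "car_color"),
    ],
)

query = """
SELECT CASE WHEN n = n_type  THEN 'PASS'
            ELSE REPLACE('FAIL ([n])', '[n]', CAST(n - n_type AS TEXT)) END AS car_type_test,
       CASE WHEN n = n_color THEN 'PASS'
            ELSE REPLACE('FAIL ([n])', '[n]', CAST(n - n_color AS TEXT)) END AS car_color_test,
       CASE WHEN bad_jobs = 0 THEN 'PASS'
            ELSE REPLACE('FAIL ([n])', '[n]', CAST(bad_jobs AS TEXT)) END AS car_job_test
FROM (
    -- One pass over the table: plain counts for the NULL checks, and a
    -- conditional SUM in place of the separate WHERE ... GROUP BY query.
    SELECT COUNT(*)         AS n,
           COUNT(car_type)  AS n_type,
           COUNT(car_color) AS n_color,
           COALESCE(SUM(CASE WHEN car_job IS NULL
                               OR car_job NOT IN ('car_type', 'car_license', 'car_cancellation',
                                                  'car_color', 'car_contract', 'car_metal',
                                                  'car_number')
                             THEN 1 ELSE 0 END), 0) AS bad_jobs
    FROM car_data
) counts
"""
result = conn.execute(query).fetchone()
print(result)  # ('PASS', 'FAIL (1)', 'FAIL (2)')
```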

Answer 2

If you can change the output to look more like a table, one row per test case, instead of a single row with 100 columns (one per test case?), the solution may be easier.

Test name         Test result     Extra info
car_type_test     PASS
car_color_test    PASS
car_sale_test     PASS
car_job_test      FAIL            102

For example, you can build a UNION ALL of all your queries, provided they all return the same schema (the same number of columns, with compatible types).

SELECT * FROM (
  SELECT
    'car_type_test' `Test name`,
    CASE WHEN COUNT(*) = COUNT(car_type) THEN 'PASS'
         ELSE 'FAIL'
    END `Test result`,
    '' `Extra info`
  FROM car_data
  UNION ALL
  ...
  UNION ALL
  SELECT
    'car_job_test' `Test name`,
    CASE WHEN COUNT(*) > 0 THEN 'FAIL'
         ELSE 'PASS'
    END `Test result`,
    CAST(COUNT(*) AS STRING) `Extra info`
  FROM car_data
  WHERE car_job NOT IN ("car_type", "car_license", "car_cancellation",
                        "car_color", "car_contract", "car_metal", "car_number")
     OR car_job IS NULL
) TESTS;
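With 100 test cases, writing every `UNION ALL` branch by hand gets tedious. A minimal sketch of generating the branches from Python, assuming hypothetical helpers `null_test`, `value_set_test` and `build_suite`, trusted (unescaped) column names and values, and SQLite for the test run; pass `string_type="STRING"` when targeting Hive/Impala. Unlike the answer, the NULL checks here also put the NULL count in `extra_info`, and plain aliases replace the backtick-quoted names:

```python
import sqlite3


def null_test(column, string_type="TEXT"):
    """Hypothetical helper: PASS if `column` has no NULLs; extra_info holds the NULL count."""
    return (
        f"SELECT '{column}_test' AS test_name, "
        f"CASE WHEN COUNT(*) = COUNT({column}) THEN 'PASS' ELSE 'FAIL' END AS test_result, "
        f"CAST(COUNT(*) - COUNT({column}) AS {string_type}) AS extra_info "
        f"FROM car_data"
    )


def value_set_test(column, allowed, string_type="TEXT"):
    """Hypothetical helper: FAIL if any row has a NULL or a value outside `allowed`."""
    # Values are inlined without escaping, so they must be trusted constants.
    in_list = ", ".join(f"'{value}'" for value in allowed)
    return (
        f"SELECT '{column}_test' AS test_name, "
        f"CASE WHEN COUNT(*) > 0 THEN 'FAIL' ELSE 'PASS' END AS test_result, "
        f"CAST(COUNT(*) AS {string_type}) AS extra_info "
        f"FROM car_data WHERE {column} NOT IN ({in_list}) OR {column} IS NULL"
    )


def build_suite(branches):
    """Combine the branches with UNION ALL, wrapped in a subquery as in the answer."""
    return "SELECT * FROM (\n" + "\nUNION ALL\n".join(branches) + "\n) tests"


ALLOWED_JOBS = ["car_type", "car_license", "car_cancellation",
                "car_color", "car_contract", "car_metal", "car_number"]

query = build_suite(
    [null_test(c) for c in ("car_type", "car_color", "car_sale")]
    + [value_set_test("car_job", ALLOWED_JOBS)]
)

# Made-up sample data to exercise the generated SQL in SQLite.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE car_data (car_type TEXT, car_color TEXT, car_sale INTEGER, car_job TEXT)"
)
conn.executemany(
    "INSERT INTO car_data VALUES (?, ?, ?, ?)",
    [
        ("sedan", "red", 100, "car_type"),
        ("coupe", None, 200, "car_wash"),  # NULL colour, unexpected job
        ("truck", "blue", 300, None),      # NULL job
    ],
)

# Key by test name, since UNION ALL does not promise a row order.
results = {name: (result, extra) for name, result, extra in conn.execute(query)}
for name, (result, extra) in results.items():
    print(f"{name:<16}{result:<6}{extra}")
```

Keeping one branch per test makes each check independent, so adding a test case is a one-line change to the list passed to `build_suite`.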

Answer 1 (score 0), 2019-10-31 14:52:20

Answer 2 (score 0), 2019-11-01 21:49:13