Running multiple SQL queries in Hive/Impala for testing pass or fail
I am running 100 queries (test cases) to check data quality in Hive/Impala. Most of the queries check for null values based on some conditions. I am using conditional aggregation to evaluate the simple test cases, as shown below. I want to add a more complex query condition to this type of check, and I would also like to see the counts when nulls are present.
How can I incorporate the more complex query and also add a count when nulls are present? The expected output is shown below.
What I have so far:
SELECT (CASE WHEN COUNT(*) = COUNT(car_type) THEN 'PASS' ELSE 'FAIL' END) as car_type_test,
(CASE WHEN COUNT(*) = COUNT(car_color) THEN 'PASS' ELSE 'FAIL' END) as car_color_test,
(CASE WHEN COUNT(*) = COUNT(car_sale) THEN 'PASS' ELSE 'FAIL' END) as car_sale_test
FROM car_data;
More complex query to add:
SELECT Count(*),
car_job
FROM car_data
WHERE car_job NOT IN ( "car_type", "car_license", "car_cancellation",
"car_color", "car_contract", "car_metal", "car_number" )
OR car_job IS NULL
GROUP BY car_job
Example expected output:
car_type_test  car_color_test  car_sale_test  car_job_test
PASS           PASS            PASS           FAIL
                                              102
I would recommend putting this on one row instead of two:
-- REPLACE expects string arguments, so the null count is cast explicitly
SELECT (CASE WHEN COUNT(*) = COUNT(car_type) THEN 'PASS'
ELSE REPLACE('FAIL ([n])', '[n]', CAST(COUNT(*) - COUNT(car_type) AS STRING))
END) as car_type_test,
(CASE WHEN COUNT(*) = COUNT(car_color) THEN 'PASS'
ELSE REPLACE('FAIL ([n])', '[n]', CAST(COUNT(*) - COUNT(car_color) AS STRING))
END) as car_color_test,
(CASE WHEN COUNT(*) = COUNT(car_sale) THEN 'PASS'
ELSE REPLACE('FAIL ([n])', '[n]', CAST(COUNT(*) - COUNT(car_sale) AS STRING))
END) as car_sale_test
FROM car_data;
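The car_job check from the question can probably be folded into the same one-row result with conditional aggregation. The following is only a sketch, assuming the car_data table from the question; the subquery alias counts and the column names null_car_type and bad_car_job are made up for illustration. It builds the label with CONCAT and an explicit CAST, and uses COUNT over a CASE expression so that an empty table yields 0 rather than NULL.

```sql
SELECT CASE WHEN null_car_type = 0 THEN 'PASS'
            ELSE CONCAT('FAIL (', CAST(null_car_type AS STRING), ')')
       END AS car_type_test,
       CASE WHEN bad_car_job = 0 THEN 'PASS'
            ELSE CONCAT('FAIL (', CAST(bad_car_job AS STRING), ')')
       END AS car_job_test
FROM (
    SELECT
        -- rows where car_type is NULL
        COUNT(*) - COUNT(car_type) AS null_car_type,
        -- rows where car_job is NULL or not one of the allowed values;
        -- CASE without ELSE returns NULL, which COUNT ignores
        COUNT(CASE WHEN car_job IS NULL
                     OR car_job NOT IN ('car_type', 'car_license', 'car_cancellation',
                                        'car_color', 'car_contract', 'car_metal', 'car_number')
                   THEN 1 END) AS bad_car_job
    FROM car_data
) counts;
```

Computing the counts once in the inner query avoids repeating each aggregate in both the WHEN condition and the FAIL label; the remaining null checks would follow the car_type pattern.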
If the output can look more like a table, rather than a single row with 100 columns (one per test case?), then the solution may be easier.
Test name       Test result  Extra info
car_type_test   PASS
car_color_test  PASS
car_sale_test   PASS
car_job_test    FAIL         102
For example, you can build a UNION of all your queries, provided they conform to the same schema.
SELECT * FROM (
SELECT
'car_type_test' `Test name`,
CASE WHEN COUNT(*) = COUNT(car_type) THEN 'PASS'
ELSE 'FAIL'
END `Test result`,
'' `Extra info`
FROM car_data
UNION ALL
...
UNION ALL
SELECT
'car_job_test' `Test name`,
CASE WHEN count(*) > 0 THEN 'FAIL'
ELSE 'PASS'
END `Test result`,
-- One COUNT(*) over the filtered rows; no GROUP BY, so this test yields exactly one row
CAST(COUNT(*) AS STRING) `Extra info`
FROM car_data
WHERE car_job NOT IN ( "car_type", "car_license", "car_cancellation",
"car_color", "car_contract", "car_metal", "car_number" )
OR car_job IS NULL
) TESTS;
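Since the question also asks for counts on the null checks, each null-check branch could report its null count in the third column, and the outer query could keep only the failing tests. This is a sketch under assumptions: the snake_case aliases test_name, test_result and extra_info are illustrative stand-ins for the backtick-quoted names above, and only two of the checks are written out.

```sql
SELECT *
FROM (
    SELECT 'car_type_test' AS test_name,
           CASE WHEN COUNT(*) = COUNT(car_type) THEN 'PASS' ELSE 'FAIL' END AS test_result,
           -- number of rows where car_type is NULL
           CAST(COUNT(*) - COUNT(car_type) AS STRING) AS extra_info
    FROM car_data
    UNION ALL
    SELECT 'car_color_test' AS test_name,
           CASE WHEN COUNT(*) = COUNT(car_color) THEN 'PASS' ELSE 'FAIL' END AS test_result,
           -- number of rows where car_color is NULL
           CAST(COUNT(*) - COUNT(car_color) AS STRING) AS extra_info
    FROM car_data
) tests
-- show only the checks that need attention
WHERE test_result = 'FAIL';
```

The aliases are repeated in every branch so that all SELECTs expose the same column names and types, which keeps the UNION ALL schema consistent. Dropping the final WHERE clause returns the full report, one row per test.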