Create table as select percentage subquery in Impala DB
I'm new to Impala. I need to create a table from a SELECT result set, and the SQL is run from Java over JDBC. Here is my query:
create table if not exists my_temp_table as select
41 as rule_id,49 as record_id,
(select count(1) as val from dirty_table where msg regexp '^[1]([3-9])[0-9]{9}$' )/(select count(1) from dirty_table);
I need to create the table my_temp_table and insert data into it, and I need to do this in a single SQL statement. But it fails with the following error:
[HY000][500051] [Cloudera][ImpalaJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, sqlState:HY000, errorMessage:ParseException: Syntax error
After checking, I found that Impala doesn't support subqueries in the SELECT clause; subqueries can only be used in the FROM or WHERE clause. See the Impala docs: https://impala.apache.org/docs/build/html/topics/impala_subqueries.html .
So what can I do to solve this?
My ideas:
Use a WITH clause. It works, but it cannot be used in CREATE TABLE... AS...:
WITH q1 AS (
select count(1) as val from dirty_table where msg regexp '^[1]([3-9])[0-9]{9}$'
),
q2 AS (
select count(1) val2 from dirty_table
)
SELECT 100 * q1.val / q2.val2 result
FROM q1, q2
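Use something like a BEGIN... END statement, so that I can run this SQL on its own.
Thanks for your examples. I will try these approaches, and I believe they will work.
I checked the solution with Impala: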
CREATE TABLE dirty_table (
id INT,
msg STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
[localhost.localdomain:21000] > SELECT * FROM dirty_table;
Query: SELECT * FROM dirty_table
Query submitted at: 2020-07-28 17:05:24 (Coordinator: http://localhost.localdomain:25000)
Query progress can be monitored at: http://localhost.localdomain:25000/query_plan?query_id=5441d6a46ce61e7b:8e49432600000000
+----+-------------+
| id | msg |
+----+-------------+
| 1 | 13321512121 |
| 2 | 13121212121 |
| 3 | 03121212121 |
| 4 | 13321512121 |
| 5 | 13121212121 |
| 6 | 03121212121 |
| 7 | 13121212121 |
+----+-------------+
Fetched 7 row(s) in 0.14s
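As a quick sanity check on what the examples below should return, the pattern can be applied to the seven sample rows outside of Impala. This is only a sketch in Python: it assumes Python's `re` module treats this particular pattern the same way Impala's `regexp` operator does, and the `rows` list is simply copied from the output above.

```python
import re

# Sample rows copied from the dirty_table output above: (id, msg).
rows = [
    (1, "13321512121"),
    (2, "13121212121"),
    (3, "03121212121"),
    (4, "13321512121"),
    (5, "13121212121"),
    (6, "03121212121"),
    (7, "13121212121"),
]

# Same pattern as in the WHERE clause of the question.
pattern = re.compile(r"^[1]([3-9])[0-9]{9}$")

# Ids of the rows whose msg matches the pattern.
matching_ids = [row_id for row_id, msg in rows if pattern.search(msg)]

# Fraction of matching rows, and the same value as a percentage.
ratio = len(matching_ids) / len(rows)
percentage = 100 * ratio

print(matching_ids)
print(ratio, percentage)
```

Five of the seven rows match (ids 3 and 6 start with 0), so the ratio is 5/7, which is what the results of both examples below correspond to.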
First example
CREATE TABLE IF NOT EXISTS my_temp_table AS
SELECT 41 AS rule_id, 49 AS record_id, val1 / val2 AS result
FROM (SELECT COUNT(1) AS val1 FROM dirty_table WHERE msg regexp '^[1]([3-9])[0-9]{9}$' ) a,
(SELECT COUNT(1) AS val2 FROM dirty_table) b;
[localhost.localdomain:21000] > CREATE TABLE IF NOT EXISTS my_temp_table AS
> SELECT 41 AS rule_id, 49 AS record_id, val1 / val2 AS result
> FROM (SELECT COUNT(1) AS val1 FROM dirty_table WHERE msg regexp '^[1]([3-9])[0-9]{9}$' ) a,
> (SELECT COUNT(1) AS val2 FROM dirty_table) b;
Query: CREATE TABLE IF NOT EXISTS my_temp_table AS
SELECT 41 AS rule_id, 49 AS record_id, val1 / val2 AS result
FROM (SELECT COUNT(1) AS val1 FROM dirty_table WHERE msg regexp '^[1]([3-9])[0-9]{9}$' ) a,
(SELECT COUNT(1) AS val2 FROM dirty_table) b
+-------------------+
| summary |
+-------------------+
| Inserted 0 row(s) |
+-------------------+
Fetched 1 row(s) in 0.21s
[localhost.localdomain:21000] > invalidate metadata;
[localhost.localdomain:21000] > SELECT * FROM my_temp_table;
Query: select * from my_temp_table
Query submitted at: 2020-07-28 17:03:44 (Coordinator: http://localhost.localdomain:25000)
Query progress can be monitored at: http://localhost.localdomain:25000/query_plan?query_id=47370bf793a09b:29c4dfa000000000
+---------+-----------+--------------------+
| rule_id | record_id | result |
+---------+-----------+--------------------+
| 41 | 49 | 0.7142857142857143 |
+---------+-----------+--------------------+
Fetched 1 row(s) in 0.13s
Second example
DROP TABLE my_temp_table;
CREATE TABLE IF NOT EXISTS my_temp_table AS
SELECT result FROM
(WITH q1 AS (
SELECT COUNT(1) AS val FROM dirty_table WHERE msg regexp '^[1]([3-9])[0-9]{9}$'
),
q2 AS (
SELECT COUNT(1) val2 FROM dirty_table
)
SELECT 100 * q1.val / q2.val2 AS result
FROM q1, q2) t;
[localhost.localdomain:21000] > CREATE TABLE IF NOT EXISTS my_temp_table AS
> SELECT result FROM
> (WITH q1 AS (
> SELECT COUNT(1) AS val FROM dirty_table WHERE msg regexp '^[1]([3-9])[0-9]{9}$'
> ),
> q2 AS (
> SELECT COUNT(1) val2 FROM dirty_table
> )
> SELECT 100 * q1.val / q2.val2 AS result
> FROM q1, q2) t;
Query: CREATE TABLE IF NOT EXISTS my_temp_table AS
SELECT result FROM
(WITH q1 AS (
SELECT COUNT(1) AS val FROM dirty_table WHERE msg regexp '^[1]([3-9])[0-9]{9}$'
),
q2 AS (
SELECT COUNT(1) val2 FROM dirty_table
)
SELECT 100 * q1.val / q2.val2 AS result
FROM q1, q2) t
+-------------------+
| summary |
+-------------------+
| Inserted 1 row(s) |
+-------------------+
Fetched 1 row(s) in 0.40s
[localhost.localdomain:21000] > invalidate metadata;
[localhost.localdomain:21000] > SELECT * FROM my_temp_table;
Query: SELECT * FROM my_temp_table
Query submitted at: 2020-07-28 17:08:17 (Coordinator: http://localhost.localdomain:25000)
Query progress can be monitored at: http://localhost.localdomain:25000/query_plan?query_id=3447684ef59d0c4:f70779200000000
+-------------------+
| result |
+-------------------+
| 71.42857142857143 |
+-------------------+
Fetched 1 row(s) in 0.74s
I think a conditional average does what you want simply and efficiently, with a single table scan:
select avg(case when msg regexp '^[1]([3-9])[0-9]{9}$' then 100.0 else 0 end) result
from dirty_table
You can turn that into a create table statement:
create table my_temp_table as
select avg(case when msg regexp '^[1]([3-9])[0-9]{9}$' then 100.0 else 0 end) result
from dirty_table
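
To see that the conditional average gives the same number as the two-count approach, here is a small sketch that runs both forms against an in-memory SQLite database from Python. SQLite stands in for Impala only to check the arithmetic, and it differs in ways that matter: `REGEXP` has to be registered as a user function, and `/` on two integers truncates, so the derived-table query multiplies by `100.0` first. The table names `by_derived_tables` and `by_conditional_avg` are made up for this illustration.

```python
import re
import sqlite3


def regexp(pattern, value):
    # SQLite evaluates "X REGEXP Y" as regexp(Y, X): pattern first, value second.
    if value is None:
        return 0
    return 1 if re.search(pattern, value) else 0


conn = sqlite3.connect(":memory:")
conn.create_function("regexp", 2, regexp)

# Same sample data as in the dirty_table output shown earlier.
conn.execute("CREATE TABLE dirty_table (id INTEGER, msg TEXT)")
conn.executemany(
    "INSERT INTO dirty_table VALUES (?, ?)",
    [
        (1, "13321512121"), (2, "13121212121"), (3, "03121212121"),
        (4, "13321512121"), (5, "13121212121"), (6, "03121212121"),
        (7, "13121212121"),
    ],
)

# Two counts in FROM-clause subqueries (two scans of dirty_table).
conn.execute("""
    CREATE TABLE by_derived_tables AS
    SELECT 100.0 * a.val1 / b.val2 AS result
    FROM (SELECT COUNT(1) AS val1 FROM dirty_table
          WHERE msg REGEXP '^[1]([3-9])[0-9]{9}$') a,
         (SELECT COUNT(1) AS val2 FROM dirty_table) b
""")

# Conditional average (one scan of dirty_table).
conn.execute("""
    CREATE TABLE by_conditional_avg AS
    SELECT AVG(CASE WHEN msg REGEXP '^[1]([3-9])[0-9]{9}$'
                    THEN 100.0 ELSE 0 END) AS result
    FROM dirty_table
""")

derived = conn.execute("SELECT result FROM by_derived_tables").fetchone()[0]
averaged = conn.execute("SELECT result FROM by_conditional_avg").fetchone()[0]
print(derived, averaged)
```

Both tables end up with a single row holding 500/7, the percentage of matching rows. On Impala itself, use the statements above as written.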