Update table using WITH temp CTE
%sql
with temp1 as
(
select req_id from table1 order by timestamp desc limit 8000000
)
update table1 set label = '1' where req_id in temp1 and req_query like '%\<\/script\>%'
update table1 set label = '1' where req_id in temp1 and req_query like '%aaaaa%'
update table1 set label = '1' where req_id in temp1 and req_query like '%bbbb%'
I get this error:
com.databricks.backend.common.rpc.DatabricksExceptions$SQLExecutionException: org.apache.spark.sql.catalyst.parser.ParseException: mismatched input 'in' expecting {<EOF>, ';'}(line 6, pos 93)
Can anyone advise? Would it be cheaper to ask the database the same question?
select req_id from table1 order by timestamp desc limit 8000000
This is not allowed: where req_id in temp1. temp1 is not a list; it is the result of a query and has to be used like any other table.
You would probably rather write something like this:
update table1
set label = '1'
where req_id in (select req_id from table1 order by timestamp desc limit 8000000)
and req_query like '%\<\/script\>%'
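Since all three statements in the question set the same label, they could probably be folded into one UPDATE by OR-ing the LIKE predicates. This is only a sketch: it assumes your Databricks runtime accepts an IN subquery in the WHERE clause of UPDATE, and it keeps the table, column names, and patterns exactly as posted.

```sql
-- Sketch: one pass over table1 instead of three separate updates.
-- Assumes UPDATE ... WHERE ... IN (subquery) is supported on your runtime.
UPDATE table1
SET label = '1'
WHERE req_id IN (
        SELECT req_id FROM table1 ORDER BY timestamp DESC LIMIT 8000000
      )
  AND (   req_query LIKE '%\<\/script\>%'
       OR req_query LIKE '%aaaaa%'
       OR req_query LIKE '%bbbb%')
```

Running it once also means the "latest 8000000 rows" subquery is evaluated a single time rather than once per statement.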
You can use an update with a join instead of the IN clause. Delta does not support updating a table through an inner join, but I think you can use MERGE. Something like this:
WITH temp1 AS (
SELECT req_id FROM table1 ORDER BY timestamp DESC LIMIT 8000000
)
MERGE INTO table1 a
USING temp1 b
ON (a.req_id = b.req_id)
WHEN MATCHED THEN
UPDATE SET a.label = CASE WHEN a.req_query LIKE '%\<\/script\>%' THEN '1'
WHEN a.req_query LIKE '%aaaaa%' THEN '2'
WHEN a.req_query LIKE '%bbbb%' THEN '3'
ELSE a.label
END
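Note that the CASE above assigns '2' and '3' for the second and third patterns, whereas the question sets '1' in all three cases. If the original behaviour is what you want, a variant along these lines might work. It is a sketch under two assumptions: that req_id is unique in table1 (MERGE can fail when several source rows match the same target row), and that putting the source query inline in USING is acceptable in place of the CTE.

```sql
-- Sketch: label matching rows '1', leave all other rows untouched.
-- Assumes req_id is unique, so each target row matches at most one source row.
MERGE INTO table1 a
USING (
  SELECT req_id FROM table1 ORDER BY timestamp DESC LIMIT 8000000
) b
ON a.req_id = b.req_id
WHEN MATCHED AND (
       a.req_query LIKE '%\<\/script\>%'
    OR a.req_query LIKE '%aaaaa%'
    OR a.req_query LIKE '%bbbb%'
) THEN
  UPDATE SET a.label = '1'
```

Moving the pattern test into the WHEN MATCHED condition means rows that match none of the patterns are not rewritten at all, instead of being updated to their existing label.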