Spark java.lang.RuntimeException: Unexpected operator in scalar subquery
I am running into a problem when executing a relatively large query in Spark (cluster mode) on a Cloudera cluster.
Here is part of the query:
...
CASE WHEN (gender_code = 'M') THEN 1 ELSE 0 END `2114`,
CASE WHEN (gender_code IS NOT NULL AND LENGTH(TRIM(gender_code)) > 0) THEN
1 ELSE 0 END `1780`,
CASE WHEN (( gender_code = 'F'
) AND ( procedure_code between '54000' and '55920' )
) THEN 1 ELSE 0 END `4054`,
CASE WHEN (NVL(gender_code, 'U') = 'U') THEN 1 ELSE 0 END `92501`,
CASE WHEN ((getConstant("FILE_TYPE_CODE") = 'PC' AND gender_code in ('1', 'M')) OR (getConstant("FILE_TYPE_CODE") IN ('ME', 'MC', 'PC') AND gender_code = 'M')) THEN 1 ELSE 0 END `2125`,
CASE WHEN (date_of_birth is NULL) THEN 1 ELSE 0 END `92971`,
/*THIS ONE IS CAUSING ISSUE */
( select first(number_of_member_first_name)
  from ( select count(distinct x.member_first_name) as number_of_member_first_name,
                date_format(x.paid_date,'yyyyMM') as ym
         from dataset x
         where cast( datediff(x.date_of_service_from,x.date_of_birth)/365 as INTEGER ) > 60
         group by date_format(x.paid_date,'yyyyMM') ) s
  where s.ym = date_format(a.paid_date,'yyyyMM') ) `93251`,
CASE WHEN (date_of_birth is not null AND LENGTH(TRIM(date_of_birth)) > 0) THEN 1 ELSE 0 END `92504`,
CASE WHEN (member_city IS NOT NULL AND LENGTH(TRIM(member_city)) > 0) THEN 1 ELSE 0 END `1638`,
CASE WHEN (member_city is NULL) THEN 1 ELSE 0 END `92961`,
CASE WHEN (member_state is NULL) THEN 1 ELSE 0 END `92621`,
CASE WHEN (member_state = getConstant("CLIENT_CODE")
) THEN 1 ELSE 0 END `2260`,
CASE WHEN (member_state IS NOT NULL AND LENGTH(TRIM(member_state)) > 0) THEN 1 ELSE 0 END `1961`,
CASE WHEN (member_zip_code IS NOT NULL AND LENGTH(TRIM(member_zip_code)) > 0) THEN 1 ELSE 0 END `1793`,
CASE WHEN (member_zip_code is NULL) THEN 1 ELSE 0 END `92622`,
CASE WHEN (( date_of_service_from > paid_date ) AND ( date_of_service_from is NOT NULL )
...
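This large query has many scalar subqueries in its SELECT list. When I tested the code on my local machine, the part marked "/*THIS ONE IS CAUSING ISSUE */" ran without any problem (screenshot). But when the same query file runs on the Cloudera cluster, it fails with the following error: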
java.lang.RuntimeException: Unexpected operator in scalar subquery: LocalRelation <empty>, [first(number_of_member_first_name, false)#405275L, ym#404801]
at scala.sys.package$.error(package.scala:27)
at org.apache.spark.sql.catalyst.optimizer.RewriteCorrelatedScalarSubquery$.evalPlan$1(subquery.scala:373)
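Can anyone help me figure out why it runs fine on my local machine but fails on the Cloudera cluster?
After careful debugging, it turned out that the view over my dataset had been dropped, so it had no data to supply to the outermost query. The aggregate function in the scalar subquery therefore ran over an empty relation (the `LocalRelation <empty>` in the error message), and Spark threw the error.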
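Since the root cause was a missing or empty view, a cheap guard before submitting the big query can surface the problem with a clearer message. The following is only a minimal sketch, assuming a PySpark session; the helper name `check_view_ready` is hypothetical, and `dataset` is the view name taken from the query above. It relies on `spark.catalog.listTables()`, `spark.table()`, and `DataFrame.limit().count()`.

```python
def check_view_ready(spark, view_name):
    """Raise RuntimeError if `view_name` is not registered or has no rows.

    `spark` is expected to be a SparkSession (assumption: PySpark).
    """
    # listTables() includes temporary views registered in the current session.
    registered = {t.name.lower() for t in spark.catalog.listTables()}
    if view_name.lower() not in registered:
        raise RuntimeError(f"view '{view_name}' is not registered in this session")

    # limit(1) keeps the check cheap: at most one row needs to be produced.
    if spark.table(view_name).limit(1).count() == 0:
        raise RuntimeError(f"view '{view_name}' is registered but empty")


# Hypothetical usage, before running the large query:
#   check_view_ready(spark, "dataset")
#   result = spark.sql(query_text)
```

If the check fails on the cluster but passes locally, the difference is in how the view gets created or populated in each environment rather than in the query text itself.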