I have a SQL query that I need to run from PySpark (Databricks). Because the query is complex, PySpark is not able to read it. Can someone check my query and help me rewrite it as a single 'SELECT' statement that does not use a 'WITH' clause?
Stage 1:
promotions="""
(WITH VCTE_Promotions as (SELECT v.Shortname, v.Employee_ID_ALT, v.Job_Level,
v.Management_Level, CAST(sysdatetime() AS date) AS PIT_Date, v.Employee_Status_Alt as Employee_Status,
v.Work_Location_Region, v.Work_Location_Country_Desc, v.HML,
[DM_GlobalStaff].[dbo].[V_Worker_PIT].Is_Manager
FROM [DM_GlobalStaff].[dbo].[V_Worker_CUR] as v
LEFT OUTER JOIN
[DM_GlobalStaff].[dbo].[V_Worker_PIT] ON v.Management_Level = [DM_GlobalStaff].[dbo].[V_Worker_PIT].Management_Level),
VCTE_Promotion_v2_Eval as (
SELECT Employee_ID_ALT,
( SELECT max([pit_date]) AS prior_data
FROM [DM_GlobalStaff].[dbo].[V_Worker_PIT] AS t
WHERE (employee_id_alt = a.Employee_ID_ALT) AND (PIT_Date < a.PIT_Date) AND (Is_Manager <> a.Is_Manager) OR
(employee_id_alt = a.Employee_ID_ALT) AND (PIT_Date < a.PIT_Date) AND (Job_Level <> a.Job_Level)) AS prev_job_change_date, Is_Manager
FROM VCTE_Promotions AS a)
SELECT VCTE_Promotion_v2_Eval.Employee_ID_ALT, COALESCE (v_cur.Employee_Status_ALT, N'') AS Curr_Emp_Status,
COALESCE (v_cur.Employee_Type, N'') AS Curr_Employee_Type, v_cur.Hire_Date_Alt AS Curr_Hire_Date,
v_cur.Termination_Date_ALT AS Curr_Termination_Date, COALESCE (v_cur.Termination_Action_ALT, N'')
AS Curr_Termination_Action, cast (v_cur.Job_Level as int) AS Curr_Job_Level,
COALESCE (v_cur.Management_Level, N'') AS Curr_Management_Level,
COALESCE (VCTE_Promotion_v2_Eval.Is_Manager, N'') AS Curr_Ismanager,
CASE WHEN v_m.Job_Level < v_cur.Job_Level OR
(VCTE_Promotion_v2_Eval.Is_Manager = 1 AND v_m.Is_Manager = 0 AND v_m.Job_Level <= v_cur.Job_Level)
THEN 'Promotion' WHEN v_m.Job_Level <> v_cur.Job_Level OR
VCTE_Promotion_v2_Eval.Is_Manager <> v_m.Is_Manager THEN 'Other' ELSE '' END AS Promotion, v_cur.Tenure,
v_cur.Review_Rating_Current
FROM VCTE_Promotion_v2_Eval INNER JOIN
[DM_GlobalStaff].[dbo].[V_Worker_CUR] as v_cur ON VCTE_Promotion_v2_Eval.Employee_ID_ALT = v_cur.Employee_ID_ALT LEFT OUTER JOIN
[DM_GlobalStaff].[dbo].[V_Worker_PIT] as v_m ON VCTE_Promotion_v2_Eval.prev_job_change_date = v_m.PIT_Date AND
VCTE_Promotion_v2_Eval.Employee_ID_ALT = v_m.employee_id_alt
) as pr """
Stage 2:
promotions = spark.read.jdbc(url=jdbcUrl, table=promotions, properties=connectionProperties)
Stage 3:
promotions.count()
promotions.show()
Stage 2 fails with the following error:
com.microsoft.sqlserver.jdbc.SQLServerException: Incorrect syntax near the keyword 'WITH'.
---------------------------------------------------------------------------
Py4JJavaError Traceback (most recent call last)
<command-2532359884208251> in <module>()
----> 1 promotions = spark.read.jdbc(url=jdbcUrl, table=promotions, properties=connectionProperties)
/databricks/spark/python/pyspark/sql/readwriter.py in jdbc(self, url, table, column, lowerBound, upperBound, numPartitions, predicates, properties)
533 jpredicates = utils.toJArray(gateway, gateway.jvm.java.lang.String, predicates)
534 return self._df(self._jreader.jdbc(url, table, jpredicates, jprop))
--> 535 return self._df(self._jreader.jdbc(url, table, jprop))
536
537
The query itself is not the problem: it works perfectly fine in my SQL prompt. But as soon as I use the same query in PySpark (Databricks), I get the syntax error. Could you kindly help me with the PySpark syntax as well?
Your prompt assistance will be highly appreciated.
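For context on the error above: the string passed as `table` is not executed on its own. As far as I understand, Spark first embeds it in a schema probe of roughly the form `SELECT * FROM <table> WHERE 1=0`, and SQL Server does not accept a `WITH` clause inside a derived table. The sketch below only imitates that wrapping in plain Python; the function name `spark_style_probe` and the two tiny queries are made up for illustration, and the exact probe text Spark generates is an assumption.

```python
def spark_style_probe(table: str) -> str:
    """Imitate (as an assumption) the probe query Spark builds from `table`."""
    return f"SELECT * FROM {table} WHERE 1=0"


# Hypothetical miniature versions of the two query shapes.
cte_subquery = "(WITH c AS (SELECT 1 AS x) SELECT x FROM c) AS pr"
derived_subquery = "(SELECT x FROM (SELECT 1 AS x) AS c) AS pr"

# The CTE ends up nested inside FROM (...), which SQL Server rejects
# with "Incorrect syntax near the keyword 'WITH'".
print(spark_style_probe(cte_subquery))

# A plain SELECT over derived tables stays valid when nested.
print(spark_style_probe(derived_subquery))
```

This is why the fix is to fold each CTE into a derived table in the `FROM` clause rather than to change anything on the PySpark side.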
I have no way of testing this, so please try it and compare the result with your original query to check that everything matches.
I also used CROSS APPLY in place of the correlated subquery, because there is no simple join for this lookup and the correlated subquery isn't efficient, so CROSS APPLY should do the job:
(
SELECT
VCTE_Promotion_v2_Eval.Employee_ID_ALT
,COALESCE(v_cur.Employee_Type, N'') AS Curr_Employee_Type
,v_cur.Review_Rating_Current
FROM (
SELECT
Employee_ID_ALT,
pr.prior_data AS prev_job_change_date,
IsManager
From
( SELECT
v.Shortname
,v.Employee_ID_ALT
,v.Job_Level
,v.Management_Level
,CAST(SYSDATETIME() AS DATE) AS PIT_Date
,v.Employee_Status_Alt AS Employee_Status
,v.Work_Location_Region
,v.Work_Location_Country_Desc
,v.HML
,dbo.T_Mngmt_Level_IsManager_Mapping.IsManager
FROM Worker_CUR AS v
LEFT OUTER JOIN dbo.T_Mngmt_Level_IsManager_Mapping
ON v.Management_Level = dbo.T_Mngmt_Level_IsManager_Mapping.Management_Level
) AS a
Cross APPLY (
SELECT
MAX(PIT_Date) AS prior_data
FROM dbo.V_Worker_PIT_with_IsManager AS t
WHERE (employee_id_alt = a.Employee_ID_ALT)
AND (PIT_Date < a.PIT_Date)
AND (IsManager <> a.IsManager)
OR (employee_id_alt = a.Employee_ID_ALT)
AND (PIT_Date < a.PIT_Date)
AND (Job_Level <> a.Job_Level)
)
AS pr
) as VCTE_Promotion_v2_Eval
INNER JOIN [DM_GlobalStaff].[dbo].[V_Worker_CUR] AS v_cur
ON VCTE_Promotion_v2_Eval.Employee_ID_ALT = v_cur.Employee_ID_ALT
LEFT OUTER JOIN dbo.V_Worker_PIT_with_IsManager AS v_m
ON VCTE_Promotion_v2_Eval.prev_job_change_date = v_m.PIT_Date
AND VCTE_Promotion_v2_Eval.Employee_ID_ALT = v_m.employee_id_alt ) as promotions
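To pass a rewritten query like the one above to `spark.read.jdbc`, it has to be a parenthesised `SELECT` with an alias and no leading `WITH`. Below is a minimal sketch of a guard for that; the helper name `as_jdbc_table` is hypothetical, and the small query is a placeholder rather than the full promotions query.

```python
def as_jdbc_table(select_sql: str, alias: str) -> str:
    """Wrap a plain SELECT as '(<select>) AS <alias>' for the JDBC `table` argument."""
    body = select_sql.strip().rstrip(";").strip()
    tokens = body.split(None, 1)
    if not tokens:
        raise ValueError("empty query")
    if tokens[0].upper() == "WITH":
        # A CTE cannot be nested in a derived table on SQL Server.
        raise ValueError("rewrite the CTE as a derived table first")
    return f"({body}) AS {alias}"


# Placeholder query; substitute the single-SELECT version of the promotions query.
sql = "SELECT v.Employee_ID_ALT FROM [DM_GlobalStaff].[dbo].[V_Worker_CUR] AS v;"
table_arg = as_jdbc_table(sql, "promotions")
print(table_arg)

# On Databricks (not run here), using the names from the question:
# promotions = spark.read.jdbc(url=jdbcUrl, table=table_arg, properties=connectionProperties)
```

If your runtime is based on Spark 3.4 or later, the JDBC reader also documents a `prepareQuery` option meant for statements that cannot be nested in a subquery; whether it is available on your cluster is something to check before relying on it.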