
SQL query error in PySpark when using a temp table

I have a SQL query that I need to run from PySpark (Databricks). Because the query is complex, PySpark is not able to read it. Can someone check my query and help me rewrite it as a single SELECT statement, without using the WITH clause?

Stage 1:
promotions="""
(WITH VCTE_Promotions as (SELECT v.Shortname, v.Employee_ID_ALT, v.Job_Level, 
                         v.Management_Level, CAST(sysdatetime() AS date) AS PIT_Date, v.Employee_Status_Alt as Employee_Status, 
                         v.Work_Location_Region, v.Work_Location_Country_Desc, v.HML, 
                         [DM_GlobalStaff].[dbo].[V_Worker_PIT].Is_Manager
FROM           [DM_GlobalStaff].[dbo].[V_Worker_CUR] as v 
LEFT OUTER JOIN
[DM_GlobalStaff].[dbo].[V_Worker_PIT] ON v.Management_Level = [DM_GlobalStaff].[dbo].[V_Worker_PIT].Management_Level),

VCTE_Promotion_v2_Eval as (
SELECT        Employee_ID_ALT,
                             ( SELECT max([pit_date]) AS prior_data 
                               FROM [DM_GlobalStaff].[dbo].[V_Worker_PIT] AS t
                               WHERE (employee_id_alt = a.Employee_ID_ALT) AND (PIT_Date < a.PIT_Date) AND (Is_Manager <> a.Is_Manager) OR
                                      (employee_id_alt = a.Employee_ID_ALT) AND (PIT_Date < a.PIT_Date) AND (Job_Level <> a.Job_Level)) AS prev_job_change_date, Is_Manager
FROM            VCTE_Promotions AS a)

SELECT  VCTE_Promotion_v2_Eval.Employee_ID_ALT, COALESCE (v_cur.Employee_Status_ALT, N'') AS Curr_Emp_Status, 
                         COALESCE (v_cur.Employee_Type, N'') AS Curr_Employee_Type, v_cur.Hire_Date_Alt AS Curr_Hire_Date, 
                         v_cur.Termination_Date_ALT  AS Curr_Termination_Date, COALESCE (v_cur.Termination_Action_ALT, N'') 
                         AS Curr_Termination_Action, cast (v_cur.Job_Level as int) AS Curr_Job_Level, 
                         COALESCE (v_cur.Management_Level, N'') AS Curr_Management_Level, 
                         COALESCE (VCTE_Promotion_v2_Eval.Is_Manager, N'') AS Curr_Ismanager, 
                         CASE WHEN v_m.Job_Level < v_cur.Job_Level OR
                         (VCTE_Promotion_v2_Eval.Is_Manager = 1 AND v_m.Is_Manager = 0 AND v_m.Job_Level <= v_cur.Job_Level) 
                         THEN 'Promotion' WHEN v_m.Job_Level <> v_cur.Job_Level OR
                         VCTE_Promotion_v2_Eval.Is_Manager <> v_m.Is_Manager THEN 'Other' ELSE '' END AS Promotion, v_cur.Tenure, 
                         v_cur.Review_Rating_Current
FROM            VCTE_Promotion_v2_Eval INNER JOIN
                         [DM_GlobalStaff].[dbo].[V_Worker_CUR] as v_cur ON VCTE_Promotion_v2_Eval.Employee_ID_ALT = v_cur.Employee_ID_ALT LEFT OUTER JOIN
                         [DM_GlobalStaff].[dbo].[V_Worker_PIT] as v_m ON VCTE_Promotion_v2_Eval.prev_job_change_date = v_m.PIT_Date AND 
                         VCTE_Promotion_v2_Eval.Employee_ID_ALT = v_m.employee_id_alt
) as pr """

Stage 2:
promotions = spark.read.jdbc(url=jdbcUrl, table=promotions, properties=connectionProperties)

Stage 3:
promotions.count()
promotions.show()

I get the following error from the Stage 2 call:

com.microsoft.sqlserver.jdbc.SQLServerException: Incorrect syntax near the keyword 'WITH'.

---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
<command-2532359884208251> in <module>()
----> 1 promotions = spark.read.jdbc(url=jdbcUrl, table=promotions, properties=connectionProperties)

/databricks/spark/python/pyspark/sql/readwriter.py in jdbc(self, url, table, column, lowerBound, upperBound, numPartitions, predicates, properties)
    533             jpredicates = utils.toJArray(gateway, gateway.jvm.java.lang.String, predicates)
    534             return self._df(self._jreader.jdbc(url, table, jpredicates, jprop))
--> 535         return self._df(self._jreader.jdbc(url, table, jprop))
    536 
    537 

There is nothing wrong with the query itself; it works perfectly well from my SQL prompt. But as soon as I use the same query in PySpark (Databricks), I get a syntax error. Could you also help me with the PySpark syntax?

Your prompt assistance will be highly appreciated.

I have no way of testing this, so please try it and compare the results to check that everything matches.

I am also using CROSS APPLY instead of the correlated subquery, because there is no simple join here and a correlated subquery is not efficient, so CROSS APPLY should do the job.
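To make the intent of that lookup concrete, here is a plain-Python sketch of what it computes for one employee row. The function name and the sample rows are made up for illustration, and the sketch ignores SQL NULL comparison semantics. It also shows that the `A AND B AND C OR A AND B AND D` filter is equivalent to `A AND B AND (C OR D)`.

```python
from datetime import date


def prev_job_change_date(current, history):
    """Latest earlier snapshot date where Job_Level or IsManager differs.

    Mirrors MAX(PIT_Date) over rows for the same employee with an earlier
    PIT_Date and a different IsManager or Job_Level. Returns None when no
    row matches, similar to MAX over an empty set returning NULL.
    """
    matches = [
        row["PIT_Date"]
        for row in history
        if row["Employee_ID_ALT"] == current["Employee_ID_ALT"]
        and row["PIT_Date"] < current["PIT_Date"]
        and (
            row["IsManager"] != current["IsManager"]
            or row["Job_Level"] != current["Job_Level"]
        )
    ]
    return max(matches, default=None)


# Made-up sample rows, for illustration only.
history = [
    {"Employee_ID_ALT": "E1", "PIT_Date": date(2019, 1, 1), "Job_Level": 3, "IsManager": 0},
    {"Employee_ID_ALT": "E1", "PIT_Date": date(2019, 6, 1), "Job_Level": 4, "IsManager": 0},
    {"Employee_ID_ALT": "E2", "PIT_Date": date(2019, 3, 1), "Job_Level": 2, "IsManager": 0},
]
current = {"Employee_ID_ALT": "E1", "PIT_Date": date(2020, 1, 1), "Job_Level": 4, "IsManager": 0}

result = prev_job_change_date(current, history)
```

The query below expresses the same per-row lookup with CROSS APPLY.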

(
    SELECT
        VCTE_Promotion_v2_Eval.Employee_ID_ALT
       ,COALESCE(v_cur.Employee_Type, N'') AS Curr_Employee_Type
       ,v_cur.Review_Rating_Current
    FROM (
    SELECT
        a.Employee_ID_ALT,
        pr.prior_data AS prev_job_change_date,
        a.IsManager
    FROM
        ( SELECT
            v.Shortname
           ,v.Employee_ID_ALT
           ,v.Job_Level
           ,v.Management_Level
           ,CAST(SYSDATETIME() AS DATE) AS PIT_Date
           ,v.Employee_Status_Alt AS Employee_Status
           ,v.Work_Location_Region
           ,v.Work_Location_Country_Desc
           ,v.HML
           ,dbo.T_Mngmt_Level_IsManager_Mapping.IsManager
        FROM Worker_CUR AS v
        LEFT OUTER JOIN dbo.T_Mngmt_Level_IsManager_Mapping
        ON v.Management_Level = dbo.T_Mngmt_Level_IsManager_Mapping.Management_Level
        ) AS a
    CROSS APPLY (
                 SELECT
                    MAX(PIT_Date) AS prior_data
                 FROM dbo.V_Worker_PIT_with_IsManager AS t
                 WHERE (employee_id_alt = a.Employee_ID_ALT)
                 AND (PIT_Date < a.PIT_Date)
                 AND (IsManager <> a.IsManager)
                 OR (employee_id_alt = a.Employee_ID_ALT)
                 AND (PIT_Date < a.PIT_Date)
                 AND (Job_Level <> a.Job_Level)
                 )
                AS pr
            ) as VCTE_Promotion_v2_Eval
    INNER JOIN [DM_GlobalStaff].[dbo].[V_Worker_CUR] AS v_cur
            ON VCTE_Promotion_v2_Eval.Employee_ID_ALT = v_cur.Employee_ID_ALT
        LEFT OUTER JOIN dbo.V_Worker_PIT_with_IsManager AS v_m
            ON VCTE_Promotion_v2_Eval.prev_job_change_date = v_m.PIT_Date
                AND VCTE_Promotion_v2_Eval.Employee_ID_ALT = v_m.employee_id_alt ) as promotions
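
As for why the WITH version fails: as far as I understand it, Spark's JDBC reader puts whatever is passed as `table` into the FROM clause of a query it generates itself (for example a schema probe roughly of the form `SELECT * FROM <table> WHERE 1=0`), and SQL Server does not accept a WITH clause inside a derived table. The sketch below only builds strings to show the idea. `as_derived_table` and `schema_probe` are hypothetical helper names, the query is a shortened stand-in for the one above, and `jdbcUrl` and `connectionProperties` are the variables from the question.

```python
import re


def as_derived_table(select_sql: str, alias: str) -> str:
    """Wrap one SELECT statement as '(...) AS alias' for use as a JDBC table."""
    body = select_sql.strip().rstrip(";").strip()
    if re.match(r"WITH\b", body, flags=re.IGNORECASE):
        # A CTE cannot be nested inside a derived table on SQL Server.
        raise ValueError("rewrite the WITH clause as nested subqueries first")
    return f"({body}) AS {alias}"


def schema_probe(table_expr: str) -> str:
    """Imitate the kind of query Spark generates around the table expression."""
    return f"SELECT * FROM {table_expr} WHERE 1=0"


# Shortened stand-in for the rewritten query above.
query = "SELECT v_cur.Employee_ID_ALT FROM [DM_GlobalStaff].[dbo].[V_Worker_CUR] AS v_cur"
table_expr = as_derived_table(query, "pr")
probe = schema_probe(table_expr)

# On a live cluster the expression would be passed as in the question:
# promotions = spark.read.jdbc(url=jdbcUrl, table=table_expr, properties=connectionProperties)
```

Printing `probe` shows the statement SQL Server would actually have to parse, which is a quick way to check a query string before sending it through Spark.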
