I have a Spark DataFrame with the below schema.
root
|-- ME_KE: string (nullable = true)
|-- CSPD_CAT: string (nullable = true)
|-- EFF_DT: string (nullable = true)
|-- TER_DT: string (nullable = true)
|-- CREATE_DTM: string (nullable = true)
|-- ELIG_IND: string (nullable = true)
Basically I am trying to convert Spark SQL code into equivalent DataFrame API code.
df = spark.read.format('csv').load(SourceFilesPath + "\\cutdetl.csv", inferSchema=True, header=True)
df.createOrReplaceTempView("cutdetl")
spark.sql(f"""select
me_ke,
eff_dt,
ter_dt,
create_dtm
from
cutdetl
where
(elig_ind = 'Y') and
((to_date('{start_dt}','dd-MMM-yyyy') between eff_dt and ter_dt) or
(eff_dt between to_date('{start_dt}','dd-MMM-yyyy') and to_date('{end_dt}','dd-MMM-yyyy')))
""")
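A quick note on the date pattern: 'dd-mon-yyyy' is an Oracle-style mask, while Spark's to_date expects Java DateTimeFormatter patterns, where the abbreviated month is 'MMM'. As a sanity check, the equivalent pattern in plain Python strptime is '%d-%b-%Y':

```python
from datetime import datetime

# strptime's '%d-%b-%Y' (day, abbreviated month, 4-digit year)
# corresponds to Spark's 'dd-MMM-yyyy' pattern.
d = datetime.strptime('31-DEC-2022', '%d-%b-%Y').date()
print(d)  # 2022-12-31
```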
Below is the code I have tried:
df1 = (df.select("me_ke", "eff_dt", "ter_dt", "elig_ind")
    .where(col("elig_ind") == "Y" & (F.to_date('31-SEP-2022', 'dd-mon-yyyy')
        .between(col("mepe_eff_dt"), col("mepe_term_dt"))) |
        (F.to_date(col("eff_dt"))
        .between(F.to_date('31-DEC-2022'), F.to_date('31-DEC-2022')))))
I am getting the below error:
py4j.Py4JException: Method and([class java.lang.String]) does not exist
Could anyone help with converting the above code to the DataFrame API?
I'd go like this:
from pyspark.sql import functions as F
from pyspark.sql.functions import col

df = spark.read.format('csv').load(SourceFilesPath + "\\cutdetl.csv", inferSchema=True, header=True)
df.createOrReplaceTempView("cutdetl")

# Parse the window boundaries once; start_dt and end_dt are strings like '01-Jan-2022'
start = F.to_date(F.lit(start_dt), 'dd-MMM-yyyy')
end = F.to_date(F.lit(end_dt), 'dd-MMM-yyyy')

df1 = (df.filter(col("elig_ind") == "Y")
         .filter(start.between(col("eff_dt"), col("ter_dt")) |
                 col("eff_dt").between(start, end))
         .select("me_ke", "eff_dt", "ter_dt", "create_dtm"))
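As for the py4j error itself: Python's bitwise `&` binds more tightly than `==`, so `col("elig_ind") == "Y" & (...)` is parsed as `col("elig_ind") == ("Y" & (...))`. The string "Y" gets `&`-ed with a Column, which is forwarded to the JVM as `"Y".and(...)`, hence "Method and([class java.lang.String]) does not exist". You can see the parse with the standard-library `ast` module (a pure-Python illustration, no Spark needed):

```python
import ast

# '&' has higher precedence than '==', so a == "Y" & c
# parses as a == ("Y" & c): the right-hand side of the
# comparison is a BinOp, not the bare string "Y".
tree = ast.parse('a == "Y" & c', mode='eval').body
print(type(tree).__name__)                 # Compare
print(type(tree.comparators[0]).__name__)  # BinOp  (the "Y" & c part)
```

The fix is to parenthesize every comparison before combining them: `(col("elig_ind") == "Y") & (...)`.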