
How to filter a PySpark DataFrame for the current date

I have a dataframe with the following fields:

[screenshot of the dataframe schema]

I'm trying to filter on SaleDate, keeping only the rows where SaleDate is the current date.

My attempt is as follows:

from pyspark.sql.functions import col

df.where((col("SaleDate") = to_date())

This assumes today's date is 16/10/2021.

I keep on getting the error:

SyntaxError: keyword can't be an expression (<stdin>, line 2)

I should mention that SaleDate is actually a StringType() and not a DateType as shown in the image:

|-- SaleDate: string (nullable = true)

You should use the current_date function to get the current date, not to_date.

You first need to convert the value in the SaleDate column from string to date with to_date, then compare the result with current_date:

from pyspark.sql import functions as F

df.where(F.to_date('SaleDate', 'yyyy/MM/dd HH:mm:ss.SSS') == F.current_date())
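For an end-to-end check, here is a minimal runnable sketch. The SaleId column and the sample rows are made up for illustration, and the 'yyyy/MM/dd HH:mm:ss.SSS' format is the one used above, assumed to match your data.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical sample data: SaleDate is a string, as in the question's schema.
df = spark.createDataFrame(
    [("1", "2021/10/16 10:15:00.000"),
     ("2", "2021/10/15 09:00:00.000")],
    ["SaleId", "SaleDate"],
)

# to_date parses the string into a DateType and drops the time portion,
# so the comparison with current_date() is a plain date-to-date equality.
df.where(
    F.to_date("SaleDate", "yyyy/MM/dd HH:mm:ss.SSS") == F.current_date()
).show()

Note that to_date truncates the parsed timestamp to a date, which is what makes the equality with current_date() work; comparing the raw string directly against a date would not.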
