I'm trying to cast an RFC 2822 datetime column to a timestamp column. When I work with the value outside a DataFrame, it works, but inside a DataFrame I get an error message.
My imports:
from pyspark.sql.types import *
from pyspark.sql.column import *
from pyspark.sql.functions import *
from email.utils import parsedate_to_datetime
Outside the DataFrame, this code works:
datestr = "Thu Sep 12 2019 15:58:30 GMT-0500 (hora estándar de Colombia)"
print(parsedate_to_datetime(datestr))
Output:
2019-09-12 15:58:30
But if I work with this DataFrame:
df = spark.createDataFrame(["Thu Sep 12 2019 15:58:30 GMT-0500 (hora estándar de Colombia)"], "string",).toDF("Date")
and try to create another column with the following code:
df2 = df.withColumn("timestamp", parsedate_to_datetime(col("Date")))
I get this error:
ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.
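parsedate_to_datetime is a plain Python function that expects a string, but col("Date") is a Column expression rather than a value. Register parsedate_to_datetime as a UDF so that Spark applies it to each row and maps the result to its own data types: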
>>> from pyspark.sql.types import *
>>> from pyspark.sql.column import *
>>> from pyspark.sql.functions import *
>>> from email.utils import parsedate_to_datetime
>>> df = spark.createDataFrame(["Thu Sep 12 2019 15:58:30 GMT-0500 (hora estándar de Colombia)"], "string",).toDF("Date")
>>> parsedate_to_datetime_udf = udf(parsedate_to_datetime, TimestampType())
>>> df2 = df.withColumn("timestamp", parsedate_to_datetime_udf(col("Date")))
>>> df2.show()
+--------------------+-------------------+
| Date| timestamp|
+--------------------+-------------------+
|Thu Sep 12 2019 1...|2019-09-12 15:58:30|
+--------------------+-------------------+
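One thing to watch with this approach: if the column can contain nulls or strings that don't parse, the bare function will raise inside the UDF and fail the job. Below is a minimal sketch of a null-safe wrapper you could register instead. The name safe_parse is mine, not part of any library, and the registration lines are left as comments because they need a running Spark session.

```python
from email.utils import parsedate_to_datetime


def safe_parse(value):
    """Parse a date string, returning None instead of raising."""
    if value is None:
        # Spark passes null cells to a Python UDF as None.
        return None
    try:
        return parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Unparsable input typically raises TypeError or ValueError,
        # depending on the Python version.
        return None


# Hypothetical registration, mirroring the session above (needs pyspark):
#   safe_parse_udf = udf(safe_parse, TimestampType())
#   df2 = df.withColumn("timestamp", safe_parse_udf(col("Date")))

# Plain-Python check with a standard RFC 2822 string and a null value.
print(safe_parse("Thu, 12 Sep 2019 15:58:30 -0500"))
print(safe_parse(None))
```

Rows that can't be parsed then show up as null in the timestamp column, so you can filter or count them afterwards instead of losing the whole job.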