TypeError: StructType can not accept object '1/1/2021 1:00:00 AM' in type

Question

I want to create a simple DataFrame in PySpark. This DataFrame should contain a timestamp string "1/1/2021 1:00:00 AM" that I later want to convert from a string into a timestamp.

This is my current code. When I run it, I get the error "TypeError: StructType can not accept object '1/1/2021 1:00:00 AM' in type". How can I fix it so that I can successfully execute to_timestamp?

from pyspark.sql.functions import to_timestamp
from pyspark.sql.types import StringType, StructType, StructField

schema = StructType([
    StructField("timestamp_str", StringType(), True)
])

data = [("1/1/2021 1:00:00 AM")]
df = spark.createDataFrame(data, schema=schema)

df = df.withColumn("timestamp", to_timestamp("timestamp_str", "MM/dd/yyyy hh:mm:ss a"))

Update:

After changing data = [("1/1/2021 1:00:00 AM")] to data = [("1/1/2021 1:00:00 AM",)] I get a different error. It appears when I run df.show():

org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 2.0 failed 4 times, most recent failure: Lost task 2.3 in stage 2.0 (TID 10) (10.233.49.69 executor 0): org.apache.spark.SparkUpgradeException: [INCONSISTENT_BEHAVIOR_CROSS_VERSION.PARSE_DATETIME_BY_NEW_PARSER] You may get a different result due to the upgrading to Spark >= 3.0:

Answer 1

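One workaround is to introduce an extra column id and drop it after you create df. With two fields, each row is a real tuple. The original TypeError occurs because ("1/1/2021 1:00:00 AM") is a parenthesized string, not a one-element tuple, so Spark receives a bare string where it expects a row. The code also sets spark.sql.legacy.timeParserPolicy to LEGACY so that the "MM/dd/yyyy hh:mm:ss a" pattern parses the single-digit month, day and hour as it did before Spark 3.0.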
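As background for the TypeError in the question, here is a small plain-Python sketch (no Spark needed) of how parentheses alone differ from a one-element tuple. The variable names are made up for illustration.

```python
# Parentheses around a single value are just grouping: this is a str.
not_a_tuple = ("1/1/2021 1:00:00 AM")

# The trailing comma is what makes a one-element tuple.
one_tuple = ("1/1/2021 1:00:00 AM",)

print(type(not_a_tuple).__name__)  # str
print(type(one_tuple).__name__)    # tuple
print(len(one_tuple))              # 1
```

So a list such as [("1/1/2021 1:00:00 AM")] is a list of strings, and a StructType schema cannot map a bare string onto a row. The workaround code for this answer follows.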

from pyspark.sql.functions import to_timestamp
from pyspark.sql.types import StringType, StructType, StructField
spark.sql("set spark.sql.legacy.timeParserPolicy=LEGACY")

schema = StructType([
    StructField("id", StringType(), True),
    StructField("timestamp_str", StringType(), True)
])

data = [('1',"1/1/2021 1:00:00 AM")]
df = spark.createDataFrame(data, schema=schema).drop('id')

df = df.withColumn("timestamp", to_timestamp("timestamp_str", "MM/dd/yyyy hh:mm:ss a"))

df.show()


solution1
0 2022-12-14 12:25:43
