TypeError: StructType can not accept object '1/1/2021 1:00:00 AM' in type

I want to create a simple DataFrame in PySpark. It should contain the timestamp string "1/1/2021 1:00:00 AM", which I later want to convert from a string to a timestamp.

This is my current code. When I run it, I get the error "TypeError: StructType can not accept object '1/1/2021 1:00:00 AM' in type". How can I fix it so that to_timestamp finally runs successfully?

from pyspark.sql.functions import to_timestamp
from pyspark.sql.types import StringType, StructType, StructField

schema = StructType([
    StructField("timestamp_str", StringType(), True)
])

data = [("1/1/2021 1:00:00 AM")]
df = spark.createDataFrame(data, schema=schema)

df = df.withColumn("timestamp", to_timestamp("timestamp_str", "MM/dd/yyyy hh:mm:ss a"))

Update:

After changing data = [("1/1/2021 1:00:00 AM")] to data = [("1/1/2021 1:00:00 AM",)], I get a different error. It appears when I run df.show():

org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 2.0 failed 4 times, most recent failure: Lost task 2.3 in stage 2.0 (TID 10) (10.233.49.69 executor 0): org.apache.spark.SparkUpgradeException: [INCONSISTENT_BEHAVIOR_CROSS_VERSION.PARSE_DATETIME_BY_NEW_PARSER] You may get a different result due to the upgrading to Spark >= 3.0:

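Introduce a new column id and drop it after you create df. The original TypeError arises because ("1/1/2021 1:00:00 AM") without a trailing comma is a plain string rather than a one-element tuple, so Spark cannot match it against the one-field schema; a two-element row avoids that. The second error comes from the Spark >= 3.0 datetime parser, which the LEGACY parser policy set below works around.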

from pyspark.sql.functions import to_timestamp
from pyspark.sql.types import StringType, StructType, StructField

# Fall back to the pre-3.0 datetime parser for the "MM/dd/yyyy hh:mm:ss a" pattern
spark.sql("set spark.sql.legacy.timeParserPolicy=LEGACY")

schema = StructType([
    StructField("id", StringType(), True),
    StructField("timestamp_str", StringType(), True),
])

# Each row is a two-element tuple; the placeholder id column is dropped right after creation
data = [("1", "1/1/2021 1:00:00 AM")]
df = spark.createDataFrame(data, schema=schema).drop("id")

df = df.withColumn("timestamp", to_timestamp("timestamp_str", "MM/dd/yyyy hh:mm:ss a"))

df.show()
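
As a possible alternative to the placeholder column, the sketch below illustrates the tuple-versus-string distinction behind the first error, and then builds the one-column DataFrame directly. It makes two assumptions that are not in the original post: `build_df` is a hypothetical helper name, and the pattern "M/d/yyyy h:mm:ss a" (single-letter fields, which the Spark 3 parser should accept for non-padded values such as "1/1/2021 1:00:00 AM") is used in place of the LEGACY parser policy. The Spark part is wrapped in a function so the plain-Python part runs without a cluster.

```python
# Parentheses alone do not make a tuple; the trailing comma does.
not_a_row = ("1/1/2021 1:00:00 AM")   # just a str
row = ("1/1/2021 1:00:00 AM",)        # a one-element tuple, i.e. one row with one field

print(type(not_a_row).__name__)  # str
print(type(row).__name__)        # tuple


def build_df(spark):
    """Hypothetical helper: build the one-column DataFrame without a placeholder id.

    Assumes `spark` is an active SparkSession (Spark >= 3.0).
    """
    # Imported here so the tuple demonstration above runs even without PySpark installed.
    from pyspark.sql.functions import to_timestamp
    from pyspark.sql.types import StringType, StructField, StructType

    schema = StructType([StructField("timestamp_str", StringType(), True)])
    df = spark.createDataFrame([row], schema=schema)

    # Assumption: single-letter M, d and h match non-padded month, day and hour,
    # so the legacy time parser policy should not be needed.
    return df.withColumn(
        "timestamp", to_timestamp("timestamp_str", "M/d/yyyy h:mm:ss a")
    )
```

With an existing session, `build_df(spark).show()` would display the string column next to the parsed timestamp column. If the input mixes padded and non-padded values, it is worth checking the result for nulls before relying on this pattern.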
