TypeError: TimestampType can not accept object <class 'str'> and <class 'int'>
I have a pandas dataframe that I am writing to a table in HDFS. I can write the data to the table when Srum_Entry_Creation is StringType(), but I need it to be TimestampType(). This is where I run into one of the following errors:
TypeError: TimestampType can not accept object '2019-05-20 12:03:00' in type <class 'str'>
TypeError: TimestampType can not accept object 1558353780000000000 in type <class 'int'>
I have tried converting the column to different date formats in Python before defining the schema, but I can't seem to get the import to work.
df
Srum_Entry_ID Connected_Time Machine Srum_Entry_Creation
0 5769.0 0.018218 Computer1 2019-05-20 12:03:00
1 5770.0 0.000359 Computer1 2019-05-20 12:03:00
2 5771.0 0.042674 Computer2 2019-05-20 13:03:00
3 5772.0 0.043229 Computer2 2019-05-20 14:04:00
4 5773.0 0.032222 Computer3 2019-05-20 14:04:00
from pyspark.sql import SparkSession
from pyspark.sql.types import (StructType, StructField, FloatType,
                               StringType, TimestampType)

spark = SparkSession.builder.appName('application').getOrCreate()
schema = StructType([StructField('Srum_Entry_ID', FloatType(), False),
                     StructField('Connected_Time', FloatType(), True),
                     StructField('Machine', StringType(), True),
                     StructField('Srum_Entry_Creation', TimestampType(), True)])
dataframe = spark.createDataFrame(df, schema)
dataframe.write. \
    mode("append"). \
    option("path", "/user/hive/warehouse/analytics.db/srum_network_connections"). \
    saveAsTable("analytics.srum_network_connections")
I have tried:
df['Srum_Entry_Creation'] = df['Srum_Entry_Creation'].astype('datetime64[ns]')
error: TypeError: TimestampType can not accept object 1558353780000000000 in type <class 'int'>
and
df['Srum_Entry_Creation'] = pd.to_datetime(df['Srum_Entry_Creation'])
error: TypeError: TimestampType can not accept object 1558353780000000000 in type <class 'int'>
and if I just leave it as a string in the pandas dataframe I get:
error: TypeError: TimestampType can not accept object '2019-05-20 12:03:00' in type <class 'str'>
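For reference, the string and the integer rejected in the errors above appear to describe the same instant: the integer reads as nanoseconds since 1970-01-01. The sketch below uses only the Python standard library to turn each into a datetime.datetime object, which is the Python type that TimestampType is documented to map to. The helper names from_string and from_nanoseconds are hypothetical, and the snippet does not touch Spark or pandas at all.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

def from_string(text):
    # Parse the 'YYYY-MM-DD HH:MM:SS' layout shown in the dataframe.
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")

def from_nanoseconds(ns):
    # datetime has microsecond resolution, so drop the last three digits.
    return EPOCH + timedelta(microseconds=ns // 1000)

from_str = from_string("2019-05-20 12:03:00")
from_int = from_nanoseconds(1558353780000000000)

# Both values are now plain datetime objects describing the same moment.
print(from_str, from_int, from_str == from_int)
```

Whether converting the column to such objects is enough for createDataFrame to accept it may depend on the Spark and pandas versions in use.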
In short, I converted the datetime to epoch time:
import datetime as dt
df['epoch'] = (df['New_Srum_Entry_Creation'] - dt.datetime(1970,1,1)).dt.total_seconds()
df['epoch'] = df['epoch'].astype('Int64')
Then I used IntegerType() for the schema:
StructField('epoch', IntegerType(),True)
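The epoch conversion can be checked without pandas or Spark. The following is a minimal sketch using only the standard library; to_epoch_seconds and from_epoch_seconds are hypothetical helper names, and the sample value is the first Srum_Entry_Creation from the question. It also checks that the result fits in a 32-bit signed integer, which is the range of Spark's IntegerType.

```python
import datetime as dt

EPOCH = dt.datetime(1970, 1, 1)
INT32_MAX = 2**31 - 1  # upper bound of a 32-bit signed integer

def to_epoch_seconds(ts):
    # Same arithmetic as the pandas expression: subtract the epoch,
    # then take the total number of seconds as a whole number.
    return int((ts - EPOCH).total_seconds())

def from_epoch_seconds(seconds):
    # Reverse step, for turning the stored integer back into a datetime.
    return EPOCH + dt.timedelta(seconds=seconds)

created = dt.datetime(2019, 5, 20, 12, 3, 0)
epoch = to_epoch_seconds(created)

print(epoch)                      # 1558353780
print(from_epoch_seconds(epoch))  # 2019-05-20 12:03:00
print(epoch <= INT32_MAX)         # True
```

Epoch seconds stop fitting in 32 bits in 2038, so LongType() may be a safer choice if later dates are possible. Note also that this stores the value as a number rather than as a Spark timestamp.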