
Python Spark - How to create a new column slicing an existing column on the dataframe?

I need to create a new column on my dataframe by slicing an existing column on the same dataframe.

start_time: timestamp

START_TIME
2017-03-25T13:14:32.000+0000
2018-03-25T13:14:32.000+0000
2019-03-25T13:14:32.000+0000
2020-03-25T13:14:32.000+0000
2021-03-25T13:14:32.000+0000

My output should be something like this:

START_TIME                        NEW_START_TIME
2017-03-25T13:14:32.000+0000      2017-03-25
2018-03-25T13:14:32.000+0000      2018-03-25
2019-03-25T13:14:32.000+0000      2019-03-25
2020-03-25T13:14:32.000+0000      2020-03-25
2021-03-25T13:14:32.000+0000      2021-03-25

I have tried several things, but none of them have worked.

tpv = dataset.start_time_example

tpv['new_start_time'] = tpv['start_time'].slice(0,10)

TypeError: 'Column' object is not callable

tpv['newstartdate'] = tpv['start_time'].slice.str[:10]

TypeError: startPos and length must be the same type. Got class 'NoneType' and class 'int', respectively.

newstartdate = tpv['start_time'].slice(0,10)
tpv['newstartdate'] = newstartdate

TypeError: 'Column' object is not callable

Could you please help me with this? (I'm using Python 3.)

Try this; it should work:

from pyspark.sql import functions as f

# Parse the value as a timestamp, then truncate it to a date
df.withColumn("new_start_time", f.to_date(f.to_timestamp(df.start_time))).show()
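For context on the original errors: a Spark Column has no slice method (attribute access on a Column builds a nested-field reference, which is not callable), and Spark DataFrames are immutable, so pandas-style assignment like tpv['new_start_time'] = ... is not supported; new columns are added with withColumn. If you want the literal string-slicing behavior you were attempting, here is a minimal sketch, assuming df is the same DataFrame as above and that start_time can be cast to a string:

from pyspark.sql import functions as f

# Cast the timestamp to a string, then keep the first 10 characters;
# substring in Spark SQL is 1-indexed, so (1, 10) covers "YYYY-MM-DD"
df.withColumn("new_start_time", f.substring(df.start_time.cast("string"), 1, 10)).show()

The equivalent Column method, df.start_time.cast("string").substr(1, 10), works the same way. That said, the to_date approach in the answer is usually preferable, because the new column keeps a proper date type instead of becoming a plain string.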
