I need to create a new column in my DataFrame by slicing an existing column of the same DataFrame.
start_time:timestamp
START_TIME
2017-03-25T13:14:32.000+0000
2018-03-25T13:14:32.000+0000
2019-03-25T13:14:32.000+0000
2020-03-25T13:14:32.000+0000
2021-03-25T13:14:32.000+0000
My output should look something like this:
START_TIME NEW_START_TIME
2017-03-25T13:14:32.000+0000 2017-03-25
2018-03-25T13:14:32.000+0000 2018-03-25
2019-03-25T13:14:32.000+0000 2019-03-25
2020-03-25T13:14:32.000+0000 2020-03-25
2021-03-25T13:14:32.000+0000 2021-03-25
I have tried several things, but none of them worked:
tpv = dataset.start_time_example
tpv['new_start_time'] = tpv['start_time'].slice(0,10)
TypeError: 'Column' object is not callable
tpv['newstartdate'] = tpv['start_time'].slice.str[:10]
TypeError: startPos and length must be the same type. Got class 'NoneType' and class 'int', respectively.
newstartdate = tpv['start_time'].slice(0,10)
tpv['newstartdate'] = newstartdate
TypeError: 'Column' object is not callable
Could you please help me with this? (I'm using Python 3.)
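Try this; it should work. The .slice(...) attempts fail because a PySpark Column has no slice method: attribute access on a Column returns another Column, which is not callable. Since start_time is already a timestamp, converting it with to_date gives the date part directly.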
from pyspark.sql import functions as f
df.withColumn("new_start_time",f.to_date(f.to_timestamp(df.start_time))).show()