
Python Spark - How to create a new column by slicing an existing column on the dataframe?

I need to create a new column on my dataframe by slicing an existing column on the same dataframe. The source column is a timestamp:

start_time: timestamp

START_TIME
2017-03-25T13:14:32.000+0000
2018-03-25T13:14:32.000+0000
2019-03-25T13:14:32.000+0000
2020-03-25T13:14:32.000+0000
2021-03-25T13:14:32.000+0000

My output should look something like this:

START_TIME                        NEW_START_TIME
2017-03-25T13:14:32.000+0000      2017-03-25
2018-03-25T13:14:32.000+0000      2018-03-25
2019-03-25T13:14:32.000+0000      2019-03-25
2020-03-25T13:14:32.000+0000      2020-03-25
2021-03-25T13:14:32.000+0000      2021-03-25
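
For reference, a minimal dataframe matching this shape can be built as follows (a sketch; the values are copied from the sample above):

from datetime import datetime
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# One timestamp row per year, mirroring the START_TIME sample
tpv = spark.createDataFrame(
    [(datetime(y, 3, 25, 13, 14, 32),) for y in range(2017, 2022)],
    ["start_time"],
)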

I have tried several things, but none of them has worked.

tpv = dataset.start_time_example

tpv['new_start_time'] = tpv['start_time'].slice(0,10)

TypeError: 'Column' object is not callable

tpv['newstartdate'] = tpv['start_time'].slice.str[:10]

TypeError: startPos and length must be the same type. Got class 'NoneType' and class 'int', respectively.

newstartdate = tpv['start_time'].slice(0,10)
tpv['newstartdate'] = newstartdate

TypeError: 'Column' object is not callable

Could you please help me with this? (I'm using Python 3.)

Try this; it should work. A Spark Column has no slice method: tpv['start_time'].slice resolves to another Column (a field-access expression), which is why calling it raises TypeError: 'Column' object is not callable. Spark DataFrames are also immutable, so pandas-style assignment like tpv['new_start_time'] = ... cannot add a column; use withColumn instead, with to_date to truncate the timestamp to a date:

from pyspark.sql import functions as f

# to_date truncates the timestamp to a date (yyyy-MM-dd)
df.withColumn("new_start_time", f.to_date(f.to_timestamp(df.start_time))).show()
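
Since start_time is already a timestamp column, the inner f.to_timestamp is a no-op and f.to_date(df.start_time) alone would do. If you instead want the result as a string, keeping the original slicing idea, a substring-based sketch (assuming the column casts cleanly to a string) also works:

from pyspark.sql import functions as f

# Cast to string and keep the first 10 characters ("yyyy-MM-dd");
# note that substring in Spark is 1-based, not 0-based
df.withColumn("new_start_time", f.substring(df.start_time.cast("string"), 1, 10)).show()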
