简体   繁体   中英

converting Timestamp into proper format with DASK in python

The following code is converting any kind of timestamp of dataframe into a given Format.

pd.to_datetime(df_pd["timestamp"]).dt.strftime('%Y-%m-%d %X')

How can I do this with "DASK"? I used the below code but it did not work.

(df is dask dataframe)

a=dd.to_datetime(df["time:timestamp"],format='%Y-%m-%d %X')
a.compute()

Error-: ValueError: unconverted data remains: .304000+00:00

this is how timestamp look like-: "2016-01-01 09:51:15.304000+00:00" (This could be any kind of format)

Expected output -: "2016-01-01 09:51:15"

I found Converting a Dask column into new Dask column of type datetime , but it is not working

Example with Pandas which works with any format-:


import pandas as pd
  

data = ['2016-01-01 09:51:15.304000+00:00','2016-01-01 09:51:15.304000+00:00','2016-01-01 09:51:15.304000+00:00','2016-01-01 09:51:15.304000+00:00']
data1 = ['2016-01-01 09:51:15','2016-01-01 09:51:15','2016-01-01 09:51:15','2016-01-01 09:51:15','2016-01-01 09:51:15']
data2 = ['2016-01-01','2016-01-01','2016-01-01','2016-01-01','2016-01-01']
  

df1 = pd.DataFrame(data2, columns=['t'])

df1['t']=pd.to_datetime(df1["t"]).dt.strftime('%Y-%m-%d %X')

Can someone tell me, how to do the same with "Dask"

Here is my solution

it could be done with following code-: dd.to_datetime(df["t"].compute()).dt.strftime('%Y-%m-%d %X')

but now the problem is that i can't store this conversion in the existing dataframe like i did with pandas.

if i do df["t"]=dd.to_datetime(df["t"].compute()).dt.strftime('%Y-%m-%d %X') , it throws an error.

ValueError: Not all divisions are known, can't align partitions. Please use `set_index` to set the index.

this ValueError: Not all divisions are known, can't align partitions error on dask dataframe does not workk

As you already have the string in the almost correct format, maybe just with with the strings:

df_pd['timestamp'] = df_pd['timestamp'].str.replace(r'\..*', '', regex=True)

Alternatively, if you need to use to_datetime :

pd.to_datetime(df_pd["timestamp"]).dt.strftime('%Y-%m-%d %X')

Or:

pd.to_datetime(df_pd["timestamp"],format='%Y-%m-%d %H:%M:%S.%f%z').dt.strftime('%Y-%m-%d %X')

You can truncate the datetime:

# Solution 1
>>> dd.to_datetime(df['time:timestamp'].str[:19]).compute()
0   2016-01-01 09:51:15
dtype: datetime64[ns]


# Solution 2
>>> dd.to_datetime(df['time:timestamp'].str.split('.').str[0]).compute()
0   2016-01-01 09:51:15
dtype: datetime64[ns]


# Solution 3 (@mozway)
>>> dd.to_datetime(df['time:timestamp'].str.replace('\..*', '', regex=True)).compute()
0   2016-01-01 09:51:15
dtype: datetime64[ns]

Here is how i did it

df["time:timestamp"]=dd.to_datetime(df["time:timestamp"]).dt.strftime('%Y-%m-%d %X')

df.compute()```

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM