
Converting a timestamp into the proper format with Dask in Python

The following code converts any kind of timestamp in a pandas DataFrame into a given format:

pd.to_datetime(df_pd["timestamp"]).dt.strftime('%Y-%m-%d %X')

How can I do this with Dask? I tried the code below, but it did not work.

(df is a Dask DataFrame)

a=dd.to_datetime(df["time:timestamp"],format='%Y-%m-%d %X')
a.compute()

Error: ValueError: unconverted data remains: .304000+00:00

This is what the timestamps look like: "2016-01-01 09:51:15.304000+00:00" (they could be in any format).

Expected output: "2016-01-01 09:51:15"

I found "Converting a Dask column into new Dask column of type datetime", but it did not work for me.

Example with pandas, which works with any format:


import pandas as pd
  

data = ['2016-01-01 09:51:15.304000+00:00','2016-01-01 09:51:15.304000+00:00','2016-01-01 09:51:15.304000+00:00','2016-01-01 09:51:15.304000+00:00']
data1 = ['2016-01-01 09:51:15','2016-01-01 09:51:15','2016-01-01 09:51:15','2016-01-01 09:51:15','2016-01-01 09:51:15']
data2 = ['2016-01-01','2016-01-01','2016-01-01','2016-01-01','2016-01-01']
  

df1 = pd.DataFrame(data2, columns=['t'])

df1['t']=pd.to_datetime(df1["t"]).dt.strftime('%Y-%m-%d %X')

Can someone tell me how to do the same with Dask?

Here is my solution.

It can be done with the following code: dd.to_datetime(df["t"].compute()).dt.strftime('%Y-%m-%d %X')

But now the problem is that I can't store the converted column back in the existing DataFrame as I did with pandas.

If I do df["t"]=dd.to_datetime(df["t"].compute()).dt.strftime('%Y-%m-%d %X') , it throws an error:

ValueError: Not all divisions are known, can't align partitions. Please use `set_index` to set the index.

The suggestions in "ValueError: Not all divisions are known, can't align partitions error on dask dataframe" did not work for me.

As you already have the string in almost the correct format, maybe just work with the strings:

df_pd['timestamp'] = df_pd['timestamp'].str.replace(r'\..*', '', regex=True)

Alternatively, if you need to use to_datetime:

pd.to_datetime(df_pd["timestamp"]).dt.strftime('%Y-%m-%d %X')

Or:

pd.to_datetime(df_pd["timestamp"],format='%Y-%m-%d %H:%M:%S.%f%z').dt.strftime('%Y-%m-%d %X')
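To illustrate why the format string in the question raised "unconverted data remains" while the longer one above parses cleanly, here is a minimal sketch using only the standard library's datetime.strptime. It assumes that pandas and Dask treat these directives the same way for this input, and it spells out %H:%M:%S instead of %X so the result does not depend on the locale.

```python
from datetime import datetime

raw = "2016-01-01 09:51:15.304000+00:00"

# A format that stops at the seconds leaves the fractional seconds
# and the UTC offset unparsed, so strptime raises ValueError.
try:
    datetime.strptime(raw, "%Y-%m-%d %H:%M:%S")
    short_format_error = None
except ValueError as exc:
    short_format_error = str(exc)

# %f consumes the microseconds and %z the "+00:00" offset
# (offsets with a colon need Python 3.7 or newer).
parsed = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S.%f%z")
formatted = parsed.strftime("%Y-%m-%d %H:%M:%S")

print(short_format_error)
print(formatted)
```

The format passed to the parser has to describe the whole input string; the shorter output layout belongs in strftime.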

You can truncate the datetime:

# Solution 1
>>> dd.to_datetime(df['time:timestamp'].str[:19]).compute()
0   2016-01-01 09:51:15
dtype: datetime64[ns]


# Solution 2
>>> dd.to_datetime(df['time:timestamp'].str.split('.').str[0]).compute()
0   2016-01-01 09:51:15
dtype: datetime64[ns]


# Solution 3 (@mozway)
>>> dd.to_datetime(df['time:timestamp'].str.replace(r'\..*', '', regex=True)).compute()
0   2016-01-01 09:51:15
dtype: datetime64[ns]

Here is how I did it:

df["time:timestamp"]=dd.to_datetime(df["time:timestamp"]).dt.strftime('%Y-%m-%d %X')

df.compute()
