Assign timestamp as index with read_sql_table in Dask

In SQLite, I have a table data with an index column time, which holds timestamps generated at recording time by time.time().

I want to load the data from this table into a Dask DataFrame. For that I use:

import dask.dataframe as dd
data = dd.read_sql_table('data', 'sqlite:///'+DB_PATH, index_col='time', parse_dates={"time": {"unit":"s"}})
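
For context, here is a minimal sketch of how such a table might have been created; the schema, the value column, and DB_PATH are assumptions for illustration, not from the original post:

import sqlite3
import time

DB_PATH = 'data.db'  # hypothetical path

conn = sqlite3.connect(DB_PATH)
# 'time' stores raw Unix epoch seconds as produced by time.time()
conn.execute('CREATE TABLE IF NOT EXISTS data (time REAL PRIMARY KEY, value REAL)')
conn.execute('INSERT INTO data VALUES (?, ?)', (time.time(), 42.0))
conn.commit()
conn.close()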

When I want to apply a resampled mean to the data with data.resample('15S').mean(), I get:

TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Float64Index'

So if I check the index with data.index, it gives me the following, suggesting it is the right dtype and that parse_dates actually worked, right?

Dask Index Structure:
npartitions=1
1.619876e+09    datetime64[ns]
1.620067e+09               ...
Name: time, dtype: datetime64[ns]
Dask Name: from-delayed, 3 tasks
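
Note that the two values printed above (1.619876e+09 and 1.620067e+09) are the partition divisions, and they are still raw float epochs even though the dtype is reported as datetime64[ns]; since resampling relies on the divisions, this mismatch is a plausible cause of the TypeError. A quick way to check, using the data object from above:

# The index metadata claims datetime64[ns]...
print(data.index.dtype)
# ...but the divisions Dask tracks are still float epoch seconds
print(data.divisions)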

Finally, I tried to define the datetime index after loading:

import pandas as pd
import numpy as np

data = dd.read_sql_table('data', 'sqlite:///'+DB_PATH, index_col='time')
data['time__1'] = pd.to_datetime(np.array(data['time__1']), unit='s')  # By the way, I don't know why Dask creates a 'time__1' column...
data = data.set_index('time__1', sorted=True)

But then I get this message, likely because np.array() eagerly computes the whole column into a concrete array, which Dask then cannot align with the lazy, partitioned frame:

ValueError: Length of values does not match length of index

Here is a solution I came up with. It does not seem to be the most efficient, since it does not take advantage of parse_dates directly in read_sql_table() and relies on the time__1 column that Dask generated for some reason I don't know...

data = dd.read_sql_table('data', 'sqlite:///'+DB_PATH, index_col='time')
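# Convert the raw-epoch column to datetime lazily, one partition at a time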
data = data.set_index(data['time__1'].map_partitions(pd.to_datetime, unit='s'))
data = data.drop('time__1', axis=1)
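
With the index converted this way, the resample from the beginning should go through; a quick check with the same 15-second window:

# This no longer raises the Float64Index TypeError
print(data.resample('15S').mean().head())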

If you know a better solution...
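
One possible improvement (a sketch, not tested against the original database): dask.dataframe ships its own to_datetime, which stays lazy and partition-aware, so it avoids both the eager np.array() materialization and the explicit map_partitions call:

import dask.dataframe as dd

data = dd.read_sql_table('data', 'sqlite:///'+DB_PATH, index_col='time')
# dd.to_datetime applies pd.to_datetime per partition without computing anything
data = data.set_index(dd.to_datetime(data['time__1'], unit='s'))
data = data.drop('time__1', axis=1)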
