Assigning a timestamp as the index with read_sql_table in Dask

Question

In SQLite, I have a table named data with an indexed column time, which holds the timestamp generated by time.time() when each row was logged.

I want to load the data from this table into a Dask DataFrame. To do that, I use:

import dask.dataframe as dd
data = dd.read_sql_table('data', 'sqlite:///'+DB_PATH, index_col='time', parse_dates={"time": {"unit":"s"}})

When I try to apply a rolling average to the data with data.resample('15S').mean(), I get:

TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Float64Index'
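
As a point of comparison, the same requirement can be reproduced in plain pandas. The following is only a minimal sketch with made-up sample values (the column name value and the four timestamps are assumptions, not data from the question): resample() rejects a float index and accepts it once the index has been converted with pd.to_datetime(..., unit="s"). The lowercase alias "15s" is used here because newer pandas releases deprecate the uppercase "S".

```python
import pandas as pd

# Hypothetical sample data: epoch seconds stored as floats, as time.time() returns.
df = pd.DataFrame(
    {"value": [1.0, 2.0, 3.0, 4.0]},
    index=pd.Index(
        [1619876000.0, 1619876005.0, 1619876016.0, 1619876020.0], name="time"
    ),
)

# A float index is rejected by resample().
try:
    df.resample("15s").mean()
    raised = False
except TypeError:
    raised = True

# Converting the index to datetimes makes the same call valid.
df.index = pd.to_datetime(df.index, unit="s")
resampled = df.resample("15s").mean()
```

So the error message is about the actual index values in each partition, not about the dtype that the Dask metadata reports.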

So if I inspect the index via data.index, it gives me the following, which suggests that it has the correct dtype and that parse_dates did work, right?

Dask Index Structure:
npartitions=1
1.619876e+09    datetime64[ns]
1.620067e+09               ...
Name: time, dtype: datetime64[ns]
Dask Name: from-delayed, 3 tasks

Finally, I tried to define the datetime index after loading:

data = dd.read_sql_table('data', 'sqlite:///'+DB_PATH, index_col='time')
data['time__1'] = pd.to_datetime(np.array(data['time__1']), unit='s') # By the way, I don't know why Dask creates a 'time__1' column...
data = data.set_index('time__1', sorted=True)

But then I got this message...

ValueError: Length of values does not match length of index

Answer 1

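This is the solution I came up with. It does not seem to be the most efficient one, because it does not use parse_dates directly in read_sql_table() and it relies on the time__1 column, which Dask generates for some reason I do not know...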

data = dd.read_sql_table('data', 'sqlite:///'+DB_PATH, index_col='time')
data = data.set_index(data['time__1'].map_partitions(pd.to_datetime, unit='s'))
data = data.drop('time__1', axis=1)

If you know a better solution...

Solution 1
0 2021-05-04 14:24:06