Convert pandas df with data in a "list column" into a time series in long format. Use three columns: [list of data] + [timestamp] + [duration]
The aim is to convert a dataframe with a list column as the data column (and thus with just one timestamp and duration per row) into a time series in long format with a DatetimeIndex for each single item.
In the result, there is no sequence/list per row for the data anymore, but just one value column.
import pandas as pd

df_test = pd.DataFrame({'timestamp': [1462352000000000000, 1462352100000000000,
                                      1462352200000000000, 1462352300000000000],
                        'nestedList': [[1,2,1,9], [2,2,3,0], [1,3,3,0], [1,1,3,9]],
                        'duration_sec': [3.0, 3.0, 3.0, 3.0]})

tdi = pd.DatetimeIndex(df_test.timestamp)
df_test.set_index(tdi, inplace=True)
df_test.drop(columns='timestamp', inplace=True)
df_test.index.name = 'datetimeindex'
Out:
nestedList duration_sec
datetimeindex
2016-05-04 08:53:20 [1, 2, 1, 9] 3.0
2016-05-04 08:55:00 [2, 2, 3, 0] 3.0
2016-05-04 08:56:40 [1, 3, 3, 0] 3.0
2016-05-04 08:58:20 [1, 1, 3, 9] 3.0
The aim is:
value
datetimeindex
2016-05-04 08:53:20 1
2016-05-04 08:53:21 2
2016-05-04 08:53:22 1
2016-05-04 08:53:23 9
2016-05-04 08:55:00 2
2016-05-04 08:55:01 2
2016-05-04 08:55:02 3
2016-05-04 08:55:03 0
2016-05-04 08:56:40 1
2016-05-04 08:56:41 3
2016-05-04 08:56:42 3
2016-05-04 08:56:43 0
2016-05-04 08:58:20 1
2016-05-04 08:58:21 1
2016-05-04 08:58:22 3
2016-05-04 08:58:23 9
Mind that this does not mean simply taking 1 second for each item; that was just chosen to simplify the example. Instead, each row holds a sequence of 4 items with a given duration of, for example, 3.0 seconds (which may also vary from row to row), and the first item of each sequence always starts at "time 0", meaning that the seconds per item should be calculated like
[3.0 sec / (4-1) items] = 1 sec.
Context:
The example shows conversion to a DatetimeIndex since this makes it suitable for seasonal_decompose(), see this first search hit.
There, the resulting df looks like this:
df_test2 = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv', parse_dates=['date'], index_col='date')
Out:
value
date
1991-07-01 3.526591
1991-08-01 3.180891
1991-09-01 3.252221
1991-10-01 3.611003
1991-11-01 3.565869
...
2008-02-01 21.654285
2008-03-01 18.264945
2008-04-01 23.107677
2008-05-01 22.912510
2008-06-01 19.431740
[204 rows x 1 columns]
And then it is easy to apply a seasonal_decompose() via the additive decomposition model:
from statsmodels.tsa.seasonal import seasonal_decompose
import matplotlib.pyplot as plt

result_add = seasonal_decompose(df_test2['value'], model='additive', extrapolate_trend='freq')

# Plot
plt.rcParams.update({'figure.figsize': (5, 5)})
result_add.plot().suptitle('Additive Decompose', fontsize=22)
plt.show()
Now the same is needed for the df_test above.
Use DataFrame.explode first and then add a counter created by GroupBy.cumcount and to_timedelta to df.index:
df_test = df_test.explode('nestedList')
df_test.index += pd.to_timedelta(df_test.groupby(level=0).cumcount(), unit='s')
print(df_test)
nestedList duration_sec
2016-05-04 08:53:20 1 3.0
2016-05-04 08:53:21 2 3.0
2016-05-04 08:53:22 1 3.0
2016-05-04 08:53:23 9 3.0
2016-05-04 08:55:00 2 3.0
2016-05-04 08:55:01 2 3.0
2016-05-04 08:55:02 3 3.0
2016-05-04 08:55:03 0 3.0
2016-05-04 08:56:40 1 3.0
2016-05-04 08:56:41 3 3.0
2016-05-04 08:56:42 3 3.0
2016-05-04 08:56:43 0 3.0
2016-05-04 08:58:20 1 3.0
2016-05-04 08:58:21 1 3.0
2016-05-04 08:58:22 3 3.0
2016-05-04 08:58:23 9 3.0
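One detail worth noting (an observation about pandas behavior, not part of the original answer): explode leaves the exploded column at object dtype, so a numeric cast may be needed before passing it to seasonal_decompose():

```python
import pandas as pd

df = pd.DataFrame({'nestedList': [[1, 2], [3, 4]]},
                  index=pd.DatetimeIndex(['2016-05-04 08:53:20',
                                          '2016-05-04 08:55:00']))
df = df.explode('nestedList')
# explode does not infer a numeric dtype for the new column
print(df['nestedList'].dtype)  # object
df['nestedList'] = df['nestedList'].astype(float)
print(df['nestedList'].dtype)  # float64
```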
EDIT: If the step between items should be derived from each row's duration_sec and list length, i.e. duration_sec / (number of items - 1):
df_test = df_test.explode('nestedList')
sizes = df_test.groupby(level=0)['nestedList'].transform('size').sub(1)
duration = df_test['duration_sec'].div(sizes)
df_test.index += pd.to_timedelta(df_test.groupby(level=0).cumcount() * duration, unit='s')
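A self-contained sketch of this edited version, with a second row given a different duration (6.0 s, made up for illustration) to show that its step then becomes 6.0 / (4-1) = 2 s:

```python
import pandas as pd

df = pd.DataFrame(
    {'nestedList': [[1, 2, 1, 9], [2, 2, 3, 0]],
     'duration_sec': [3.0, 6.0]},
    index=pd.DatetimeIndex(['2016-05-04 08:53:20', '2016-05-04 08:55:00']))

df = df.explode('nestedList')
sizes = df.groupby(level=0)['nestedList'].transform('size').sub(1)  # gaps per row: 3
duration = df['duration_sec'].div(sizes)                            # step: 1.0 s / 2.0 s
df.index += pd.to_timedelta(df.groupby(level=0).cumcount() * duration, unit='s')
print(df.index[-1])  # 2016-05-04 08:55:06
```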
EDIT2 by asker:
With the resulting df this simple application of decompose() is now possible, which was the final aim:
result_add = seasonal_decompose(x=df_test['nestedList'], model='additive', extrapolate_trend='freq', period=int(len(df_test)/2))
plt.rcParams.update({'figure.figsize': (5,5)})
result_add.plot().suptitle('Additive Decompose', fontsize=22)
plt.show()