How to convert a MultiIndex to a DatetimeIndex? (without reindex())

I need to use "modin" dataframes, which do not work with MultiIndexes (at one point I call df.reindex(idx), where idx is a multilevel index). So: how can I convert a MultiIndex to a single index, merging both levels together?

Minimal sample:


import pandas as pd
idx = pd.DatetimeIndex(['2019-07-17 22:43:00',
            '2019-07-17 22:44:00',
            '2019-07-17 22:45:00',
            '2019-07-17 22:46:00',
            '2019-07-17 22:47:00',
            '2019-07-17 22:48:00',
            '2019-07-17 22:49:00',
            '2019-07-17 22:50:00',
            '2019-07-17 22:51:00',
            '2019-07-17 22:52:00', 
            '2019-07-23 22:33:00',
            '2019-07-23 22:34:00',
            '2019-07-23 22:35:00',
            '2019-07-23 22:36:00',
            '2019-07-23 22:37:00',
            '2019-07-23 22:38:00',
            '2019-07-23 22:39:00',
            '2019-07-23 22:40:00',
            '2019-07-23 22:41:00',
            '2019-07-23 22:42:00'] ) 

idx = pd.MultiIndex.from_tuples(zip( idx.date, idx.time))

dates_new   =  idx.get_level_values(0).unique()  
times_new =  idx.get_level_values(1).unique()

idx = pd.MultiIndex.from_product([dates_new,times_new]) 
idx = pd.DatetimeIndex(idx)
print(idx)

The following works, but is there any way to speed it up (on large datasets)?

import datetime
[datetime.datetime.combine(date, time) for date, time in idx.values]
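One possible way to avoid the Python-level loop is to convert each level as a whole and add the results. The sketch below assumes the MultiIndex holds `datetime.date` objects in level 0 and `datetime.time` objects in level 1, as in the sample; the names `mi`, `dates`, `offsets` and `flat` are illustrative only. Whether it is actually faster on a given dataset is worth timing.

```python
import datetime
import pandas as pd

# Small stand-in for the (date, time) MultiIndex from the question.
mi = pd.MultiIndex.from_product([
    [datetime.date(2019, 7, 17), datetime.date(2019, 7, 23)],
    [datetime.time(22, 43), datetime.time(22, 44)],
])

# Level 0: date objects -> DatetimeIndex at midnight.
dates = pd.to_datetime(mi.get_level_values(0))

# Level 1: time objects -> "HH:MM:SS" strings -> TimedeltaIndex.
offsets = pd.to_timedelta(mi.get_level_values(1).astype(str))

# Element-wise sum gives one flat DatetimeIndex, in the same order as mi.
flat = dates + offsets
print(flat)
```

The result keeps the row order of the original MultiIndex, so it can stand in for it position by position.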

Starting from a DatetimeIndex, you want to find all combinations of date and time and convert them to a new DatetimeIndex.

I would not use the .time accessor, since it returns plain Python datetime.time objects (object dtype), which don't play nicely with pandas. Instead, let's try:

dates_new = set(idx.normalize())
times_new = set(idx - idx.normalize())

from itertools import product
new_idx = pd.DatetimeIndex([x+y for x,y in product(dates_new, times_new)])

Output (the order is arbitrary because sets are unordered):

DatetimeIndex(['2019-07-23 22:36:00', '2019-07-23 22:41:00',
               '2019-07-23 22:50:00', '2019-07-23 22:40:00',
               '2019-07-23 22:45:00', '2019-07-23 22:51:00',
               '2019-07-23 22:33:00', '2019-07-23 22:42:00',
               '2019-07-23 22:37:00', '2019-07-23 22:46:00',
               '2019-07-23 22:43:00', '2019-07-23 22:52:00',
               '2019-07-23 22:34:00', '2019-07-23 22:47:00',
               '2019-07-23 22:38:00', '2019-07-23 22:35:00',
               '2019-07-23 22:44:00', '2019-07-23 22:49:00',
               '2019-07-23 22:39:00', '2019-07-23 22:48:00',
               '2019-07-17 22:36:00', '2019-07-17 22:41:00',
               '2019-07-17 22:50:00', '2019-07-17 22:40:00',
               '2019-07-17 22:45:00', '2019-07-17 22:51:00',
               '2019-07-17 22:33:00', '2019-07-17 22:42:00',
               '2019-07-17 22:37:00', '2019-07-17 22:46:00',
               '2019-07-17 22:43:00', '2019-07-17 22:52:00',
               '2019-07-17 22:34:00', '2019-07-17 22:47:00',
               '2019-07-17 22:38:00', '2019-07-17 22:35:00',
               '2019-07-17 22:44:00', '2019-07-17 22:49:00',
               '2019-07-17 22:39:00', '2019-07-17 22:48:00'],
              dtype='datetime64[ns]', freq=None)
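If a sorted result is wanted and the input is large, the same date-plus-offset idea can be sketched with NumPy broadcasting instead of `itertools.product`. This is only an illustration under the assumption that `idx` is the original DatetimeIndex (shortened here); `grid` and the other variable names are made up for the example.

```python
import pandas as pd

# Shortened stand-in for the original DatetimeIndex.
idx = pd.DatetimeIndex(['2019-07-17 22:43:00',
                        '2019-07-17 22:44:00',
                        '2019-07-23 22:33:00'])

# Unique calendar days (at midnight) and unique time-of-day offsets.
dates = idx.normalize().unique()
offsets = (idx - idx.normalize()).unique()

# Broadcast a column of dates against a row of offsets:
# grid[i, j] = dates[i] + offsets[j]
grid = dates.values[:, None] + offsets.values[None, :]

# Flatten to one dimension and sort chronologically.
new_idx = pd.DatetimeIndex(grid.ravel()).sort_values()
print(new_idx)
```

Using `unique()` rather than `set()` keeps everything as pandas index objects, and the final `sort_values()` makes the order deterministic.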
