
Combine two time series with different datetime index resolutions in Python

I have the two time series below. df1 has a DatetimeIndex that contains only dates, with no time component. df2 has a DatetimeIndex with full timestamps. In the full data, df1 has far fewer rows than df2.

As you can see, both datasets span April 2 to April 6. However, df1 skips some dates, while df2 has rows for every day. Note: in this example only the odd-numbered dates are skipped, but that is not the case in the full data.

df1

            value1
date
2016-04-02      16
2016-04-04      76
2016-04-06      23

df2

                     value2
DateTime
2016-04-02 07:45:00  257.96
2016-04-02 07:50:00  317.58
2016-04-02 07:55:00  333.39
2016-04-03 08:15:00  449.96
2016-04-03 08:20:00  466.42
2016-04-03 08:25:00  498.56
2016-04-04 08:10:00  454.73
2016-04-04 08:15:00  472.45
2016-04-04 08:20:00  489.85
2016-04-05 07:30:00  169.54
2016-04-05 07:35:00  276.13
2016-04-05 07:40:00  293.70
2016-04-06 07:10:00  108.05
2016-04-06 07:15:00  179.21
2016-04-06 07:20:00  201.80
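
To make the example reproducible, the two frames can be rebuilt along these lines. This is only a sketch: it assumes both indexes are pandas DatetimeIndex objects named `date` and `DateTime`, as the printed output suggests.

```python
import pandas as pd

# df1: one row per date, indexed by midnight timestamps (April 3 and 5 are missing)
df1 = pd.DataFrame(
    {"value1": [16, 76, 23]},
    index=pd.DatetimeIndex(["2016-04-02", "2016-04-04", "2016-04-06"], name="date"),
)

# df2: three intraday rows for every date, indexed by full timestamps
df2 = pd.DataFrame(
    {
        "value2": [
            257.96, 317.58, 333.39, 449.96, 466.42, 498.56, 454.73, 472.45,
            489.85, 169.54, 276.13, 293.70, 108.05, 179.21, 201.80,
        ]
    },
    index=pd.DatetimeIndex(
        [
            "2016-04-02 07:45:00", "2016-04-02 07:50:00", "2016-04-02 07:55:00",
            "2016-04-03 08:15:00", "2016-04-03 08:20:00", "2016-04-03 08:25:00",
            "2016-04-04 08:10:00", "2016-04-04 08:15:00", "2016-04-04 08:20:00",
            "2016-04-05 07:30:00", "2016-04-05 07:35:00", "2016-04-05 07:40:00",
            "2016-04-06 07:10:00", "2016-04-06 07:15:00", "2016-04-06 07:20:00",
        ],
        name="DateTime",
    ),
)
```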

I want to combine the two datasets by index. df1 should control which dates are kept. The expected result is below.

                     value2  value1
DateTime
2016-04-02 07:45:00  257.96      16
2016-04-02 07:50:00  317.58      16
2016-04-02 07:55:00  333.39      16
2016-04-04 08:10:00  454.73      76
2016-04-04 08:15:00  472.45      76
2016-04-04 08:20:00  489.85      76
2016-04-06 07:10:00  108.05      23
2016-04-06 07:15:00  179.21      23
2016-04-06 07:20:00  201.80      23

This is my attempt.

result = pd.concat([df1, df2], axis=1, sort=True).dropna(how='all')

But the result is different from what I expect.
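
A small sketch of why column-wise concatenation of df1 and df2 cannot produce the expected result: `pd.concat` with `axis=1` aligns rows on exact index labels, and a midnight timestamp such as `2016-04-02 00:00:00` never equals an intraday timestamp such as `2016-04-02 07:45:00`. The frames below are shortened, hypothetical versions of the data above.

```python
import pandas as pd

# Shortened stand-ins for df1 and df2
df1 = pd.DataFrame(
    {"value1": [16, 76]},
    index=pd.DatetimeIndex(["2016-04-02", "2016-04-04"], name="date"),
)
df2 = pd.DataFrame(
    {"value2": [257.96, 449.96, 454.73]},
    index=pd.DatetimeIndex(
        ["2016-04-02 07:45:00", "2016-04-03 08:15:00", "2016-04-04 08:10:00"],
        name="DateTime",
    ),
)

# No label is shared, so the result is the union of both indexes
combined = pd.concat([df1, df2], axis=1, sort=True)

# Every row has exactly one filled column, so dropna(how='all') removes nothing
cleaned = combined.dropna(how="all")
print(cleaned)
```

Because no row has both values filled, the two frames need a common key (the date) before they can be combined.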

One option is to create a helper column in df2 that holds each timestamp with its time part removed (set to midnight), using DatetimeIndex.normalize:

df2['date'] = df2.index.normalize()

Or, if the index of df1 holds Python date objects rather than timestamps, use DatetimeIndex.date:

df2['date'] = df2.index.date
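
The difference between the two helpers matters because the merge key must have the same type as the index of df1. A minimal sketch on a hypothetical two-element index:

```python
import datetime
import pandas as pd

idx = pd.DatetimeIndex(["2016-04-02 07:45:00", "2016-04-03 08:15:00"])

# normalize() keeps the datetime64 dtype and resets the time to midnight
normalized = idx.normalize()

# .date returns a NumPy object array of plain datetime.date values
plain_dates = idx.date

print(type(normalized).__name__, normalized[0])
print(type(plain_dates[0]).__name__, plain_dates[0])
```

If df1 has a regular DatetimeIndex, as in the sample data, the `normalize()` variant is the one whose keys will match.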

Then use merge with the default inner join:

result = df1.merge(df2, left_index=True, right_on='date')
print(result)
                     value1  value2       date
DateTime                                      
2016-04-02 07:45:00      16  257.96 2016-04-02
2016-04-02 07:50:00      16  317.58 2016-04-02
2016-04-02 07:55:00      16  333.39 2016-04-02
2016-04-04 08:10:00      76  454.73 2016-04-04
2016-04-04 08:15:00      76  472.45 2016-04-04
2016-04-04 08:20:00      76  489.85 2016-04-04
2016-04-06 07:10:00      23  108.05 2016-04-06
2016-04-06 07:15:00      23  179.21 2016-04-06
2016-04-06 07:20:00      23  201.80 2016-04-06
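
The merged frame above still carries the helper `date` column and lists `value1` first. One possible variant that reproduces the layout requested in the question (columns `value2`, `value1`, no helper column) is sketched below. It swaps in `DataFrame.join` with `on=`, which is an alternative to the `merge` call above rather than part of it, and it uses shortened, hypothetical data.

```python
import pandas as pd

# Shortened stand-ins for df1 and df2
df1 = pd.DataFrame(
    {"value1": [16, 76]},
    index=pd.DatetimeIndex(["2016-04-02", "2016-04-04"], name="date"),
)
df2 = pd.DataFrame(
    {"value2": [257.96, 317.58, 449.96, 454.73]},
    index=pd.DatetimeIndex(
        [
            "2016-04-02 07:45:00", "2016-04-02 07:50:00",
            "2016-04-03 08:15:00", "2016-04-04 08:10:00",
        ],
        name="DateTime",
    ),
)

result = (
    df2.assign(date=df2.index.normalize())  # temporary key: timestamp at midnight
    .join(df1, on="date", how="inner")      # keep only dates present in df1
    .drop(columns="date")                   # remove the temporary key again
)
print(result)
```

Using `assign` instead of `df2['date'] = ...` leaves the original df2 unmodified.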

Alternatively, use merge_asof. It matches each row of df2 to the most recent earlier (or equal) key in df1, so it gives the same result as above only if every date in df2 (with the time removed) also appears in df1. Otherwise, rows from dates missing in df1 pick up the value of the previous available date:

result = pd.merge_asof(df2, df1, left_index=True, right_index=True)
print(result)
                     value2  value1
DateTime                           
2016-04-02 07:45:00  257.96      16
2016-04-02 07:50:00  317.58      16
2016-04-02 07:55:00  333.39      16
2016-04-03 08:15:00  449.96      16
2016-04-03 08:20:00  466.42      16
2016-04-03 08:25:00  498.56      16
2016-04-04 08:10:00  454.73      76
2016-04-04 08:15:00  472.45      76
2016-04-04 08:20:00  489.85      76
2016-04-05 07:30:00  169.54      76
2016-04-05 07:35:00  276.13      76
2016-04-05 07:40:00  293.70      76
2016-04-06 07:10:00  108.05      23
2016-04-06 07:15:00  179.21      23
2016-04-06 07:20:00  201.80      23
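
If `merge_asof` is still preferred, one way to stop values from carrying over into dates that are missing from df1 is its `tolerance` parameter. The sketch below is an assumption-laden variant, not part of the answer above: it assumes timestamps have whole-second resolution, so a tolerance of 23:59:59 covers exactly one calendar day, and it uses shortened, hypothetical data.

```python
import pandas as pd

# Shortened stand-ins for df1 and df2 (both must be sorted by index)
df1 = pd.DataFrame(
    {"value1": [16, 76]},
    index=pd.DatetimeIndex(["2016-04-02", "2016-04-04"], name="date"),
)
df2 = pd.DataFrame(
    {"value2": [257.96, 449.96, 454.73]},
    index=pd.DatetimeIndex(
        ["2016-04-02 07:45:00", "2016-04-03 08:15:00", "2016-04-04 08:10:00"],
        name="DateTime",
    ),
)

matched = pd.merge_asof(
    df2,
    df1,
    left_index=True,
    right_index=True,
    # Only accept a df1 key that lies less than one day before the df2 timestamp
    tolerance=pd.Timedelta(hours=23, minutes=59, seconds=59),
)

# Rows from dates absent in df1 have NaN in value1; drop them and restore the int dtype
result = matched.dropna(subset=["value1"]).astype({"value1": "int64"})
print(result)
```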
