Date_Range parameters
I want to apply date_range to the datetime index of a DataFrame, adding hours, days, or months to that index according to a duration value.
Example: original DataFrame
Date_Out Hour_Duration
2020-04-10 06:19:45 3
2020-04-19 20:05:50 6
2020-04-30 22:50:00 4
Example: desired DataFrame result
Date_Out Hour_Duration
2020-04-10 06:19:45 3
2020-04-10 07:19:45 3
2020-04-10 08:19:45 3
2020-04-19 20:05:50 6
2020-04-19 21:05:50 6
2020-04-19 22:05:50 6
2020-04-19 23:05:50 6
2020-04-20 00:05:50 6
2020-04-20 01:05:50 6
2020-04-30 22:50:00 4
2020-04-30 23:50:00 4
2020-05-01 00:50:00 4
2020-05-01 01:50:00 4
What solution would you recommend? Is it possible to apply a function in the "periods" parameter of date_range?
Update:
Original DataFrame (DataFrame name: travels)
   Date        Actual Departure Date  Arrival Date         DurationHour     DHour
0  2020-04-28  2020-04-28 12:26:39    2020-04-28 16:24:00  0 days 03:57:21  3
1  2020-04-20  2020-04-20 07:53:22    2020-04-21 05:30:00  0 days 21:36:38  6
2  2020-05-28  2020-05-28 15:54:22    2020-05-29 08:17:00  0 days 16:22:38  2
3  2020-05-29  2020-05-29 22:57:05    2020-05-30 01:21:00  0 days 02:23:55  5
4  2020-05-25  2020-05-25 07:22:41    2020-05-30 13:47:00  5 days 06:24:19  1
travels.dtypes
Date datetime64[ns]
Actual Departure Date datetime64[ns]
Arrival Date datetime64[ns]
DurationHour timedelta64[ns]
DHour int64
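Expected result
The result shows up in the Actual Departure Date column: each row is repeated according to the value in the DHour column, and the Actual Departure Date increases by one hour with each repetition.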
   Date        Actual Departure Date  Arrival Date         DurationHour     DHour
0  2020-04-28  2020-04-28 12:26:39    2020-04-28 16:24:00  0 days 03:57:21  3
0  2020-04-28  2020-04-28 13:26:39    2020-04-28 16:24:00  0 days 03:57:21  3
0  2020-04-28  2020-04-28 14:26:39    2020-04-28 16:24:00  0 days 03:57:21  3
0  2020-04-28  2020-04-28 15:26:39    2020-04-28 16:24:00  0 days 03:57:21  3
1  2020-04-20  2020-04-20 07:53:22    2020-04-21 05:30:00  0 days 21:36:38  6
1  2020-04-20  2020-04-20 08:53:22    2020-04-21 05:30:00  0 days 21:36:38  6
1  2020-04-20  2020-04-20 09:53:22    2020-04-21 05:30:00  0 days 21:36:38  6
1  2020-04-20  2020-04-20 10:53:22    2020-04-21 05:30:00  0 days 21:36:38  6
1  2020-04-20  2020-04-20 11:53:22    2020-04-21 05:30:00  0 days 21:36:38  6
1  2020-04-20  2020-04-20 12:53:22    2020-04-21 05:30:00  0 days 21:36:38  6
1  2020-04-20  2020-04-20 13:53:22    2020-04-21 05:30:00  0 days 21:36:38  6
2  2020-05-28  2020-05-28 15:54:22    2020-05-29 08:17:00  0 days 16:22:38  2
2  2020-05-28  2020-05-28 16:54:22    2020-05-29 08:17:00  0 days 16:22:38  2
2  2020-05-28  2020-05-28 16:54:22    2020-05-29 08:17:00  0 days 16:22:38  2
3  2020-05-29  2020-05-29 23:57:05    2020-05-30 01:21:00  0 days 02:23:55  5
3  2020-05-29  2020-05-30 00:57:05    2020-05-30 01:21:00  0 days 02:23:55  5
3  2020-05-29  2020-05-30 01:57:05    2020-05-30 01:21:00  0 days 02:23:55  5
3  2020-05-29  2020-05-30 02:57:05    2020-05-30 01:21:00  0 days 02:23:55  5
3  2020-05-29  2020-05-30 03:57:05    2020-05-30 01:21:00  0 days 02:23:55  5
3  2020-05-29  2020-05-30 04:57:05    2020-05-30 01:21:00  0 days 02:23:55  5
4  2020-05-25  2020-05-25 07:22:41    2020-05-30 13:47:00  5 days 06:24:19  1
4  2020-05-25  2020-05-25 08:22:41    2020-05-30 13:47:00  5 days 06:24:19  1
I am trying the following: travels.loc[np.repeat(travels.index.values, abs(travels['DHour']))]
It repeats the rows correctly, but it does not add the hours to the date and time in the Actual Departure Date column.
You can do this with a list comprehension and pd.concat:
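# Date_Out must already be a datetime column.
df = df.set_index('Date_Out')
# Build one hourly range per row, reindex onto it, and stack the pieces.
# Note: pandas 2.2+ deprecates freq="H" in favor of freq="h".
pd.concat(
    [
        df.reindex(
            pd.date_range(idx, periods=row["Hour_Duration"], freq="H"),
            fill_value=row["Hour_Duration"],
        )
        for idx, row in df.iterrows()
    ]
)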
Output:
Hour_Duration
2020-04-10 06:19:45 3
2020-04-10 07:19:45 3
2020-04-10 08:19:45 3
2020-04-19 20:05:50 6
2020-04-19 21:05:50 6
2020-04-19 22:05:50 6
2020-04-19 23:05:50 6
2020-04-20 00:05:50 6
2020-04-20 01:05:50 6
2020-04-30 22:50:00 4
2020-04-30 23:50:00 4
2020-05-01 00:50:00 4
2020-05-01 01:50:00 4
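On the question of passing a function to `periods`: that parameter takes a single integer per call, so one option is to call `pd.date_range` once per row and then flatten the result. The following is only a sketch of that idea using `DataFrame.explode`; it is not part of the answer above. The names `hourly_ranges` and `expanded` are made up for illustration, and `freq=pd.Timedelta(hours=1)` is used here instead of a string alias to avoid the "H"/"h" spelling difference between pandas versions.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "Date_Out": pd.to_datetime(
            ["2020-04-10 06:19:45", "2020-04-19 20:05:50", "2020-04-30 22:50:00"]
        ),
        "Hour_Duration": [3, 6, 4],
    }
)

# `periods` takes one integer per call, so build one hourly range per row.
hourly_ranges = [
    list(pd.date_range(start, periods=n, freq=pd.Timedelta(hours=1)))
    for start, n in zip(df["Date_Out"], df["Hour_Duration"])
]

# Store the ranges as list-valued cells, then give every timestamp its own row.
expanded = df.assign(
    Date_Out=pd.Series(hourly_ranges, index=df.index)
).explode("Date_Out", ignore_index=True)

# explode() leaves an object column; convert it back to datetime64.
expanded["Date_Out"] = pd.to_datetime(expanded["Date_Out"])

print(expanded)
```

Unlike the `reindex` approach, this keeps `Date_Out` as an ordinary column and leaves `Hour_Duration` as an integer column.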
import pandas as pd
import numpy as np
from io import StringIO

# Columns are separated by two or more spaces; the regex separator below relies on that.
input_text = StringIO("""   Date        Actual Departure Date  Arrival Date         DurationHour     DHour
0  2020-04-28  2020-04-28 12:26:39    2020-04-28 16:24:00  0 days 03:57:21  3
1  2020-04-20  2020-04-20 07:53:22    2020-04-21 05:30:00  0 days 21:36:38  6
2  2020-05-28  2020-05-28 15:54:22    2020-05-29 08:17:00  0 days 16:22:38  2
3  2020-05-29  2020-05-29 22:57:05    2020-05-30 01:21:00  0 days 02:23:55  5
4  2020-05-25  2020-05-25 07:22:41    2020-05-30 13:47:00  5 days 06:24:19  1""")

# Raw string avoids an invalid-escape warning for the regex separator.
df = pd.read_csv(input_text, sep=r'\s\s+', engine='python')
df['Date'] = pd.to_datetime(df['Date'])
df = df.set_index('Date')

# Reindex each row onto DHour hourly steps starting at its Date,
# then forward-fill the row's values into the newly created timestamps.
# Note: pandas 2.2+ deprecates freq="H" in favor of freq="h".
df_out = pd.concat(
    [
        df.reindex(
            pd.date_range(idx, periods=row["DHour"], freq="H"),
        )
        for idx, row in df.iterrows()
    ]
).ffill()
Output:
                     Actual Departure Date  Arrival Date         DurationHour     DHour
2020-04-28 00:00:00  2020-04-28 12:26:39    2020-04-28 16:24:00  0 days 03:57:21  3.0
2020-04-28 01:00:00  2020-04-28 12:26:39    2020-04-28 16:24:00  0 days 03:57:21  3.0
2020-04-28 02:00:00  2020-04-28 12:26:39    2020-04-28 16:24:00  0 days 03:57:21  3.0
2020-04-20 00:00:00  2020-04-20 07:53:22    2020-04-21 05:30:00  0 days 21:36:38  6.0
2020-04-20 01:00:00  2020-04-20 07:53:22    2020-04-21 05:30:00  0 days 21:36:38  6.0
2020-04-20 02:00:00  2020-04-20 07:53:22    2020-04-21 05:30:00  0 days 21:36:38  6.0
2020-04-20 03:00:00  2020-04-20 07:53:22    2020-04-21 05:30:00  0 days 21:36:38  6.0
2020-04-20 04:00:00  2020-04-20 07:53:22    2020-04-21 05:30:00  0 days 21:36:38  6.0
2020-04-20 05:00:00  2020-04-20 07:53:22    2020-04-21 05:30:00  0 days 21:36:38  6.0
2020-05-28 00:00:00  2020-05-28 15:54:22    2020-05-29 08:17:00  0 days 16:22:38  2.0
2020-05-28 01:00:00  2020-05-28 15:54:22    2020-05-29 08:17:00  0 days 16:22:38  2.0
2020-05-29 00:00:00  2020-05-29 22:57:05    2020-05-30 01:21:00  0 days 02:23:55  5.0
2020-05-29 01:00:00  2020-05-29 22:57:05    2020-05-30 01:21:00  0 days 02:23:55  5.0
2020-05-29 02:00:00  2020-05-29 22:57:05    2020-05-30 01:21:00  0 days 02:23:55  5.0
2020-05-29 03:00:00  2020-05-29 22:57:05    2020-05-30 01:21:00  0 days 02:23:55  5.0
2020-05-29 04:00:00  2020-05-29 22:57:05    2020-05-30 01:21:00  0 days 02:23:55  5.0
2020-05-25 00:00:00  2020-05-25 07:22:41    2020-05-30 13:47:00  5 days 06:24:19  1.0
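The output above steps the Date index rather than the Actual Departure Date column. If the goal is to increment Actual Departure Date itself, one possible sketch builds on the `np.repeat` attempt from the question: repeat each row, number the copies with `groupby(...).cumcount()`, and add that many hours. This is an alternative under stated assumptions, not the method used above: it assumes a unique index and non-negative integer DHour values, and it produces exactly DHour rows per trip, starting at the original departure time. Only two columns are reproduced here for brevity, and `out` and `step` are illustrative names.

```python
import pandas as pd

# The two relevant columns from the question's `travels` frame.
travels = pd.DataFrame(
    {
        "Actual Departure Date": pd.to_datetime(
            [
                "2020-04-28 12:26:39",
                "2020-04-20 07:53:22",
                "2020-05-28 15:54:22",
                "2020-05-29 22:57:05",
                "2020-05-25 07:22:41",
            ]
        ),
        "DHour": [3, 6, 2, 5, 1],
    }
)

# Repeat each row DHour times (assumes non-negative integers).
out = travels.loc[travels.index.repeat(travels["DHour"])].copy()

# Number the copies of each original row 0, 1, 2, ... (assumes a unique index).
step = out.groupby(level=0).cumcount().to_numpy()

# Shift every copy by its position, in hours.
out["Actual Departure Date"] += pd.to_timedelta(step, unit="h")

print(out)
```

Because the offsets are computed for all rows at once, this avoids the per-row `reindex` and `concat`, and DHour stays an integer column.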