Merge dataframe object and timedelta64
I have a dataframe of dtype datetime64:
df:
time timestamp
18053.401736 2019-06-06 09:38:30+00:00
18053.418252 2019-06-06 10:02:17+00:00
18053.424514 2019-06-06 10:11:18+00:00
18053.454132 2019-06-06 10:53:57+00:00
Name: timestamp, dtype: datetime64[ns, UTC]
and a Series of dtype timedelta64:
ss:
ref_time
0 days 09:00:00
1 0 days 09:00:01
2 0 days 09:00:02
3 0 days 09:00:03
4 0 days 09:00:04
...
21596 0 days 14:59:56
21597 0 days 14:59:57
21598 0 days 14:59:58
21599 0 days 14:59:59
21600 0 days 15:00:00
Name: timeonly, Length: 21601, dtype: timedelta64[ns]
I want to merge the two so that the output df has values only where the timestamp coincides with one of the values in the Series:
Desired output:
time timestamp ref_time
Nan Nan 09:00:00
... ... ...
Nan Nan 09:38:29
18053.401736 2019-06-06 09:38:30+00:00 09:38:30
Nan Nan 09:38:31
... ... ...
18053.418252 2019-06-06 10:02:17+00:00 10:02:17
Nan Nan 10:02:18
Nan Nan 10:02:19
... ... ...
18053.424514 2019-06-06 10:11:18+00:00 10:11:18
... ... ...
18053.454132 2019-06-06 10:53:57+00:00 10:53:57
However, if I convert 'timestamp' to time-only values I get an object dtype, and I can't merge it with ss.
df['timestamp'].dtype  # --> datetime64[ns, UTC]
df['timeonly'] = df['timestamp'].dt.time
df['timeonly'].dtype  # --> object
df.merge(ss, how='outer', on=['timeonly'])
# ValueError: You are trying to merge on object and timedelta64[ns] columns. If you wish to proceed you should use pd.concat
But using concat as suggested doesn't give me the desired output. How can I merge/join the DataFrame and the Series?
Pandas version: 1.1.5
Convert the timestamp to a timedelta by subtracting the date part, and then merge:
import pandas as pd
df1 = pd.DataFrame([pd.Timestamp('2019-06-06 09:38:30+00:00'),pd.Timestamp('2019-06-06 10:02:17+00:00')], columns=['timestamp'])
df2 = pd.DataFrame([pd.Timedelta('09:38:30')], columns=['ref_time'])
timestamp
0 2019-06-06 09:38:30+00:00
1 2019-06-06 10:02:17+00:00
timestamp datetime64[ns, UTC]
dtype: object
ref_time
0 09:38:30
ref_time timedelta64[ns]
dtype: object
df1['merge_key'] = df1['timestamp'].dt.tz_localize(None) - pd.to_datetime(df1['timestamp'].dt.date)
df_merged = df1.merge(df2, left_on = 'merge_key', right_on = 'ref_time')
Gives:
timestamp merge_key ref_time
0 2019-06-06 09:38:30+00:00 09:38:30 09:38:30
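To keep every ref_time row, as in the desired output, the same timedelta key can be used with a right merge. The following is only a minimal sketch under stated assumptions: the three-row df2 and the `time` column are made up for illustration, and `dt.normalize()` is swapped in as another way to obtain midnight of each timestamp.

```python
import pandas as pd

df1 = pd.DataFrame({
    "time": [18053.401736, 18053.418252],
    "timestamp": pd.to_datetime(
        ["2019-06-06 09:38:30", "2019-06-06 10:02:17"], utc=True
    ),
})
# Illustrative reference times around the first timestamp
df2 = pd.DataFrame(
    {"ref_time": pd.to_timedelta(["09:38:29", "09:38:30", "09:38:31"])}
)

# Time elapsed since midnight of the same day, as a timedelta
df1["merge_key"] = df1["timestamp"] - df1["timestamp"].dt.normalize()

# how="right" keeps every row of df2; unmatched rows get NaN/NaT
out = df1.merge(df2, left_on="merge_key", right_on="ref_time", how="right")
print(out[["time", "timestamp", "ref_time"]])
```

Only the 09:38:30 row carries values from df1 here; the 10:02:17 timestamp has no partner in this small df2 and is dropped by the right merge.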
The main challenge here is to get everything into compatible date types. Using your slightly modified examples as inputs:
import pandas as pd
from io import StringIO
df = pd.read_csv(StringIO(
"""
time,timestamp
18053.401736,2019-06-06 09:38:30+00:00
18053.418252,2019-06-06 10:02:17+00:00
18053.424514,2019-06-06 10:11:18+00:00
18053.454132,2019-06-06 10:53:57+00:00
"""))
df['timestamp'] = pd.to_datetime(df['timestamp'])
from datetime import timedelta
sdf = pd.read_csv(StringIO(
"""
ref_time
0 days 09:00:00
0 days 09:00:01
0 days 09:00:02
0 days 09:00:03
0 days 09:00:04
0 days 09:38:30
0 days 10:02:17
0 days 14:59:56
0 days 14:59:57
0 days 14:59:58
0 days 14:59:59
0 days 15:00:00
"""))
sdf['ref_time'] = pd.to_timedelta(sdf['ref_time'])
The dtypes here are the same as in your question, which is important.
First we figure out the base_date, as we need to convert the timedeltas into datetimes. Note that we set it to midnight of the relevant date via floor('D'); round('1d') would instead snap to the nearest midnight, which is the following day for any timestamp after noon.
base_date = df['timestamp'].iloc[0].floor('D').to_pydatetime()
base_date
Output:
datetime.datetime(2019, 6, 6, 0, 0, tzinfo=<UTC>)
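The choice between `floor` and `round` matters for the base date, because `round` goes to the nearest midnight rather than the preceding one. A small check with two illustrative timestamps (the variable names are made up for this sketch):

```python
import pandas as pd

morning = pd.Timestamp("2019-06-06 09:38:30+00:00")
afternoon = pd.Timestamp("2019-06-06 14:59:56+00:00")

# Before noon, both give midnight of the same day
print(morning.round("D"))    # 2019-06-06 00:00:00+00:00
print(morning.floor("D"))    # 2019-06-06 00:00:00+00:00

# After noon, round() jumps to the next day while floor() does not
print(afternoon.round("D"))  # 2019-06-07 00:00:00+00:00
print(afternoon.floor("D"))  # 2019-06-06 00:00:00+00:00
```

If the first row of the data happened to fall after 12:00, a rounded base date would shift every ref_dt by one day and the merge would find no matches.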
Next we add the timedeltas from sdf to the base_date:
sdf['ref_dt'] = sdf['ref_time'] + base_date
Now sdf['ref_dt'] and df['timestamp'] are in the same 'units' and of the same type, so we can merge:
sdf.merge(df, left_on = 'ref_dt', right_on = 'timestamp', how = 'left')
Output:
ref_time ref_dt time timestamp
-- --------------- ------------------------- ------- -------------------------
0 0 days 09:00:00 2019-06-06 09:00:00+00:00 nan NaT
1 0 days 09:00:01 2019-06-06 09:00:01+00:00 nan NaT
2 0 days 09:00:02 2019-06-06 09:00:02+00:00 nan NaT
3 0 days 09:00:03 2019-06-06 09:00:03+00:00 nan NaT
4 0 days 09:00:04 2019-06-06 09:00:04+00:00 nan NaT
5 0 days 09:38:30 2019-06-06 09:38:30+00:00 18053.4 2019-06-06 09:38:30+00:00
6 0 days 10:02:17 2019-06-06 10:02:17+00:00 18053.4 2019-06-06 10:02:17+00:00
7 0 days 14:59:56 2019-06-06 14:59:56+00:00 nan NaT
8 0 days 14:59:57 2019-06-06 14:59:57+00:00 nan NaT
9 0 days 14:59:58 2019-06-06 14:59:58+00:00 nan NaT
10 0 days 14:59:59 2019-06-06 14:59:59+00:00 nan NaT
11 0 days 15:00:00 2019-06-06 15:00:00+00:00 nan NaT
and we see the merge happening where needed.
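The same steps can be applied to the full one-second grid from the question. This is a sketch under assumptions: the grid is rebuilt with `pd.timedelta_range` rather than read from the original Series, all timestamps are assumed to fall on the same UTC date, and the helper column is dropped at the end to match the column order of the desired output.

```python
import pandas as pd

df = pd.DataFrame({
    "time": [18053.401736, 18053.418252, 18053.424514, 18053.454132],
    "timestamp": pd.to_datetime(
        ["2019-06-06 09:38:30", "2019-06-06 10:02:17",
         "2019-06-06 10:11:18", "2019-06-06 10:53:57"],
        utc=True,
    ),
})

# One reference time per second from 09:00:00 to 15:00:00 inclusive
sdf = pd.DataFrame(
    {"ref_time": pd.timedelta_range(start="09:00:00", end="15:00:00", freq="s")}
)

# Midnight of the (assumed single) date, then absolute reference datetimes
base_date = df["timestamp"].iloc[0].floor("D")
sdf["ref_dt"] = sdf["ref_time"] + base_date

# Left merge keeps every reference time; the helper column is dropped afterwards
result = (
    sdf.merge(df, left_on="ref_dt", right_on="timestamp", how="left")
       .drop(columns="ref_dt")[["time", "timestamp", "ref_time"]]
)
print(result[result["time"].notna()])
```

The result has one row per reference second, and only the four rows whose ref_time matches a timestamp carry non-null `time` and `timestamp` values.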