
Merge dataframe object and timedelta64

I have a DataFrame with a column of dtype datetime64:

df:
time           timestamp
18053.401736   2019-06-06 09:38:30+00:00
18053.418252   2019-06-06 10:02:17+00:00
18053.424514   2019-06-06 10:11:18+00:00
18053.454132   2019-06-06 10:53:57+00:00
Name: timestamp, dtype: datetime64[ns, UTC]

and a Series of dtype timedelta64:

ss:
         ref_time
0       0 days 09:00:00
1       0 days 09:00:01
2       0 days 09:00:02
3       0 days 09:00:03
4       0 days 09:00:04
              ...      
21596   0 days 14:59:56
21597   0 days 14:59:57
21598   0 days 14:59:58
21599   0 days 14:59:59
21600   0 days 15:00:00
Name: timeonly, Length: 21601, dtype: timedelta64[ns]

I want to merge the two so that the output df has values only in the rows where a timestamp coincides with an entry of the Series:

Desired output:
time           timestamp                     ref_time
NaN            NaT                           09:00:00
...            ...                           ...
NaN            NaT                           09:38:29
18053.401736   2019-06-06 09:38:30+00:00     09:38:30
NaN            NaT                           09:38:31
...            ...                           ...
18053.418252   2019-06-06 10:02:17+00:00     10:02:17
NaN            NaT                           10:02:18
NaN            NaT                           10:02:19
...            ...                           ...
18053.424514   2019-06-06 10:11:18+00:00     10:11:18
...            ...                           ...
18053.454132   2019-06-06 10:53:57+00:00     10:53:57

However, if I convert 'timestamp' to time only, I get an object dtype, which I cannot merge with ss.

df['timestamp'].dtype            # --> datetime64[ns, UTC]
df['timeonly'] = df['timestamp'].dt.time
df['timeonly'].dtype             # --> object (datetime.time values)
df.merge(ss, how='outer', on=['timeonly'])
# ValueError: You are trying to merge on object and timedelta64[ns] columns. If you wish to proceed you should use pd.concat

But using concat as suggested does not give me the desired output. How can I merge/join the DataFrame and the Series? I am on pandas 1.1.5.

Convert the timestamp to a timedelta by subtracting the date part, then merge:

import pandas as pd

df1 = pd.DataFrame([pd.Timestamp('2019-06-06 09:38:30+00:00'), pd.Timestamp('2019-06-06 10:02:17+00:00')], columns=['timestamp'])
df2 = pd.DataFrame([pd.Timedelta('09:38:30')], columns=['ref_time'])
    timestamp                  
0   2019-06-06 09:38:30+00:00
1   2019-06-06 10:02:17+00:00

timestamp    datetime64[ns, UTC]
dtype: object

    ref_time
0   09:38:30

ref_time    timedelta64[ns]
dtype: object
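
# Time of day as a timedelta: drop the timezone, then subtract midnight of the same date
df1['merge_key'] = df1['timestamp'].dt.tz_localize(None) - pd.to_datetime(df1['timestamp'].dt.date)
# Inner merge on the timedelta key
df_merged = df1.merge(df2, left_on='merge_key', right_on='ref_time')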

which gives:

    timestamp                   merge_key   ref_time
0   2019-06-06 09:38:30+00:00   09:38:30    09:38:30
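
The inner merge above keeps only the matching rows. To get closer to the desired output in the question, where every ref_time row is kept, a right join on the same kind of key may be enough. The following is a minimal sketch, not a definitive solution. It assumes the reference series is a one-second grid built with pd.timedelta_range (the question does not show how ss was created), and it uses dt.normalize() as an alternative way to strip the date part:

```python
import pandas as pd

# Timestamps from the question (tz-aware, UTC).
df1 = pd.DataFrame({
    "time": [18053.401736, 18053.418252],
    "timestamp": pd.to_datetime(
        ["2019-06-06 09:38:30+00:00", "2019-06-06 10:02:17+00:00"]
    ),
})

# Assumed reference grid: one row per second from 09:00:00 to 15:00:00.
df2 = pd.DataFrame(
    {"ref_time": pd.timedelta_range("09:00:00", "15:00:00", freq="s")}
)

# Time of day as a timedelta: timestamp minus midnight of the same day.
df1["merge_key"] = df1["timestamp"] - df1["timestamp"].dt.normalize()

# Right join keeps every ref_time; rows without a timestamp get NaN/NaT.
merged = df1.merge(df2, left_on="merge_key", right_on="ref_time", how="right")
merged = merged.drop(columns="merge_key")
```

Because both key columns are timedelta64, the merge no longer raises the object vs. timedelta64 error from the question.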

The main challenge here is converting everything to compatible date types. Using a slightly modified version of your example as input:

import pandas as pd
from io import StringIO
df = pd.read_csv(StringIO(
"""
time,timestamp
18053.401736,2019-06-06 09:38:30+00:00
18053.418252,2019-06-06 10:02:17+00:00
18053.424514,2019-06-06 10:11:18+00:00
18053.454132,2019-06-06 10:53:57+00:00
"""))
df['timestamp'] = pd.to_datetime(df['timestamp'])

from datetime import timedelta
sdf = pd.read_csv(StringIO(
"""
ref_time
0 days 09:00:00
0 days 09:00:01
0 days 09:00:02
0 days 09:00:03
0 days 09:00:04
0 days 09:38:30
0 days 10:02:17
0 days 14:59:56
0 days 14:59:57
0 days 14:59:58
0 days 14:59:59
0 days 15:00:00
"""))
sdf['ref_time'] = pd.to_timedelta(sdf['ref_time'])

The dtypes matter here, just as in your question.

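First we work out base_date, because the timedeltas have to be converted to datetimes. It should be midnight of the relevant date, so we use floor('D') here; round('1d') gives the same result for this example but would jump to the next day for a timestamp after noon.

base_date = df['timestamp'].iloc[0].floor('D').to_pydatetime()
base_date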

output

datetime.datetime(2019, 6, 6, 0, 0, tzinfo=<UTC>)
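
Whether base_date lands on the intended day depends on the rounding mode. Here is a small sketch of how round, floor and normalize behave on a morning and an afternoon timestamp. The afternoon value is an illustrative assumption, not taken from the question's df:

```python
import pandas as pd

morning = pd.Timestamp("2019-06-06 09:38:30+00:00")
afternoon = pd.Timestamp("2019-06-06 14:59:56+00:00")  # illustrative value

# round() goes to the nearest midnight, so afternoon times roll forward a day.
rounded_morning = morning.round("D")
rounded_afternoon = afternoon.round("D")

# floor() and normalize() always return midnight of the same day.
floored_afternoon = afternoon.floor("D")
normalized_afternoon = afternoon.normalize()

print(rounded_morning)      # midnight of 2019-06-06
print(rounded_afternoon)    # midnight of 2019-06-07
print(floored_afternoon)    # midnight of 2019-06-06
```

If the first timestamp in the data can fall after noon, floor or normalize is the safer choice for computing the base date.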

Next we add the timedeltas from sdf to base_date:

sdf['ref_dt'] = sdf['ref_time'] + base_date

Now sdf['ref_dt'] and df['timestamp'] are in the same "units" and have the same type, so we can merge:

sdf.merge(df, left_on = 'ref_dt', right_on = 'timestamp', how = 'left')

output

    ref_time         ref_dt                        time  timestamp
--  ---------------  -------------------------  -------  -------------------------
 0  0 days 09:00:00  2019-06-06 09:00:00+00:00    nan    NaT
 1  0 days 09:00:01  2019-06-06 09:00:01+00:00    nan    NaT
 2  0 days 09:00:02  2019-06-06 09:00:02+00:00    nan    NaT
 3  0 days 09:00:03  2019-06-06 09:00:03+00:00    nan    NaT
 4  0 days 09:00:04  2019-06-06 09:00:04+00:00    nan    NaT
 5  0 days 09:38:30  2019-06-06 09:38:30+00:00  18053.4  2019-06-06 09:38:30+00:00
 6  0 days 10:02:17  2019-06-06 10:02:17+00:00  18053.4  2019-06-06 10:02:17+00:00
 7  0 days 14:59:56  2019-06-06 14:59:56+00:00    nan    NaT
 8  0 days 14:59:57  2019-06-06 14:59:57+00:00    nan    NaT
 9  0 days 14:59:58  2019-06-06 14:59:58+00:00    nan    NaT
10  0 days 14:59:59  2019-06-06 14:59:59+00:00    nan    NaT
11  0 days 15:00:00  2019-06-06 15:00:00+00:00    nan    NaT

We can see that the merge happened where required.
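
One way to confirm which rows matched without reading the table by eye is the indicator option of merge. The sketch below repeats the steps above on a shortened reference list (the four ref_time values are a subset chosen for illustration) and uses floor('D') for the base date:

```python
import pandas as pd

df = pd.DataFrame({
    "time": [18053.401736, 18053.418252],
    "timestamp": pd.to_datetime(
        ["2019-06-06 09:38:30+00:00", "2019-06-06 10:02:17+00:00"]
    ),
})
# Shortened reference list, for illustration only.
sdf = pd.DataFrame({
    "ref_time": pd.to_timedelta(["09:00:00", "09:38:30", "10:02:17", "15:00:00"])
})

# Midnight (UTC) of the day of the first timestamp.
base_date = df["timestamp"].iloc[0].floor("D")
sdf["ref_dt"] = sdf["ref_time"] + base_date

# indicator=True adds a "_merge" column: "both" for matched rows,
# "left_only" for reference times that have no timestamp.
out = sdf.merge(df, left_on="ref_dt", right_on="timestamp",
                how="left", indicator=True)
matched = out.loc[out["_merge"] == "both", "ref_time"]
```

Here matched holds the reference times that found a timestamp, which makes it easy to check the count against the number of rows in df.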
