
Create weekly time series by interpolating dates between two existing columns

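How can I convert the "source DataFrame" into the "target DataFrame" using pandas?

DateFrom and DateTo in the source DataFrame define a date range. I want to split each range into weekly ranges, as in the target DataFrame below.

Source DataFrame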

DateFrom    DateTo      Catalog  Score
2017-05-01  2017-05-21  ABC      20
2017-05-22  2017-06-04  WXY      30

Target DataFrame

DateFrom    DateTo      Catalog  Score
2017-05-01  2017-05-07  ABC      20
2017-05-08  2017-05-14  ABC      20
2017-05-15  2017-05-21  ABC      20
2017-05-22  2017-05-28  WXY      30
2017-05-29  2017-06-04  WXY      30

Use melt to stack DateFrom and DateTo into a single DateTo column, then groupby("Catalog") and resample weekly on DateTo with a forward fill.
Rebuild DateFrom using Timedelta.

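import pandas as pd

# Stack DateFrom and DateTo into one "DateTo" column; "x" records which column each value came from
melted = pd.melt(df, id_vars=["Catalog", "Score"], var_name="x", value_name="DateTo")

df2 = (
    melted.set_index(pd.to_datetime(melted.DateTo))
     .drop(columns=["x", "DateTo"])  # a positional axis argument is no longer accepted by recent pandas
     .groupby("Catalog", as_index=False)
     .resample("W")                  # weekly bins labelled by their closing Sunday
     .ffill()                        # weeks with no boundary date inherit the previous row
     .reset_index(level=1)
)

# Each week starts six days before its closing date
df2["DateFrom"] = df2.DateTo - pd.Timedelta("6 days")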

Output:

df2[df.columns]
                   Catalog  Score
Catalog date                     
ABC     2017-05-07     ABC     20
        2017-05-14     ABC     20
        2017-05-21     ABC     20
WXY     2017-05-28     WXY     30
        2017-06-04     WXY     30

Data:

df
     DateFrom      DateTo Catalog  Score
0  2017-05-01  2017-05-21     ABC     20
1  2017-05-22  2017-06-04     WXY     30
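
To reproduce this, the sample frame can be built roughly as follows. The values come from the question; converting the two date columns with pd.to_datetime is an assumption on my part, since the post does not show their dtypes.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame(
    {
        "DateFrom": ["2017-05-01", "2017-05-22"],
        "DateTo": ["2017-05-21", "2017-06-04"],
        "Catalog": ["ABC", "WXY"],
        "Score": [20, 30],
    }
)

# Assumption: work with real datetimes rather than strings,
# so that date arithmetic and resampling behave as expected
df["DateFrom"] = pd.to_datetime(df["DateFrom"])
df["DateTo"] = pd.to_datetime(df["DateTo"])
```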

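Expanding on the answer to a similar question, "Expanding pandas data frame with date range in columns", you can iterate over each row and expand the DataFrame as shown below: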

import pandas as pd
from datetime import timedelta

# Assumes df.DateFrom and df.DateTo hold datetimes (not strings),
# that every DateFrom is a Monday and every DateTo is a Sunday.
newdf = pd.concat(
    [
        pd.DataFrame(
            {
                # Week starts: every Monday inside the range
                'DateFrom':
                pd.date_range(row.DateFrom, row.DateTo, freq='W-MON'),
                # Week ends: the Sunday six days after each start
                'DateTo':
                pd.date_range(
                    row.DateFrom + timedelta(days=6),
                    row.DateTo + timedelta(days=6),
                    freq='W'),
                'Catalog':
                row.Catalog,
                'Score':
                row.Score
            },
            columns=['DateFrom', 'DateTo', 'Catalog', 'Score'])
        for _, row in df.iterrows()
    ],
    ignore_index=True)

This prints the following output:

newdf

    DateFrom    DateTo      Catalog  Score
0   2017-05-01  2017-05-07  ABC      20
1   2017-05-08  2017-05-14  ABC      20
2   2017-05-15  2017-05-21  ABC      20
3   2017-05-22  2017-05-28  WXY      30
4   2017-05-29  2017-06-04  WXY      30
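
For larger frames, the same row-by-row expansion could probably be done without building one small DataFrame per row. The following is only a sketch of that idea using DataFrame.explode, not something taken from the answers above. The helper name expand_weekly is hypothetical, and it assumes that every DateFrom..DateTo range spans a whole number of weeks (a trailing partial week would get a DateTo past the original end date).

```python
import pandas as pd

# Sample data from the question, with dates parsed (an assumption)
df = pd.DataFrame(
    {
        "DateFrom": pd.to_datetime(["2017-05-01", "2017-05-22"]),
        "DateTo": pd.to_datetime(["2017-05-21", "2017-06-04"]),
        "Catalog": ["ABC", "WXY"],
        "Score": [20, 30],
    }
)


def expand_weekly(frame):
    """Hypothetical helper: split each DateFrom..DateTo range into 7-day rows."""
    out = frame.copy()
    # One list of week start dates per row, stepping 7 days from DateFrom
    out["DateFrom"] = [
        list(pd.date_range(start, end, freq="7D"))
        for start, end in zip(out["DateFrom"], out["DateTo"])
    ]
    # One output row per week start; the other columns are repeated
    out = out.explode("DateFrom", ignore_index=True)
    # explode leaves an object column, so convert back to datetimes
    out["DateFrom"] = pd.to_datetime(out["DateFrom"])
    # Each week ends six days after it starts
    out["DateTo"] = out["DateFrom"] + pd.Timedelta(days=6)
    return out


weekly = expand_weekly(df)
print(weekly)
```

Stepping by "7D" from DateFrom, rather than anchoring on 'W-MON', means the weeks start on whatever weekday DateFrom falls on, which may or may not be what you want for ranges that do not start on a Monday.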

