pandas: add week dates with dataframe
I have a df with rows like this:
p_id m_id x_id g_id u_id
0 2 NaN 1408 7 121
1 3 1259 117 23 315
2 3 1259 221 9 718
3 3 1259 397 76 367
and two datetime objects:
start date:
datetime.datetime(2021, 5, 25, 0, 0)
end date:
datetime.datetime(2021, 5, 29, 0, 0)
Basically, how do I get the following df, where each row is repeated once for every date from start_date to end_date:
p_id m_id x_id g_id u_id s_date
0 2 NaN 1408 7 121 2021-05-25
1 2 NaN 1408 7 121 2021-05-26
2 2 NaN 1408 7 121 2021-05-27
3 2 NaN 1408 7 121 2021-05-28
4 2 NaN 1408 7 121 2021-05-29
5 3 1259 117 23 315 2021-05-25
6 3 1259 117 23 315 2021-05-26
7 3 1259 117 23 315 2021-05-27
8 3 1259 117 23 315 2021-05-28
9 3 1259 117 23 315 2021-05-29
.
.
15 3 1259 397 76 367 2021-05-25
16 3 1259 397 76 367 2021-05-26
17 3 1259 397 76 367 2021-05-27
18 3 1259 397 76 367 2021-05-28
19 3 1259 397 76 367 2021-05-29
Use date_range and a cross merge
In pandas >= 1.2, a cross merge can be performed by passing the optional parameter how='cross' to the merge function:
dates = pd.date_range(start_date, end_date)
df.merge(dates.to_series(name='s_date'), how='cross')
In pandas < 1.2, we have to create a temporary merge key in order to perform the cross merge:
dates = pd.date_range(start_date, end_date)
df.assign(k=1).merge(dates.to_frame(name='s_date').assign(k=1), on='k').drop(columns='k')
p_id m_id x_id g_id u_id s_date
0 2 NaN 1408 7 121 2021-05-25
1 2 NaN 1408 7 121 2021-05-26
2 2 NaN 1408 7 121 2021-05-27
3 2 NaN 1408 7 121 2021-05-28
4 2 NaN 1408 7 121 2021-05-29
5 3 1259.0 117 23 315 2021-05-25
6 3 1259.0 117 23 315 2021-05-26
7 3 1259.0 117 23 315 2021-05-27
8 3 1259.0 117 23 315 2021-05-28
9 3 1259.0 117 23 315 2021-05-29
10 3 1259.0 221 9 718 2021-05-25
11 3 1259.0 221 9 718 2021-05-26
12 3 1259.0 221 9 718 2021-05-27
13 3 1259.0 221 9 718 2021-05-28
14 3 1259.0 221 9 718 2021-05-29
15 3 1259.0 397 76 367 2021-05-25
16 3 1259.0 397 76 367 2021-05-26
17 3 1259.0 397 76 367 2021-05-27
18 3 1259.0 397 76 367 2021-05-28
19 3 1259.0 397 76 367 2021-05-29
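For anyone who wants to run both variants side by side, here is a minimal self-contained sketch. The frame is rebuilt by hand from the values shown in the question, and the names crossed and keyed are just illustrative. The how='cross' call assumes pandas 1.2 or newer.

```python
from datetime import datetime

import numpy as np
import pandas as pd

# Sample frame rebuilt from the rows shown in the question.
df = pd.DataFrame({
    "p_id": [2, 3, 3, 3],
    "m_id": [np.nan, 1259, 1259, 1259],
    "x_id": [1408, 117, 221, 397],
    "g_id": [7, 23, 9, 76],
    "u_id": [121, 315, 718, 367],
})
start_date = datetime(2021, 5, 25)
end_date = datetime(2021, 5, 29)

# date_range includes both endpoints by default, so this gives 5 dates.
dates = pd.date_range(start_date, end_date)

# Cross merge (pandas >= 1.2): every row of df is paired with every date.
crossed = df.merge(dates.to_series(name="s_date"), how="cross")

# Temporary-key variant, which also works on older pandas versions.
keyed = (
    df.assign(k=1)
    .merge(dates.to_frame(name="s_date").assign(k=1), on="k")
    .drop(columns="k")
)

print(crossed.head(10))
```

Both results should have len(df) * len(dates) rows, here 4 * 5 = 20.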
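The way I would do it is to first create a list of all the dates between the two dates, add it to the dataframe as a new column, and then use explode to expand it into rows.
Here is an example:
from datetime import datetime
import pandas as pd
df['s_date'] = [pd.date_range(datetime(2021, 5, 25, 0, 0), datetime(2021, 5, 29, 0, 0), freq='D')] * len(df)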
df = df.explode('s_date')
Output:
id start score date
0 id1 NaN 3 2021-05-25
0 id1 NaN 3 2021-05-26
0 id1 NaN 3 2021-05-27
0 id1 NaN 3 2021-05-28
0 id1 NaN 3 2021-05-29
1 id2 12.0 1 2021-05-25
1 id2 12.0 1 2021-05-26
1 id2 12.0 1 2021-05-27
1 id2 12.0 1 2021-05-28
1 id2 12.0 1 2021-05-29
2 id3 11.0 8 2021-05-25
2 id3 11.0 8 2021-05-26
2 id3 11.0 8 2021-05-27
2 id3 11.0 8 2021-05-28
2 id3 11.0 8 2021-05-29
...
...
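As the output above shows, explode repeats the original index labels (0, 0, 0, ...). A small runnable sketch of the same idea follows, with two optional cleanup steps added. The three-row frame and the name exploded are hypothetical stand-ins, and the cleanup steps are a suggestion, not part of the answer above.

```python
import pandas as pd

# Hypothetical three-row frame standing in for the real data.
df = pd.DataFrame({"id": ["id1", "id2", "id3"], "score": [3, 1, 8]})

dates = pd.date_range("2021-05-25", "2021-05-29", freq="D")

# Give every row the same list of dates, then expand to one row per date.
df["s_date"] = pd.Series([list(dates)] * len(df), index=df.index)
exploded = df.explode("s_date")

# explode keeps the original index labels and returns an object-dtype
# column, so rebuild the index and convert back to datetime64.
exploded = exploded.reset_index(drop=True)
exploded["s_date"] = pd.to_datetime(exploded["s_date"])

print(exploded)
```

Skipping reset_index is fine if the repeated index is useful for mapping rows back to the original frame.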
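The steps in my solution:
1. Build a DataFrame of the dates
2. Add the same constant key column to both DataFrames
3. Join them with pd.merge on that key (outer join)
import pandas as pd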
from datetime import datetime, timedelta
# example stand-in for your df
a = [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}]
a_df = pd.DataFrame(a)
start_date = datetime.strptime('2021-05-01', '%Y-%m-%d')
end_date = datetime.strptime('2021-06-01', '%Y-%m-%d')
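# range() stops before num_of_days, so end_date itself is not included;
# use range(num_of_days + 1) if the last day should be part of the result
num_of_days = (end_date - start_date).days
date_df = pd.DataFrame([start_date + timedelta(days=x) for x in range(num_of_days)], columns=['date'])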
a_df['key'] = 0
date_df['key'] = 0
a_df = a_df.merge(date_df, on='key', how='outer')
a_df = a_df.drop(columns='key')
a_df
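One detail worth checking in the code above is whether the last day is included. The sketch below compares the timedelta-based list with pd.date_range, using the dates from the question. The names manual and inclusive are illustrative only.

```python
from datetime import datetime, timedelta

import pandas as pd

start_date = datetime(2021, 5, 25)
end_date = datetime(2021, 5, 29)

# range(n) stops before n, so this list ends one day before end_date.
num_of_days = (end_date - start_date).days
manual = [start_date + timedelta(days=x) for x in range(num_of_days)]

# pd.date_range includes both endpoints by default.
inclusive = pd.date_range(start_date, end_date, freq="D")
date_df = pd.DataFrame({"date": inclusive})

print(len(manual), len(inclusive))
```

If the end date should appear in the result, as in the question's expected output, either pd.date_range or range(num_of_days + 1) would cover it.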