pandas: resample a multi-index dataframe
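I have a dataframe with a multi-index: "subject" and "datetime". Each row corresponds to one subject and one datetime, and the columns of the dataframe correspond to various measurements.
The range of days differs for each subject, and some days may be missing for a given subject (see the example). In addition, a subject can have one or several values for a given day.
I would like to resample the dataframe to daily frequency so that each subject has one row per day, holding the last non-NaN value of each column for that day; days with no values at all are dropped.
For example, the following sample dataframe: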
                                a       b
subject  datetime
patient1 2018-01-01 00:00:00  2.0    high
         2018-01-01 01:00:00  NaN  medium
         2018-01-01 02:00:00  6.0     NaN
         2018-01-01 03:00:00  NaN     NaN
         2018-01-02 00:00:00  4.3     low
patient2 2018-01-01 00:00:00  NaN  medium
         2018-01-01 02:00:00  NaN     NaN
         2018-01-01 03:00:00  5.0     NaN
         2018-01-03 00:00:00  9.0     NaN
         2018-01-04 02:00:00  NaN     NaN
should return:
                                a       b
subject  datetime
patient1 2018-01-01 00:00:00  6.0  medium
         2018-01-02 00:00:00  4.3     low
patient2 2018-01-01 00:00:00  5.0  medium
         2018-01-03 00:00:00  9.0     NaN
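I have spent far too much time trying to get this using resample with the 'pad' option, but I always get either errors or results that are not what I want. Can anyone help?
Note: here is the code that creates the example dataframe: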
import pandas as pd
import numpy as np
index = pd.MultiIndex.from_product([['patient1', 'patient2'],
                                    pd.date_range('20180101', periods=4, freq='h')])
df = pd.DataFrame({'a': [2, np.nan, 6, np.nan, np.nan, np.nan, np.nan, 5],
                   'b': ['high', 'medium', np.nan, np.nan, 'medium', 'low', np.nan, np.nan]},
                  index=index)
df.index.names = ['subject', 'datetime']
df = df.drop(df.index[5])
df.at[('patient2', '2018-01-03 00:00:00'), 'a'] = 9
df.at[('patient2', '2018-01-04 02:00:00'), 'a'] = None
df.at[('patient1', '2018-01-02 00:00:00'), 'a'] = 4.3
df.at[('patient1', '2018-01-02 00:00:00'), 'b'] = 'low'
df = df.sort_index(level=['subject', 'datetime'])
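Let's floor the datetime level to daily frequency, then groupby the dataframe on subject plus the floored timestamps, agg using last, and finally drop the rows whose values are all NaN: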
i = pd.to_datetime(df.index.get_level_values(1)).floor('d')
df1 = df.groupby(['subject', i]).agg('last').dropna(how='all')
                       a       b
subject  datetime
patient1 2018-01-01  6.0  medium
         2018-01-02  4.3     low
patient2 2018-01-01  5.0  medium
         2018-01-03  9.0     NaN
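For readers who want to run the floor-and-group idea end to end, here is a minimal self-contained sketch. It assumes the datetime level is a real DatetimeIndex, and the names `rows`, `sample`, `days` and `daily` are illustrative, not from the original post. The data mirrors the question's example.

```python
import numpy as np
import pandas as pd

# Same values as the question's example, built with real timestamps.
rows = [
    ("patient1", "2018-01-01 00:00", 2.0, "high"),
    ("patient1", "2018-01-01 01:00", np.nan, "medium"),
    ("patient1", "2018-01-01 02:00", 6.0, np.nan),
    ("patient1", "2018-01-01 03:00", np.nan, np.nan),
    ("patient1", "2018-01-02 00:00", 4.3, "low"),
    ("patient2", "2018-01-01 00:00", np.nan, "medium"),
    ("patient2", "2018-01-01 02:00", np.nan, np.nan),
    ("patient2", "2018-01-01 03:00", 5.0, np.nan),
    ("patient2", "2018-01-03 00:00", 9.0, np.nan),
    ("patient2", "2018-01-04 02:00", np.nan, np.nan),
]
sample = pd.DataFrame(rows, columns=["subject", "datetime", "a", "b"])
sample["datetime"] = pd.to_datetime(sample["datetime"])
sample = sample.set_index(["subject", "datetime"])

# Truncate every timestamp to midnight of its day.
days = sample.index.get_level_values("datetime").floor("D")

# GroupBy.last() keeps the last non-null value per column in each group;
# groups that contain no value at all are removed afterwards.
daily = sample.groupby(["subject", days]).last().dropna(how="all")
print(daily)
```

Grouping on the floored index rather than calling resample avoids creating rows for days on which a subject has no observations.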
# Drop the rows where a and b are both NaN; they carry no information
df = df.reset_index().dropna(subset=["a", "b"], how="all")
# Add a day column; it is needed to keep the last value of each day
df["dt_day"] = df["datetime"].dt.date
# d1 is the result dataframe (one row per subject and day) to which a and b are added
d1 = df.copy().drop_duplicates(subset=["subject", "dt_day"]).loc[:, ["subject", "datetime"]].reset_index(drop=True)
# Add a and b to the result dataframe
# Caveat: values are aligned by position, so this is only correct when every
# (subject, day) that lacks a value for a column comes after those that have one
for col in ["a", "b"]:
    d1.loc[:, col] = (df.copy()
                      .dropna(subset=[col])
                      .drop_duplicates(subset=["subject", "dt_day"], keep="last")[col]
                      .reset_index(drop=True))
Timing of this code: wall time 24 ms
Timing of @Shubham Sharma's code: wall time 2.94 ms
    subject    datetime    a       b
0  patient1  2018-01-01  6.0  medium
1  patient1  2018-01-02  4.3     low
2  patient2  2018-01-01  5.0  medium
3  patient2  2018-01-03  9.0     NaN
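The drop_duplicates idea can also be written so that each column is attached by key rather than by row position. The sketch below is one possible variant, not the answer author's code. The names `flat`, `keys`, `last_vals` and `result` are made up for illustration, and the tiny dataset is chosen so that one (subject, day) pair has no value for `a`.

```python
import numpy as np
import pandas as pd

# Hypothetical flat data: p2 has no value for "a" on 2018-01-01.
flat = pd.DataFrame({
    "subject": ["p1", "p1", "p2", "p2"],
    "datetime": pd.to_datetime([
        "2018-01-01 00:00", "2018-01-01 02:00",
        "2018-01-01 03:00", "2018-01-03 00:00",
    ]),
    "a": [2.0, 6.0, np.nan, 9.0],
    "b": ["high", np.nan, "medium", "low"],
})
flat["dt_day"] = flat["datetime"].dt.normalize()  # midnight of each day
keys = ["subject", "dt_day"]

# One row per (subject, day), in order of first appearance.
result = flat[keys].drop_duplicates().reset_index(drop=True)

for col in ["a", "b"]:
    # Last non-NaN value of this column for each (subject, day).
    last_vals = (flat.dropna(subset=[col])
                     .drop_duplicates(subset=keys, keep="last")[keys + [col]])
    # A left merge on the keys leaves NaN where a pair has no value.
    result = result.merge(last_vals, on=keys, how="left")

result = result.dropna(subset=["a", "b"], how="all")
print(result)
```

With positional assignment, the 9.0 measured on 2018-01-03 would slide up into the 2018-01-01 row of p2; merging on the keys keeps it on its own day.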
Thanks for your question :)
This should do the job:
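from datetime import datetime

def day_agg(series_):
    # Return the last non-NaN value of the group, or NaN if there is none
    try:
        return series_.dropna().iloc[-1]
    except IndexError:
        return float("nan")

df = df.reset_index().sort_values("datetime")
# Group by subject and by the datetime truncated to its day
df.groupby([df["subject"], df.datetime.map(lambda x: datetime(year=x.year, month=x.month, day=x.day))])\
    .agg({"a": day_agg, "b": day_agg})\
    .dropna(how="all")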
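As a rough, self-contained illustration of the same custom-aggregation idea, the sketch below truncates timestamps with the vectorized `Series.dt.normalize()` instead of a per-row lambda. The names `last_valid`, `frame`, `day` and `result` and the small dataset are assumptions made for this example, not part of the answer.

```python
import numpy as np
import pandas as pd

def last_valid(series):
    """Return the last non-NaN value of a group, or NaN if it has none."""
    valid = series.dropna()
    return valid.iloc[-1] if len(valid) else np.nan

# Hypothetical flat data (subject and datetime are ordinary columns).
frame = pd.DataFrame({
    "subject": ["p1", "p1", "p1", "p2", "p2"],
    "datetime": pd.to_datetime([
        "2018-01-01 00:00", "2018-01-01 02:00", "2018-01-02 00:00",
        "2018-01-01 03:00", "2018-01-04 02:00",
    ]),
    "a": [2.0, 6.0, 4.3, 5.0, np.nan],
    "b": ["high", np.nan, "low", np.nan, np.nan],
}).sort_values("datetime")

# normalize() sets the time part to midnight for the whole column at once.
day = frame["datetime"].dt.normalize()

result = (frame.groupby(["subject", day])
               .agg({"a": last_valid, "b": last_valid})
               .dropna(how="all"))
print(result)
```

Sorting by datetime first matters here, because `last_valid` relies on row order within each group.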
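Notice: the technical posts on this site follow the CC BY-SA 4.0 license. If you need to repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.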