Pandas Dataframe: Find the conditional mean of all observations that meet certain conditions that are DIFFERENT in each row
Suppose I have a dataframe like this:
        date    M1_start      M1_end  SimPrices_t0_exp
0 2017-12-31  2018-01-01  2018-01-31         16.151667
1 2018-01-01  2018-02-01  2018-02-28         45.138445
2 2018-01-02  2018-02-01  2018-02-28         56.442648
3 2018-01-03  2018-02-01  2018-02-28         59.769931
4 2018-01-04  2018-02-01  2018-02-28         50.171695
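For each observation, I want the mean of SimPrices_t0_exp over all observations whose "date" value lies between that observation's M1_start and M1_end.
I have tried this: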
mask = ((df['date'] >= df['M1_start']) & (df['date'] <= df['M1_end']))
df['mymean'] = df['SimPrices_t0_exp'][mask].mean()
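This returns NaN for every observation. I believe the reason is that the mask is applied row by row, so each row's condition is only checked against that row's own date, which never evaluates to True.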
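To make the point above concrete, here is a minimal sketch that rebuilds the sample data and evaluates the mask. It assumes the three date columns are parsed as datetimes; the construction of `df` is an illustration, not part of the original post.

```python
import pandas as pd

# Sample data from the question; parsing the columns as datetimes is an assumption.
df = pd.DataFrame({
    "date": pd.to_datetime(["2017-12-31", "2018-01-01", "2018-01-02", "2018-01-03", "2018-01-04"]),
    "M1_start": pd.to_datetime(["2018-01-01", "2018-02-01", "2018-02-01", "2018-02-01", "2018-02-01"]),
    "M1_end": pd.to_datetime(["2018-01-31", "2018-02-28", "2018-02-28", "2018-02-28", "2018-02-28"]),
    "SimPrices_t0_exp": [16.151667, 45.138445, 56.442648, 59.769931, 50.171695],
})

# Element-wise comparison: row i's date is compared only with row i's own window.
mask = (df["date"] >= df["M1_start"]) & (df["date"] <= df["M1_end"])
print(mask.tolist())  # no row's date falls inside its own window in this sample

# The mean of an empty selection is NaN, and that scalar is broadcast to every row.
filtered_mean = df["SimPrices_t0_exp"][mask].mean()
print(filtered_mean)
```

Since no date in this sample falls inside its own row's window, the selection is empty and the single NaN result is assigned to the whole column.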
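Can anyone help me? I have been struggling with this problem for two days.
Example: in this particular case, the result column would hold, for the first observation, the mean of 45.13, 56.44, 59.76, and 50.17.
In case it helps anyone, the pseudocode would look like this: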
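for obs in observations:
    start = obs.start
    end = obs.end
    total = 0
    obs_count = 0
    for obs2 in observations:
        if start <= obs2.date <= end:
            # accumulate the price of the inner observation (obs2), not obs
            total += obs2.SimPrices_t0_exp
            obs_count += 1
    # leave the mean undefined (NaN) when no date falls inside the window
    obs.mean = total / obs_count if obs_count else NaN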
Thanks!!
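The pseudocode above can be written as a runnable sketch with a row-wise `apply`. It assumes the date columns are datetimes; the helper name `window_mean` and the way the sample `df` is built are illustrative, not from the original post. This is a direct O(n²) translation of the nested loop, so it is likely to be slow on large frames.

```python
import pandas as pd

# Sample data from the question; parsing the columns as datetimes is an assumption.
df = pd.DataFrame({
    "date": pd.to_datetime(["2017-12-31", "2018-01-01", "2018-01-02", "2018-01-03", "2018-01-04"]),
    "M1_start": pd.to_datetime(["2018-01-01", "2018-02-01", "2018-02-01", "2018-02-01", "2018-02-01"]),
    "M1_end": pd.to_datetime(["2018-01-31", "2018-02-28", "2018-02-28", "2018-02-28", "2018-02-28"]),
    "SimPrices_t0_exp": [16.151667, 45.138445, 56.442648, 59.769931, 50.171695],
})

def window_mean(row):
    """Mean price over all rows whose date lies in this row's [M1_start, M1_end] window."""
    # Series.between is inclusive on both ends by default.
    in_window = df["date"].between(row["M1_start"], row["M1_end"])
    # An empty selection yields NaN, matching the "no observations" case.
    return df.loc[in_window, "SimPrices_t0_exp"].mean()

df["mymean"] = df.apply(window_mean, axis=1)
print(df[["M1_start", "M1_end", "mymean"]])
```

For the sample data, only the first row's window (January 2018) contains any dates, so only that row gets a non-NaN mean.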
Here is one way to do it, using a Cartesian merge (not a good choice for large datasets), filtering, and groupby:
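# add a constant key so that every row can be paired with every other row
df = df.assign(key=1)
# Cartesian (cross) join: _x columns carry the window, _y columns the candidate observation
df_m = df.merge(df, on='key')
# keep pairs whose date_y falls inside the window, then average the price per window
df_m.query('M1_start_x <= date_y <= M1_end_x').groupby(['M1_start_x','M1_end_x'])['SimPrices_t0_exp_y'].mean()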
Output:
M1_start_x  M1_end_x
2018-01-01  2018-01-31    52.88068
Name: SimPrices_t0_exp_y, dtype: float64
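The merge-based result above is indexed by window rather than by original row. As a hedged alternative, the sketch below uses NumPy broadcasting to write the per-row mean into a `mymean` column, as the question attempted. It swaps in a different technique from the answer's merge, assumes datetime columns, and still builds an n-by-n boolean matrix, so it carries the same quadratic memory cost as the Cartesian merge.

```python
import numpy as np
import pandas as pd

# Sample data from the question; parsing the columns as datetimes is an assumption.
df = pd.DataFrame({
    "date": pd.to_datetime(["2017-12-31", "2018-01-01", "2018-01-02", "2018-01-03", "2018-01-04"]),
    "M1_start": pd.to_datetime(["2018-01-01", "2018-02-01", "2018-02-01", "2018-02-01", "2018-02-01"]),
    "M1_end": pd.to_datetime(["2018-01-31", "2018-02-28", "2018-02-28", "2018-02-28", "2018-02-28"]),
    "SimPrices_t0_exp": [16.151667, 45.138445, 56.442648, 59.769931, 50.171695],
})

dates = df["date"].to_numpy()                 # shape (n,)
start = df["M1_start"].to_numpy()[:, None]    # shape (n, 1)
end = df["M1_end"].to_numpy()[:, None]        # shape (n, 1)

# in_window[i, j] is True when row j's date lies inside row i's window.
in_window = (dates >= start) & (dates <= end)

prices = df["SimPrices_t0_exp"].to_numpy()
counts = in_window.sum(axis=1)                # number of matching dates per window
totals = (in_window * prices).sum(axis=1)     # sum of matching prices per window

# Avoid dividing by zero: windows with no matching dates get NaN.
safe_counts = np.where(counts > 0, counts, 1)
df["mymean"] = np.where(counts > 0, totals / safe_counts, np.nan)
print(df[["M1_start", "M1_end", "mymean"]])
```

On pandas 1.2 or later, the `assign(key=1)` step in the merge approach can also be replaced by `df.merge(df, how='cross')`, which produces the same Cartesian product without a helper column.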