Pandas dataframe - update the values of certain rows based on the condition of a groupby object
Using the following data:
import pandas as pd

# Four hourly ranges, merged into one sorted DatetimeIndex
idx_a = pd.date_range(start="2000-01-01 00:00:00", periods=5, freq="H")
idx_b = pd.date_range(start="2000-01-01 00:05:00", periods=5, freq="H")
idx_c = pd.date_range(start="2000-01-02 00:00:00", periods=5, freq="H")
idx_d = pd.date_range(start="2000-01-02 00:05:00", periods=5, freq="H")
df = pd.DataFrame({'article': ['a', 'b'] * 10, 'view': range(1, 21)},
                  index=idx_a.union(idx_b).union(idx_c).union(idx_d))
article view
2000-01-01 00:00:00 a 1
2000-01-01 00:05:00 b 2
2000-01-01 01:00:00 a 3
2000-01-01 01:05:00 b 4
2000-01-01 02:00:00 a 5
2000-01-01 02:05:00 b 6
2000-01-01 03:00:00 a 7
2000-01-01 03:05:00 b 8
2000-01-01 04:00:00 a 9
2000-01-01 04:05:00 b 10
2000-01-02 00:00:00 a 11
2000-01-02 00:05:00 b 12
2000-01-02 01:00:00 a 13
2000-01-02 01:05:00 b 14
2000-01-02 02:00:00 a 15
2000-01-02 02:05:00 b 16
2000-01-02 03:00:00 a 17
2000-01-02 03:05:00 b 18
2000-01-02 04:00:00 a 19
2000-01-02 04:05:00 b 20
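For every day, I want to update only the 2 am view value of each article with that article's 3 am view value. So the desired result should look like this ("<===" marks the updated rows):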
article view
2000-01-01 00:00:00 a 1
2000-01-01 00:05:00 b 2
2000-01-01 01:00:00 a 3
2000-01-01 01:05:00 b 4
2000-01-01 02:00:00 a 7 <===
2000-01-01 02:05:00 b 8 <===
2000-01-01 03:00:00 a 7
2000-01-01 03:05:00 b 8
2000-01-01 04:00:00 a 9
2000-01-01 04:05:00 b 10
2000-01-02 00:00:00 a 11
2000-01-02 00:05:00 b 12
2000-01-02 01:00:00 a 13
2000-01-02 01:05:00 b 14
2000-01-02 02:00:00 a 17 <===
2000-01-02 02:05:00 b 18 <===
2000-01-02 03:00:00 a 17
2000-01-02 03:05:00 b 18
2000-01-02 04:00:00 a 19
2000-01-02 04:05:00 b 20
After many attempts, the closest I got was with this code:
df.groupby([pd.Grouper(freq="D"), 'article']).view.transform(lambda s: s.where( ~(s.index.hour==2), s[s.index.hour==3]))
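However, np.nan shows up in the cells I want to update. Interestingly, if I replace s[s.index.hour==3] with an integer, the cells (i.e. the 2 am values) are correctly updated to that integer. How can I get each article's 3 am value and use it to update the 2 am value on the same day?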
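Here is a minimal sketch of why the NaN appears. It uses a hypothetical toy Series s that stands in for one (day, article) group. Series.where aligns a Series passed as other on the index, so the 2 am cell finds no matching label. Passing a scalar skips alignment. The code takes that scalar with other.iloc[0], which assumes each group has exactly one 3 am row.

```python
import pandas as pd

# Hypothetical toy Series standing in for one (day, article) group, hours 0..4.
idx = pd.date_range("2000-01-01 00:00", periods=5, freq=pd.Timedelta(hours=1))
s = pd.Series([1, 3, 5, 7, 9], index=idx)

# The replacement value sits at 03:00, but the cell to fill sits at 02:00.
other = s[s.index.hour == 3]

# Series.where aligns `other` on the index: 02:00 has no matching label
# in `other`, so the replaced cell becomes NaN.
aligned = s.where(s.index.hour != 2, other)

# A scalar has no index, so nothing is aligned and 02:00 gets the 3 am value.
# Assumes exactly one 3 am row in the group.
fixed = s.where(s.index.hour != 2, other.iloc[0])

print(aligned.tolist())
print(fixed.tolist())
```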
Filter the rows whose hour is 3, rename their index so the hour becomes 2, and then use DataFrame.update:
df1 = df[df.index.hour==3].rename(lambda x: x.replace(hour=2))
print (df1)
article view
2000-01-01 02:00:00 a 7
2000-01-01 02:05:00 b 8
2000-01-02 02:00:00 a 17
2000-01-02 02:05:00 b 18
df.update(df1)
print (df)
article view
2000-01-01 00:00:00 a 1.0
2000-01-01 00:05:00 b 2.0
2000-01-01 01:00:00 a 3.0
2000-01-01 01:05:00 b 4.0
2000-01-01 02:00:00 a 7.0
2000-01-01 02:05:00 b 8.0
2000-01-01 03:00:00 a 7.0
2000-01-01 03:05:00 b 8.0
2000-01-01 04:00:00 a 9.0
2000-01-01 04:05:00 b 10.0
2000-01-02 00:00:00 a 11.0
2000-01-02 00:05:00 b 12.0
2000-01-02 01:00:00 a 13.0
2000-01-02 01:05:00 b 14.0
2000-01-02 02:00:00 a 17.0
2000-01-02 02:05:00 b 18.0
2000-01-02 03:00:00 a 17.0
2000-01-02 03:05:00 b 18.0
2000-01-02 04:00:00 a 19.0
2000-01-02 04:05:00 b 20.0
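The following is a self-contained sketch of the update approach above. It rebuilds the question's data with freq=pd.Timedelta(hours=1) instead of the "H" alias. That substitution is an assumption made here, not part of the original. The sketch also casts view back to int, because update can leave the column as float, as the output above shows.

```python
import pandas as pd

# Rebuild the question's data (1-hour steps expressed as a Timedelta).
step = pd.Timedelta(hours=1)
idx = (pd.date_range("2000-01-01 00:00", periods=5, freq=step)
       .union(pd.date_range("2000-01-01 00:05", periods=5, freq=step))
       .union(pd.date_range("2000-01-02 00:00", periods=5, freq=step))
       .union(pd.date_range("2000-01-02 00:05", periods=5, freq=step)))
df = pd.DataFrame({'article': ['a', 'b'] * 10, 'view': range(1, 21)}, index=idx)

# Take the 3 am rows and relabel them as 2 am so they line up with the rows to overwrite.
df1 = df[df.index.hour == 3].rename(lambda ts: ts.replace(hour=2))

# update() aligns on index and columns and overwrites df in place.
df.update(df1)

# update() may leave 'view' as float; cast it back to integers.
df['view'] = df['view'].astype(int)
print(df)
```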
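Your solution can also be fixed by converting the values to a list. The NaN appears because Series.where aligns a Series passed as other on its index, and the 3 am labels never match the 2 am labels. A list has no index, so nothing is aligned. You can also use the inverted mask != instead of ~(...==...):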
df['view1'] = (df.groupby([pd.Grouper(freq="D"), 'article']).view
.transform(lambda s: s.where(s.index.hour!=2,s[s.index.hour==3].tolist())))
print (df)
article view view1
2000-01-01 00:00:00 a 1 1
2000-01-01 00:05:00 b 2 2
2000-01-01 01:00:00 a 3 3
2000-01-01 01:05:00 b 4 4
2000-01-01 02:00:00 a 5 7
2000-01-01 02:05:00 b 6 8
2000-01-01 03:00:00 a 7 7
2000-01-01 03:05:00 b 8 8
2000-01-01 04:00:00 a 9 9
2000-01-01 04:05:00 b 10 10
2000-01-02 00:00:00 a 11 11
2000-01-02 00:05:00 b 12 12
2000-01-02 01:00:00 a 13 13
2000-01-02 01:05:00 b 14 14
2000-01-02 02:00:00 a 15 17
2000-01-02 02:05:00 b 16 18
2000-01-02 03:00:00 a 17 17
2000-01-02 03:05:00 b 18 18
2000-01-02 04:00:00 a 19 19
2000-01-02 04:05:00 b 20 20
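The next example uses a different technique, not taken from the answer above. It builds a lookup table keyed by (calendar day, article) from the 3 am rows, then reindexes that table with the keys of the 2 am rows, so no per-group lambda is needed. The names at3, lookup, mask and keys are illustrative only.

```python
import pandas as pd

# Rebuild the question's data (1-hour steps expressed as a Timedelta).
step = pd.Timedelta(hours=1)
idx = (pd.date_range("2000-01-01 00:00", periods=5, freq=step)
       .union(pd.date_range("2000-01-01 00:05", periods=5, freq=step))
       .union(pd.date_range("2000-01-02 00:00", periods=5, freq=step))
       .union(pd.date_range("2000-01-02 00:05", periods=5, freq=step)))
df = pd.DataFrame({'article': ['a', 'b'] * 10, 'view': range(1, 21)}, index=idx)

# Lookup table built from the 3 am rows: (calendar day, article) -> view.
at3 = df[df.index.hour == 3]
lookup = at3.set_index([at3.index.normalize(), 'article'])['view']

# Keys of the rows to overwrite: the 2 am rows of every day.
mask = df.index.hour == 2
keys = pd.MultiIndex.from_arrays([df.index[mask].normalize(), df.loc[mask, 'article']])

# Fetch the matching 3 am value for each 2 am row, in the same order as `keys`.
df.loc[mask, 'view'] = lookup.reindex(keys).to_numpy()
print(df)
```

If some day/article pair has no 3 am row, reindex returns NaN for it. Decide how to handle that case before assigning into the integer view column.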